Chapter 94. Expected Risk Limitations
Blake Bisset
I’ve generally employed two major risk analysis methodologies.
The first is architectural analysis. This typically means looking at perceived or unrealized risk through some flavor of FMA (failure mode analysis): FMEA (failure mode and effects analysis), FMECA (failure mode, effects, and criticality analysis), or even just a basic folks-sitting-at-a-whiteboard session, looking for common anti-patterns in the design of the system and jotting them down: missing circuit breaking, throttling, exponential backoff and retry, jitter, that kind of stuff. These analyses can be purely qualitative and subjective and still have value, but they rely heavily on what you already know about your system. Or, rather, on your mental map of the system and what you think you know about it.
The second is data-driven analysis. Here we add historical reliability data from realized risk to the FMA process (failure modes, effects, and diagnostic analysis [FMEDA]), or build heat maps of contributing factors for outages across a number of dimensions, such as type of failure, services involved, and geography, and associate them with user impact based on degradation, number of affected users, and duration.
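As a rough illustration of the heat-map idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Incident` record, its field names, the made-up incident data, and especially the choice to score user impact as degradation times affected users times duration ("degraded user-minutes"). Your own impact measure may well combine those factors differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical outage record; field names are illustrative only."""
    failure_type: str
    service: str
    region: str
    degradation: float       # fraction of functionality lost, 0.0 to 1.0
    affected_users: int
    duration_minutes: float

    def user_impact(self) -> float:
        # Assumed scoring: degraded user-minutes. Other weightings are possible.
        return self.degradation * self.affected_users * self.duration_minutes


def build_heat_map(incidents, dimensions=("failure_type", "service")):
    """Sum user impact into cells keyed by the chosen dimensions."""
    cells = defaultdict(float)
    for incident in incidents:
        key = tuple(getattr(incident, d) for d in dimensions)
        cells[key] += incident.user_impact()
    return dict(cells)


# Made-up incident history, purely for illustration.
INCIDENTS = [
    Incident("retry storm", "checkout", "us-east", 0.5, 20_000, 30),
    Incident("retry storm", "checkout", "eu-west", 1.0, 5_000, 10),
    Incident("bad config push", "search", "us-east", 0.25, 40_000, 12),
]

HEAT_MAP = build_heat_map(INCIDENTS)
BY_REGION = build_heat_map(INCIDENTS, dimensions=("region",))
```

Slicing the same incidents along different dimensions (failure type and service, or geography alone) is what turns a flat outage list into something you can scan for hot spots.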
The goal here is to arrive at an annualized (or other periodic) expectation of the realized impact of a particular risk. These are short essays, ...
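One simple way to picture an annualized expectation is frequency times severity: how often a risk has been realized per year, multiplied by the average impact each time. The sketch below assumes exactly that model; the function name, the impact unit, and the sample numbers are all hypothetical, and a real analysis would likely need to account for small sample sizes and changing system behavior.

```python
def annualized_impact(impacts, observation_years):
    """Expected impact per year for one risk.

    impacts: per-event impact values for that risk (any consistent unit,
             for example degraded user-minutes).
    observation_years: length of the observation window, in years.

    Assumed model: (events per year) * (mean impact per event).
    """
    if observation_years <= 0:
        raise ValueError("observation_years must be positive")
    if not impacts:
        return 0.0
    events_per_year = len(impacts) / observation_years
    mean_impact = sum(impacts) / len(impacts)
    return events_per_year * mean_impact


# Made-up history: three events for one risk observed over two years.
EXPECTED_PER_YEAR = annualized_impact([300_000, 50_000, 250_000], 2)
```

Swapping the window for a quarter or a month gives the "other periodic" variants, as long as the same window is used when comparing risks.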