Chapter 9. Using Risk Management When Architecting for Scale

Risk management involves determining where the risk is within your system, determining which risks must be removed and which can remain, and then mitigating the remaining risks to reduce their likelihood and severity.

When a risk triggers (or occurs), you or your system suffer a loss. The loss can be data lost by your company or by a customer. It can be your application becoming unavailable to your customers. It can be invalid or erroneous results. Any of these can cause your customers to lose trust in your ability to manage their data and their business, and that lost trust will ultimately cost you money.

However, you must weigh this loss against a competing concern: what does it cost to remove the risk so that it cannot happen?

Risk management, then, is balancing the cost of removing a risk against the cost of having the risk occur.
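One common way to make this balance concrete is to compare the cost of removing a risk with the loss it is expected to cause. The sketch below assumes a simple model that the text does not prescribe: expected loss is the yearly probability of the risk triggering multiplied by the cost when it does. The function names and all figures are hypothetical and for illustration only.

```python
def expected_annual_loss(probability_per_year: float, cost_if_triggered: float) -> float:
    """Expected yearly loss under the assumed model: likelihood times impact."""
    return probability_per_year * cost_if_triggered


def worth_removing(probability_per_year: float,
                   cost_if_triggered: float,
                   annual_cost_to_remove: float) -> bool:
    """True when removing the risk costs less than its expected yearly loss."""
    loss = expected_annual_loss(probability_per_year, cost_if_triggered)
    return annual_cost_to_remove < loss


# Hypothetical risk A: 10% chance per year of a 200,000 loss; 5,000 per year to remove.
print(worth_removing(0.10, 200_000, 5_000))

# Hypothetical risk B: 1% chance per year of a 50,000 loss; 5,000 per year to remove.
print(worth_removing(0.01, 50_000, 5_000))
```

Under these assumed numbers, removing risk A is cheaper than living with it, while removing risk B costs more than the loss it is expected to prevent. Real decisions also have to account for losses that are hard to price, such as lost customer trust.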

Identify Risk

Your first step in managing risk is creating a list of all known risks, along with their severity and their likelihood of occurring.

We call this list a risk matrix, an example of which is shown later in this chapter in Figure 9-1.
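As a rough illustration, such a list could be kept as a small data structure that records each risk with its likelihood and severity. This is a minimal sketch, not necessarily the layout used in Figure 9-1: the three-level scale, the `Risk` class, the multiplicative score, and the sample entries are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed three-level scale; the real matrix may use different levels.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Risk:
    description: str
    likelihood: str  # one of the keys in LEVELS
    severity: str    # one of the keys in LEVELS

    def score(self) -> int:
        """Assumed ranking heuristic: likelihood level times severity level."""
        return LEVELS[self.likelihood] * LEVELS[self.severity]


# Hypothetical entries, for illustration only.
risk_matrix = [
    Risk("Primary database runs out of disk space", "medium", "high"),
    Risk("Image resizing is slow under load", "high", "low"),
    Risk("Data center loses power", "low", "high"),
]

# Rank the risks so the ones needing attention first appear at the top.
ranked = sorted(risk_matrix, key=lambda risk: risk.score(), reverse=True)

for risk in ranked:
    print(f"{risk.score()}  {risk.likelihood:<6}  {risk.severity:<6}  {risk.description}")
```

Keeping likelihood and severity as separate fields, rather than storing only a combined score, preserves the two dimensions the list is meant to capture and lets you change the ranking rule later.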

Creating the matrix initially involves brainstorming. You can get ideas for what to put in your risk matrix from multiple sources:

  • Collective wisdom of the developers

  • Known high-support areas

  • Known threat vectors or vulnerabilities

  • Known areas where the system is incomplete or missing capabilities

  • Known poor performance ...
