Operations is a set of promises and the work it takes to fulfill them. In Chapter 2, we discussed how to create, monitor, and report on those promises. Risk management is what we do to identify, assess, and prioritize the uncertainties that could cause us to violate the promises we’ve made. It is also the application of resources (technology, tools, people, and processes) to monitor and mitigate these uncertainties and to reduce the probability that they come to pass.
This is not a perfect science! The goal is not to eliminate all risk; that is a quixotic pursuit that will waste resources. Instead, the goal is to bake risk assessment and mitigation into all of our processes and to iteratively reduce the impact of risks through mitigation and prevention techniques. This process should be performed continually, drawing on observations from incidents, the introduction of new architectural components, and the increasing or decreasing impact of risks as an organization evolves. Each cycle of the process can be broken down into the following steps:
Identify possible hazards/threats that create operational risk to the service
Conduct assessment of each risk, looking at likelihood and impacts
Categorize the likelihood and outcome of the risks
Identify controls for mitigating consequences or reducing likelihood of the risk
Prioritize which risks to tackle first
Implement controls and monitor effectiveness
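The assessment, categorization, and prioritization steps above can be sketched as a small risk register. This is a minimal illustration under stated assumptions, not a prescribed method: the `Risk` class, the 1 to 5 likelihood and impact scales, the likelihood × impact score, and the category thresholds are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Risk:
    """One entry in a hypothetical risk register."""
    hazard: str
    likelihood: int  # assumed scale: 1 (rare) .. 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) .. 5 (severe)
    controls: list = field(default_factory=list)  # mitigations identified for this risk

    def score(self) -> int:
        # Likelihood x impact: a common, but not the only, scoring scheme.
        return self.likelihood * self.impact

    def category(self) -> Category:
        # Thresholds are illustrative assumptions, not a standard.
        s = self.score()
        if s >= 15:
            return Category.HIGH
        if s >= 6:
            return Category.MEDIUM
        return Category.LOW


def prioritize(risks):
    """Highest score first; ties broken by impact so severe outcomes surface first."""
    return sorted(risks, key=lambda r: (r.score(), r.impact), reverse=True)


if __name__ == "__main__":
    register = [
        Risk("Primary database host failure", likelihood=3, impact=5,
             controls=["automated failover", "tested restores"]),
        Risk("Schema migration locks a hot table", likelihood=4, impact=3,
             controls=["online schema change tooling"]),
        Risk("Backup storage fills up", likelihood=2, impact=2,
             controls=["capacity alerting"]),
    ]
    for risk in prioritize(register):
        print(f"{risk.score():>2} {risk.category().value:<6} {risk.hazard}")
```

Sorting on impact as a tiebreaker is one judgment call among many; a team might instead weight impact more heavily than likelihood, or factor in the cost of each control. Whatever scheme is chosen, rescoring the register after each incident or architectural change is what keeps the cycle going.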
By repeating this process, you are exercising Kaizen, or continuous improvement. ...