Chapter 7. The Approach: Facilitating Improvements
There are two main philosophical approaches to what the analysis is and to the value it sets out to provide organizations. For many, its purpose is to document in great detail what took place during the response to an IT problem (self-diagnosis). For others, it is a means to understand the cause of a problem so that fixes can be applied to various aspects of process, technology, and people (self-improvement).
Regardless of the approach you take, the reason we perform these exercises is to learn as much about our systems as possible and uncover areas of improvement in a variety of places. Identifying a “root cause” is the reason most people give for why analysis is important and helpful. However, this approach is shortsighted.
The primary purpose of a post-incident review is to learn.
Discovering Areas of Improvement
Problems in IT systems arise in many different forms. In my opening example in the Introduction, our system did not experience an outage, but rather an unexpected (and bad) outcome as a result of normal operation. The backup process was not performing as it was expected to, but the system as a whole did not suffer a disruption of service. All signs pointed to a healthy working system. However, as we learned from the remediation efforts and post-incident review, there were latent problems in the migration process that caused a customer to lose data. Despite not directly disturbing the availability of our system, ...