Chapter 18 Conducting Post-Incident Reviews

IN THIS CHAPTER

• Moving beyond the limits of root cause analysis

• Stepping through the phases of an incident

• Reviewing contributing factors in post-incident reviews

Engineers are much more practiced at reacting to incidents than at proactively preparing to manage and avoid them. Post-incident reviews aim to empower engineers to examine the causes of an incident, the steps taken while responding to it, and the steps necessary to avoid a comparable incident in the future.

People used to refer to post-incident reviews as postmortems, and you can still find a lot of valuable information if you search for this term. However, the word is a bit morbid, with its connotation of death. For most software engineers, outages mean inconvenience to customers and lost money for the company. Few engineers deal with life-and-death situations in the use of their products, and keeping that perspective in mind when addressing failures is important.

In this chapter, you dive into the contributing factors of failure (going beyond root cause analysis), the phases of an incident or outage, and the way ...
