Chapter 8. Defining an Incident and Its Lifecycle

How do we know what an incident is or when it is appropriate to perform a post-incident review?

An incident is any unplanned event or condition that places the system in a negative or undesired state. The most extreme example of this is a complete outage or disruption in service. Whether it is an ecommerce website, a banking app, or a subcomponent of a larger system, if something has happened causing the operability, availability, or reliability to decrease, this is considered an incident.

In short, an incident can be defined as an unplanned interruption in service or a reduction in the quality of a service.

Severity and Priority

A standard classification of incidents helps teams and organizations share the same language and expectations regarding incident response. Priority levels as well as an estimation of severity help teams understand what they are dealing with and the appropriate next steps.

Priority

To categorize types of incidents, their impact on the system or service, and how they should be actively addressed, a priority level is assigned. One common categorization of those levels is:

  • Information

  • Warning

  • Critical

Severe incidents are assigned the critical priority, while minor problems or failures that have well-established redundancy are typically identified with a warning priority. Incidents that are unactionable or false alarms have the lowest priority and should be identified simply as information to be ...

Get Post-Incident Reviews now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.