Chapter 8. Defining an Incident and Its Lifecycle
How do we know what an incident is or when it is appropriate to perform a post-incident review?
An incident is any unplanned event or condition that places the system in a negative or undesired state. The most extreme example is a complete outage or disruption in service. Whether the system is an ecommerce website, a banking app, or a subcomponent of a larger system, anything that reduces its operability, availability, or reliability is considered an incident.
In short, an incident can be defined as an unplanned interruption in service or a reduction in the quality of a service.
Severity and Priority
A standard classification of incidents helps teams and organizations share the same language and expectations regarding incident response. Priority levels as well as an estimation of severity help teams understand what they are dealing with and the appropriate next steps.
Priority
A priority level is assigned to categorize an incident by its type, its impact on the system or service, and how actively it should be addressed. One common set of levels is:
- Information
- Warning
- Critical
Severe incidents are assigned the critical priority, while minor problems or failures that have well-established redundancy are typically identified with a warning priority. Incidents that are unactionable or false alarms have the lowest priority and should be identified simply as information to be ...
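The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Priority` enum, the `classify` function, and its `actionable` and `severe` flags are hypothetical names introduced here, not part of any particular alerting tool, and real systems usually weigh more signals than two booleans.

```python
from enum import IntEnum


class Priority(IntEnum):
    """The three priority levels listed above, ordered from lowest to highest."""
    INFORMATION = 1
    WARNING = 2
    CRITICAL = 3


def classify(actionable: bool, severe: bool) -> Priority:
    """Assign a priority from two hypothetical flags (assumed for illustration).

    actionable: False for false alarms or events nobody can act on.
    severe:     True for severe incidents; False for minor problems or
                failures covered by well-established redundancy.
    """
    if not actionable:
        return Priority.INFORMATION
    if severe:
        return Priority.CRITICAL
    return Priority.WARNING


if __name__ == "__main__":
    # A minor failure with redundancy in place: actionable but not severe.
    print(classify(actionable=True, severe=False).name)
```

Using an ordered enum lets responders compare levels directly (for example, `Priority.CRITICAL > Priority.WARNING`), which is one simple way to sort or filter incidents by urgency.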