Chapter 40. Effortless Incident Management

Suhail Patel, Miles Bryant, and Chris Evans

Humans are one of the most important factors in any incident management process, and that’s no different for the incidents that SREs become involved in. Managed incorrectly, an incident can have too many parallel, conflicting streams of work (individuals stepping on each other’s toes) or not enough collaboration (individuals trying to resolve the incident on their own). Here are the key steps to achieve effortless incident management.

The first thing to do in all incidents is nominate an incident lead and make it clear to everyone in the incident who the lead is at all times. This is the individual tasked with coordinating the roles and responsibilities of everyone involved in the incident and delegating tasks. The incident lead doesn’t have to be the person most familiar with the systems affected; rather, it can be someone who can bring the right groups of people together. The lead does not need to remain static throughout the incident; another individual can take on the incident lead role once they’ve gained all the context needed.
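For teams that track incidents in tooling, the idea of a single, always-visible lead who can hand the role over might be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the `Incident` class, its fields, the `hand_over` method, and the example names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Incident:
    """Hypothetical incident record that always has exactly one lead."""

    title: str
    lead: str
    # Each entry is (timestamp, previous lead, new lead), oldest first.
    handovers: list = field(default_factory=list)

    def hand_over(self, new_lead: str) -> str:
        """Transfer the lead role and return an announcement to post."""
        if new_lead == self.lead:
            raise ValueError(f"{new_lead} is already the incident lead")
        previous = self.lead
        self.handovers.append((datetime.now(timezone.utc), previous, new_lead))
        self.lead = new_lead
        return f"[{self.title}] Incident lead is now {new_lead} (was {previous})"


# Example with made-up names: alice opens the incident, then hands over to bob.
incident = Incident(title="Checkout latency", lead="alice")
announcement = incident.hand_over("bob")
print(announcement)
```

Because every change of lead goes through one method that also produces an announcement, there is never a moment with no lead or two leads, and everyone in the incident can be told about the change as it happens.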

Consider setting up a dedicated incident communication channel (in ...
