Chapter 7. Conclusion and Moving Forward
We covered the basics of incidents and walked through the incident management lifecycle: preparedness, response, and recovery. It’s a lot to process, and you might now be wondering, “What’s next?”
Your first call to action is this: learn to use incident management only when it’s appropriate. Incident response is a human-expensive activity. A person, and often several people, must be involved from initial alerting through to resolution. The act of incident response is intended to put in place mitigations that correct problems while they are happening, in order to buy time to make decisions about priorities. The cost is that regular product fixes might not be rolled out and long-term plans and improvements might not be prioritized. Incident response might mean that SLOs are violated or customer commitments can’t be met. It also means that the employees working on incident response are going to feel it.
It’s well documented that first responders to physical incidents are at heightened risk of burnout and responder fatigue; the same applies to individuals who work on nonphysical incidents, namely anyone whose job can involve work–life imbalance, extremes of activity, or a possible lack of control. These are common factors in technical incident management jobs, which means that employees can feel the effects and career consequences of burnout. The risks here involve, at best, low performance, and at worst, employee ...