Chapter 72. Optimize for MTTBTB (Mean Time to Back to Bed)
Spike Lindsey
It’s the middle of the night. The loud, distinctive sound from your paging app of choice rudely yanks you from sleep, shortly followed by a call, then a message for good measure: you wouldn’t want to miss a page!
Doing anything right after being suddenly woken up is hard: you are dazed, your cortisol levels are spiking, and there may be some adrenaline in the mix. Debugging complex systems under pressure in that state is harder still. However, this is the reality of being on call for many, because few organizations can invest in follow-the-sun rotations across all their teams, yet they operate systems that need 24/7 availability.
Over time, with enough frequency, out-of-hours pages become a source of stress and eventual burnout. The human cost is not trivial. Part of the solution is fixing the causes of pages, but we have to acknowledge that some pages will still happen. Knowing this, how can we best support on-callers and reduce the mental and physical toll of holding a pager?
First, ask, “Will this make sense if you’ve just been woken up?” Even the most experienced, expert on-callers are not operating at full capacity after interrupted sleep. We must actively reduce the cognitive load of incident response (whether through checklists, runbooks, scripts and tooling, or dashboards), thinking carefully about whether the information provided has enough context to be ...