The Site Reliability Workbook
by Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, Stephen Thorne
July 2018
506 pages
English
O'Reilly Media, Inc.

Appendix C. Results of Postmortem Analysis

At Google, we have a standard postmortem template that allows us to consistently capture the incident root cause and trigger, which enables trend analysis. We use this trend analysis to help us target improvements that address systemic root-cause types, such as faulty software interface design or immature change deployment planning. Table C-1 shows the breakdown of our top eight triggers for outages, based on a sample of thousands of postmortems over the last seven years.

Table C-1. Top eight outage triggers, 2010–2017
Trigger                    Percentage
Binary push                37%
Configuration push         31%
User behavior change       9%
Processing pipeline        6%
Service provider change    5%
Performance decay          5%
Capacity management        5%
Hardware                   2%
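
A breakdown like the one in Table C-1 can be produced by tallying one field across many postmortems. The following is a minimal sketch, not Google's actual tooling: the record layout, the `trigger` field name, and the sample records are all hypothetical, chosen only to illustrate how a consistent template makes this kind of trend analysis a simple count.

```python
from collections import Counter

# Hypothetical postmortem records. The field names ("id", "trigger") and the
# values are illustrative assumptions, not the real template or real data.
postmortems = [
    {"id": "pm-001", "trigger": "Binary push"},
    {"id": "pm-002", "trigger": "Configuration push"},
    {"id": "pm-003", "trigger": "Binary push"},
    {"id": "pm-004", "trigger": "Hardware"},
    {"id": "pm-005", "trigger": "Binary push"},
    {"id": "pm-006", "trigger": "Configuration push"},
]


def trigger_breakdown(records):
    """Return (trigger, percent) pairs, sorted by descending share."""
    counts = Counter(record["trigger"] for record in records)
    total = sum(counts.values())
    return [(trigger, 100.0 * n / total) for trigger, n in counts.most_common()]


for trigger, percent in trigger_breakdown(postmortems):
    print(f"{trigger:<22} {percent:5.1f}%")
```

The count only works if every postmortem records its trigger using the same fixed vocabulary, which is the practical benefit of a standard template.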

Table C-2 presents the top five contributing root-cause categories.

Table C-2. Top five root-cause categories for outages
Root-cause category            Percentage
Software                       41.35%
Development process failure    20.23%
Complex system behaviors       16.90%
Deployment planning            6.74%
Network failure                2.75%
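
One way to read the two tables is to add up the largest entries and see how much of the total they cover. The sketch below uses only the figures from Tables C-1 and C-2; the helper name `cumulative_share` is an illustrative choice, and the calculation assumes that the percentages within each table are shares of the same total.

```python
# Figures copied from Table C-1 (percent of outages by trigger).
triggers = {
    "Binary push": 37,
    "Configuration push": 31,
    "User behavior change": 9,
    "Processing pipeline": 6,
    "Service provider change": 5,
    "Performance decay": 5,
    "Capacity management": 5,
    "Hardware": 2,
}

# Figures copied from Table C-2 (percent of outages by root-cause category).
root_causes = {
    "Software": 41.35,
    "Development process failure": 20.23,
    "Complex system behaviors": 16.90,
    "Deployment planning": 6.74,
    "Network failure": 2.75,
}


def cumulative_share(table, top_n):
    """Sum the top_n largest percentages in a table."""
    return sum(sorted(table.values(), reverse=True)[:top_n])


push_share = cumulative_share(triggers, 2)            # binary + configuration pushes
all_triggers = cumulative_share(triggers, len(triggers))
top_five_causes = round(cumulative_share(root_causes, 5), 2)

print(f"Top two triggers:     {push_share}%")
print(f"All eight triggers:   {all_triggers}%")
print(f"Top five root causes: {top_five_causes}%")
```

Under that assumption, binary and configuration pushes together account for 68% of the outages in Table C-1, and the five categories in Table C-2 cover 87.97%, leaving the rest to categories the table does not list.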

ISBN: 9781492029496