Site Reliability Engineering

by Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff
April 2016
Intermediate to advanced
552 pages
15h 44m
English
O'Reilly Media, Inc.
Content preview from Site Reliability Engineering

Chapter 13. Emergency Response

Things break; that’s life.

Regardless of the stakes involved or the size of an organization, one trait that’s vital to the long-term health of an organization, and that consequently sets that organization apart from others, is how the people involved respond to an emergency. Few of us naturally respond well during an emergency. A proper response takes preparation and periodic, pertinent, hands-on training. Establishing and maintaining thorough training and testing processes requires the support of the board and management, in addition to the careful attention of staff. All of these elements are essential in fostering an environment in which teams can spend money, time, energy, and possibly even uptime to ensure that systems, processes, and people respond efficiently during an emergency.

Note that the chapter on postmortem culture discusses the specifics of how to write postmortems in order to make sure that incidents that require emergency response also become a learning opportunity (see Chapter 15). This chapter provides more concrete examples of such incidents.

What to Do When Systems Break

First of all, don’t panic! You aren’t alone, and the sky isn’t falling. You’re a professional and trained to handle this sort of situation. Typically, no one is in physical danger—only those poor electrons are in peril. At the very worst, half of the Internet is down. So take a deep breath…and carry on.

If you ...


Publisher Resources

ISBN: 9781491929117