Modern System Administration

by Jennifer Davis
November 2022
Intermediate to advanced
325 pages
English
O'Reilly Media, Inc.

Chapter 20. Managing Incidents

As we explored in Chapter 19, the purpose of on-call is to be aware of your systems so you can keep them healthy. But as much as you strive to reduce risk, failure will happen—there will be incidents. Incident management begins when you detect a problem during an on-call rotation, but management often extends beyond on-call when other subject matter experts and teams are required for issue resolution. The aim of incident management is to minimize the impact of an incident.

You, as an individual, need the kinds of tools, techniques, and practices that will not only get you through an incident with minimal suffering but will also help you feel prepared ahead of time and able to react effectively when an incident occurs. You need good, clear communication across teams so that the appropriate subject matter experts can share their knowledge and minimize time to resolution. And you need a way to capture and apply what you learned from the incident to improve overall production, reduce future impacts to customers, and reduce the team’s toil.

In this chapter, I share a framework for collaborative and sustainable incident management, from identifying incidents to conducting post-incident reviews and identifying the actions required to improve the live environment.

Note

I am assuming your team has incident management and that you’ll have some framework to which you can apply what I’m sharing to improve your experience. If your team doesn’t currently do incident ...

ISBN: 9781492055204