Effective SRE: On-call Best Practices
on-demand course

with Jaime Woo, Emil Stolarsky
December 2021
Intermediate
1h 38m
English
O'Reilly Media, Inc.
Closed Captioning available in German, English, Spanish, French, Japanese, Korean, Portuguese (Portugal, Brazil), Chinese (Simplified), Chinese (Traditional)

Overview

As the world has moved online, we’ve grown to expect that everything works around the clock. Organizations are investing more and more in maintaining their systems, searching high and low for ways to make them more reliable. Yet a key resource is hiding in plain sight: your people. Your developers and operators are the last line of defense when systems go down and need to be brought back online, yet the practice of on-call gets barely any thought.

Designing on-call looks deceptively easy and is often done ad hoc. But ineffective on-call design can lead to slower incident response and diminished well-being for those on-call, including burnout and attrition. Effective and sustainable on-call, on the other hand, yields substantive benefits and helps operators learn about their systems and improve how they support them.

Experts Jaime Woo and Emil Stolarsky guide you through the key components of on-call, from training, scheduling, and rotations to incident response and evaluation. On-call is an accepted part of an operator's life, and being intentional about it is the best way to ensure that the team stays healthy and sustainable. Join in to learn how it’s done.

What you’ll learn and how you can apply it

By the end of this recording of a live online course, you’ll understand:

  • When you need to establish on-call
  • How to create playbooks
  • How to create healthy on-call rotations
  • How companies can fight pager fatigue
  • Whether or not to compensate workers for being on-call
  • The tools you can use to help with on-call

And you’ll be able to:

  • Follow best practices for on-call to build a healthy and sustainable culture
  • Evaluate if your on-call process is “working”
  • Spot the signs of a poor on-call process
  • Manage stakeholders around on-call culture

This recording of a live event is for you because…

  • You’re a developer, operator, or manager involved with on-call.
  • You work with systems that require on-call support.
  • You want to become an effective and supportive manager for your people.

Prerequisites

  • Experience running software in production environments
  • Familiarity with on-call processes

Recommended preparation:

  • Read “Being On-Call” (chapter 11 in Site Reliability Engineering)

Recommended follow-up:


Publisher Resources

ISBN: 0636920668480