Incident Metrics in SRE

by Stepan Davidovic
March 2021
Intermediate to advanced
34 pages
52m
English
O'Reilly Media, Inc.

Overview

Site reliability engineers often use MTTx metrics to evaluate improvements or track trends. But is either MTTR (mean time to recovery) or MTTM (mean time to mitigation) ideal for decision making or trend analysis when it comes to production incidents? This report not only demonstrates how and why MTTx metrics come up short but also proposes ways to think about metrics differently to get the answers you want.

Google SRE Stepan Davidovic uses a Monte Carlo simulation to show how MTTx metrics are poorly suited for decision making or trend analysis in the context of production incidents. Applying these metrics is trickier than it seems and can be dangerously misleading in many practical scenarios. With this report, you'll explore alternative methods for achieving these measurements.

  • Work with a simple model of the incident lifecycle and timings using empirical datasets
  • Use an analytical approach to get a clear picture of what your incident durations look like
  • Focus on narrow questions of the incident lifecycle rather than analyze incident statistics using MTTx
  • Explore alternative methods for achieving your measurements
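
As a rough illustration of the kind of Monte Carlo experiment the overview mentions, the sketch below draws two sets of incident durations from the same distribution and counts how often their means differ by at least 10%, even though nothing has changed between them. It is not taken from the report: the lognormal distribution, its parameters, the 10% threshold, and the function name simulate_mttr_noise are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_mttr_noise(n_incidents=50, trials=2000, threshold=0.10, seed=0):
    """Return the fraction of trials in which two samples drawn from the
    SAME duration distribution differ in mean (MTTR) by >= threshold.

    Assumption: durations are lognormal with mu=3.0, sigma=1.0 (hypothetical
    values, not fitted to any real incident data).
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(trials):
        before = [rng.lognormvariate(3.0, 1.0) for _ in range(n_incidents)]
        after = [rng.lognormvariate(3.0, 1.0) for _ in range(n_incidents)]
        mttr_before = statistics.fmean(before)
        mttr_after = statistics.fmean(after)
        # Relative change in MTTR, although no real improvement was applied.
        if abs(mttr_after - mttr_before) / mttr_before >= threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(simulate_mttr_noise())
```

Under these assumptions, a large share of trials show an apparent change in MTTR from chance alone, and that share shrinks only as the number of incidents per sample grows. This is one way to see why a shift in the mean may say little about whether anything actually improved.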

Publisher Resources

ISBN: 9781098103163