Chapter 5. How to Use Error Budgets

Error budgets are the final part of the Reliability Stack, and using them properly takes a lot of effort and resources. Not every team, organization, or company gets to this part. It’s not that thinking about error budgets is necessarily difficult or complicated, but actually using them as data to drive decisions will change how things work for many people.

In a software-based work environment, systems are often already in place to coordinate and mandate how work is done—including reliability work. You might follow some version of Agile and have sprints, you might simply have quarterly OKRs you work toward, or you might be an operational team that staffs a 24/7 control center that is tasked primarily with responding to customer requests or tickets. There are many ways that your organization may go about trying to keep things reliable.

Despite seeming straightforward, adopting an error budget approach to reliability can be a shocking change for some people, and it often doesn’t align with the methods and processes you already have in place. This is all fine; remember that the aim of this book is to present a new way of thinking about your users, a new way of collecting data about their experiences, and a new way of having discussions about the state of your services. You don’t have to subscribe to everything described in these pages. Crafting user-focused SLI measurements and using those to decide how to prioritize ...
