Chapter 12. A Worked Example

At this point, you’ve learned a lot about SLO-based approaches to reliability. Assuming you’ve read all of Part I, you now understand how the entire process works and the various components of the Reliability Stack. If you’ve explored other chapters in Part II, you’ve also potentially learned about getting buy-in, how to actually measure things and use those measurements for alerting and monitoring, some of the statistics and probability you can use to pick good SLI measurements and SLO targets, how to architect your services with SLOs in mind from the start, and why data reliability is a special case that requires different conversations.

While the other chapters in this part of the book have given you detailed insight into specific aspects of an SLO-based approach to reliability, and Part I outlined and defined all of the concepts you need to get started, what we really haven’t talked about yet is how all this might actually work for a multicomponent service, or how it might apply to an entire company or organization. Think of this chapter as a way to put many of these concepts to work.

This chapter describes an example company and walks through defining SLIs and SLOs for various parts of its infrastructure. Looking at a concrete example can be useful when learning how to apply concepts that may have just been abstract in your reading so far. We’ll be covering everything from a customer-facing web page to services ...
