Implementing Service Level Objectives
Published by O'Reilly Media, Inc.
How to make SLIs, SLOs, and error budgets work for you
Service-level objectives (SLOs), the bedrock upon which the discipline of site reliability engineering (SRE) was built, have never been more popular. But practical advice on how to get started can be hard to find. And while the concepts are easy to learn, putting them into practice takes much more work than most people realize.
Expert Alex Hidalgo introduces you to an SLO-based approach to reliability and walks you through real-world example applications—showing you how to get started on your SLO journey right away. Learn how to do SLOs the right way to get the data you need to make better decisions, understand your services better, increase your release cadence, and end up with happier customers.
What you’ll learn and how you can apply it
By the end of this live online course, you’ll understand:
- What SLIs, SLOs, and error budgets are
- Why this philosophy is essential to adopting site reliability engineering
- How this approach can lead to happier engineers, happier users, and a happier business
And you’ll be able to:
- Pick meaningful SLI measurements
- Choose good SLO targets
- Use error budgets to drive decision making
- Increase your release cadence
- Report on reliability to leadership in a more cohesive manner
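To make the three terms concrete, here is a minimal illustrative sketch of the arithmetic that connects an SLI, an SLO target, and an error budget. It is not taken from the course materials: the function names, the request counts, and the 99.9% target are all hypothetical, and it assumes a simple ratio-style SLI (good events divided by total events) measured over a single window.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """SLI expressed as the ratio of good events to total events (assumed definition)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent for the window.

    1.0 means no budget used, 0.0 means fully spent, negative means overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - slo_target) * total_events  # bad events the SLO tolerates
    actual_bad = total_events - good_events          # bad events actually observed
    return 1.0 - actual_bad / allowed_bad


# Hypothetical window: 1,000,000 requests, 500 of them failed, target of 99.9%.
sli = availability_sli(999_500, 1_000_000)
remaining = error_budget_remaining(999_500, 1_000_000, slo_target=0.999)
print(f"SLI: {sli:.4%}, error budget remaining: {remaining:.1%}")
```

In this made-up window the target tolerates about 1,000 failed requests and 500 occurred, so roughly half of the budget is left. A number like that is the kind of input a team might use when deciding whether to keep shipping or to prioritize reliability work.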
This live event is for you because...
- You’re an engineer on the front lines and care about the reliability of your service.
- You’re a product manager who wants to see a quicker release cadence.
- You’re a member of leadership who wants to see better reporting on the reliability of your products and services.
Prerequisites
- A basic understanding of web-based computer services, including the concepts of microservices, APIs, load balancers, databases, and other common pieces of modern computer service architectures
Recommended preparation:
- Follow and explore Microservices Essentials (expert playlist)
Recommended follow-up:
- Read Implementing Service Level Objectives (book)
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Reliability (50 minutes)
- Presentation: The reliability stack—an overview of how SLO-based approaches to reliability work; how all the parts work together
- Group discussion: How do you currently think about reliability? What does reliability mean? How do users think about reliability differently than engineers?
- Break (5 minutes)
Meaningful SLIs, good SLOs, and effective error budgets (50 minutes)
- Presentation: Developing meaningful SLIs; thinking about risk—what other engineering disciplines have already figured out about risk; choosing good SLOs—the math and basic statistics behind how to choose good targets; how to use error budgets
Wrap-up and Q&A (15 minutes)
Your Instructor
Alex Hidalgo
Alex Hidalgo is principal reliability advocate at Nobl9 and author of Implementing Service Level Objectives. During his career, he’s developed a deep love for sustainable operations, proper observability, and using SLO data to drive discussions and make decisions. Alex's previous jobs have included IT support, network security, restaurant work, T-shirt design, and hosting game shows at bars. When not sharing his passion for technology with others, he can be found scuba diving or watching college basketball. He lives in Brooklyn with his partner Jen and a rescue dog named Taco. Alex has a BA in philosophy from Virginia Commonwealth University.