Date: 2020-08-24
Book description
Although service-level objectives (SLOs) continue to grow in importance, there’s a distinct lack of information about how to implement them. Practical advice that does exist usually assumes that your team already has the infrastructure, tooling, and culture in place. In this book, recognized SLO expert Alex Hidalgo explains how to build an SLO culture from the ground up.
Ideal as a primer and daily reference for anyone creating both the culture and tooling necessary for SLO-based approaches to reliability, this guide provides detailed analysis of advanced SLO and service-level indicator (SLI) techniques. Armed with mathematical models and statistical knowledge to help you get the most out of an SLO-based approach, you’ll learn how to build systems capable of measuring meaningful SLIs with buy-in across all departments of your organization.
- Define SLIs that meaningfully measure the reliability of a service from a user’s perspective
- Choose appropriate SLO targets, including how to perform statistical and probabilistic analysis
- Use error budgets to help your team have better discussions and make better data-driven decisions
- Build supportive tooling and resources required for an SLO-based approach
- Use SLO data to present meaningful reports to leadership and your users
Table of contents
- Foreword
- Preface
- I. SLO Development
- 1. The Reliability Stack
- 2. How to Think About Reliability
- 3. Developing Meaningful Service Level Indicators
  - What Meaningful SLIs Provide
  - Caring About Many Things
  - Something More Complex
  - Summary
- 4. Choosing Good Service Level Objectives
  - Reliability Targets
  - Service Dependencies and Components
  - Reliability for Things You Don’t Own
  - Choosing Targets
  - Summary
- 5. How to Use Error Budgets
  - Error Budgets in Practice
  - Error Budget Measurement
  - Summary
- II. SLO Implementation
- 6. Getting Buy-In
- 7. Measuring SLIs and SLOs
  - Design Goals
  - Common Machinery
  - Common Cases
  - The General Case
  - Other Considerations
  - Summary
- 8. SLO Monitoring and Alerting
  - Motivation: What Is SLO Alerting, and Why Should You Do It?
  - How to Do SLO Alerting
  - Parting Recommendations
  - Summary
- 9. Probability and Statistics for SLIs and SLOs
- 10. Architecting for Reliability
  - Example System: Image-Serving Service
    - Architectural Considerations: Hardware
    - Architectural Considerations: Monolith or Microservices
    - Architectural Considerations: Anticipating Failure Modes
    - Architectural Considerations: Three Types of Requests
    - Systems and Building Blocks
    - Quantitative Analysis of Systems
    - Instrumentation! The System Also Needs Instrumentation!
    - Architectural Considerations: Hardware, Revisited
    - SLOs as a Result of System SLIs
    - The Importance of Identifying and Understanding Dependencies
    - Summary
- 11. Data Reliability
- 12. A Worked Example
- III. SLO Culture
- 13. Building an SLO Culture
  - A Culture of No SLOs
  - Strategies for Shifting Culture
  - Path to a Culture of SLOs
  - Summary
- 14. SLO Evolution
  - SLO Genesis
  - Usage Changes
  - Dependency Changes
  - Failure-Induced Changes
  - User Expectation and Requirement Changes
  - Tooling Changes
  - Intuition-Based Changes
  - Setting Aspirational SLOs
  - Identifying Incorrect SLOs
  - How to Change SLOs
  - Summary
- 15. Discoverable and Understandable SLOs
- 16. SLO Advocacy
  - Crawl
  - Walk
  - Run
  - Summary
- 17. Reliability Reporting
- A. SLO Definition Template
- B. Proofs for Chapter 9
- Index
Product information
- Title: Implementing Service Level Objectives
- Author(s): Alex Hidalgo
- Release date: August 2020
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492076810