Preface
Site Reliability Engineering (SRE) is a broad field that is evolving quickly and constantly as more organizations outside of Google implement and refine SRE practices to align with the needs of their services, customers, and business objectives. Although Google shaped many of the core tenets of SRE, we are excited by this next chapter and what we can learn from others’ experiences.
We conducted the SLO Adoption and Usage Survey to get a snapshot of where organizations are in adopting SRE practices. We focused much of the survey on how organizations use service level objectives (SLOs), which we have found to be the driving force behind what makes SRE an effective framework for managing services and ensuring user happiness.
This report aims to provide insight into how Site Reliability Engineers (SREs) identify, build, and measure the effectiveness of SLOs and how organizations use the collected data to improve the reliability of their services. It also identifies gaps between usage and SRE best practices—information that can help organizations recognize opportunities to improve their own SRE practices.
The purpose of this report is to share the survey results with the industry and further the conversation about how and why to adopt an SLO- and error-based approach to managing your services. The first part of this report reviews the results of our survey and identifies areas in which organizations can strengthen their SRE practices in order to realize the full benefits of implementing ...