Enterprise Roadmap to SRE

Book description

Two previous O'Reilly books from Google, Site Reliability Engineering and The Site Reliability Workbook, demonstrated how and why a commitment to the entire service life cycle enables your organization to successfully build, deploy, monitor, and maintain software systems. In this detailed report, Google Cloud Reliability Advocate Steve McGhee and Google Cloud Solutions Architect James Brookbank dive deeper into the specific challenges engineers face when adopting SRE in their organization.

Despite SRE's popularity, many enterprises have experienced a significant gap between initial enthusiasm for SRE and its often modest level of adoption. If you're a product owner or have a stake in reliable services and need to know more about SRE adoption, this report will methodically guide you through the process.

  • Get started by evaluating your existing environment and setting expectations
  • Examine SRE's approach to reliability, and learn why reliability is the most desired product feature
  • Learn how to map SRE's guiding principles, such as embracing risk, to your existing organization
  • Develop a set of SRE practices for your team, based on what team members can do, what they know, and what tools they use
  • Learn tips on how to actively nurture success and keep SRE working in your organization

Table of contents

  1. Preface
  2. 1. Getting Started with Enterprise SRE
    1. Evolution Is Better Than Revolution
    2. SRE Practices Can Coexist with the ITIL Framework
    3. DevOps/Agile/Lean
      1. Start Where You Are
      2. Outline Your Expectations and Vision
      3. SRE Starts with People
      4. Embrace Your Uniqueness
  3. 2. Why the SRE Approach to Reliability?
    1. Setting Reliability as a Key Product Differentiator
    2. When to Focus on Reliability?
    3. Why Is SRE Happening Now?
    4. Beyond the Google Halo
    5. Why Not More Traditional Ops?
  4. 3. SRE Principles
    1. Embracing Risk (SRE Book Chapter 3)
    2. Service-Level Objectives (SRE Book Chapter 4)
    3. Eliminating Toil (SRE Book Chapter 5)
    4. Monitoring Distributed Systems (SRE Book Chapter 6)
    5. The Evolution of Automation at Google (SRE Book Chapter 7)
    6. Release Engineering (SRE Book Chapter 8)
    7. Simplicity (SRE Book Chapter 9)
    8. How Do You Map These Principles to Your Existing Organization?
    9. Preventing Org-Destroying Mistakes
    10. Create a Safe-to-Fail Environment for Your Adoption Journey
    11. Beware Diverging Priorities
    12. How Do You Get Buy-In to These Principles, with the Critical Sign-Off and Backing You Need?
  5. 4. SRE Practices
    1. Where to Start?
    2. Where Are You Going?
    3. How to Get There
    4. What Makes SRE Possible?
    5. Building a Platform of Capabilities
    6. Leadership
      1. Knowing If It Is Working
      2. Choosing to Invest in Reliability
      3. Making Decisions
    7. Staffing and Retention
    8. Upskilling
  6. 5. Actively Nurturing Success
    1. Think Big, Act Small
    2. Culture Eats Strategy for Breakfast
    3. Avoiding Culture Won’t Help; Neither Will Waiting for It
    4. What Does Nurturing SRE Mean?
      1. 1. Sublinear Scaling
      2. 2. Building and Retaining Sustainable, Happy Teams
      3. 3. Acknowledging That SRE Is Not Static—It’s Inherently a Dynamic Role, and Grows over Time
      4. 4. Assessing Your Reliability Mindset Level and Target Within Your Organization
    5. SRE Care and Feeding
      1. Growing a Foothold Team into a Larger Org
      2. SRE Org Structure: Separate SRE Org Versus Embedded Teams
      3. Promotion, Training, and Compensation
      4. Communication and Community Building
      5. Gauging When Your SRE Adoption Is Effective
      6. Steering the Ship
  7. 6. Not Just Google
    1. Healthcare // Joseph
    2. Retail // Kip and Randy
  8. Conclusion
  9. About the Authors

Product information

  • Title: Enterprise Roadmap to SRE
  • Author(s): James Brookbank, Steve McGhee
  • Release date: January 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098117733