The Site Reliability Workbook
by Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, Stephen Thorne
Foreword II
When I found out people were working on a second SRE book, I reached out and asked if I could write a few words. The principles from the first SRE book align so well with what I always imagined DevOps to be, and the practices are insightful, even when they aren’t 100% applicable outside of Google. After reading the principles from the first SRE book for the first time—embracing risk (Chapter 3), service level objectives (Chapter 4), and eliminating toil (Chapter 5)—I wanted to shout that message from the rooftops. “Embracing risk” resonated so much because I had used similar language many times to help traditional organizations motivate change. “Eliminating toil” was always an implicit DevOps goal, both to allow humans more time for creative higher-order work and to allow them to be more human. But I really fell in love with “service level objectives.” I love that the language and the process create a dispassionate contract between operational considerations and delivering new functionality. The SRE, SWE (software engineer), and business all agree that the service has to be up to be valuable, and the SRE solution quantifies objectives to drive actions and priorities. The solution—make the service level a target, and when you are below the target, prioritize reliability over features—eliminates a classic conflict between operations and developers. This is a simple and elegant reframing that solves problems by not having them. I give these three chapters as ...