Chapter 5. Patterns and Antipatterns of SRE
This IS NOT SRE
There are many ways that an attempt to implement SRE practices and teams can go wrong. You can find more on Twitter and in Chapter 23 of Seeking SRE, but here are some key problems to avoid:
- Changing the name of any existing team (usually “ops”) to “SRE” without making the organizational adjustments required to enable them to do meaningful development work
- Using the SRE team to shield devs from the pain of how their services really function in production
- Failing to contain interrupts
- Attempting to do SRE project work without the same support (such as project managers, technical writers, etc.) that any other dev team would have (because SREs only spend 50% of their time on project work, we contend that support structures are even more important for SRE teams to make effective use of their development time)
- Valuing (perhaps simply through call-out recognition) incident response heroics over prudent design and preventative planning
- Implementing processes or systems that slow down the delivery of value to customers without incontrovertible benefit
- Building a “gatekeeper” team that functions as a chokepoint
- Static or ill-considered SLOs
- Thinking that SRE is a point solution to a particular problem rather than a fundamental cultural shift
This IS SRE
Hearkening back to the beginning:
SRE is an organizational model for running reliable online services by teams that are chartered to do reliability-focused ...