If you read much about the practices of Site Reliability Engineering (SRE) or the professional concerns discussed by site reliability engineers (also SRE), you could quickly gain the impression that the central concept is failure, or the results of failures, or having fewer failures, or avoiding failure in distributed computing systems. However, SRE is more productive and valuable when focused on achieving business success than when focused on preventing or mitigating failure. Peter Senge captured some of the key mental shifts that characterize site reliability engineering:
[a] shift of mind from seeing parts to seeing wholes, ... from reacting to the present to creating the future.1
Unlocking the full benefit of SRE involves cultural changes that empower teams to optimize the entire service life cycle toward delivering the business metrics (service levels) that delight their users. SRE teams and practices are most effective when aligned with a supportive corporate culture. To get the full value from SRE work in your company, it is also important to ensure that site reliability is considered proactively throughout the life cycle of services, from ideation through retirement.
The idea of SRE coalesced sometime between 2003 and 2007. This is roughly the same period in which the “DevOps” movement came into being, and both originated in the tech sector of Silicon Valley. Although ...