How SRE Relates to DevOps

class SRE implements interface DevOps

Operations, as a discipline, is hard. Not only is there the generally unsolved question of how to run systems well, but the best practices that have been found to work are highly context-dependent and far from widely adopted. There is also the largely unaddressed question of how to run operations teams well. Detailed analysis of these topics is generally thought to originate with Operational Research devoted to improving processes and output in the Allied military during World War II, but in reality, we have been thinking about how to operate things better for millennia.

Yet, despite all this effort and thought, reliable production operations remain elusive, particularly in the domains of information technology and software operability. The enterprise world, for example, often treats operations as a cost center, which makes meaningful improvements in outcomes difficult if not impossible. The tremendous short-sightedness of this approach is not yet widely understood, but dissatisfaction with it has given rise to a revolution in how to organize what we do in IT.

That revolution stemmed from trying to solve a common set of problems. The newest solutions to these problems are called by two separate names—DevOps and Site Reliability Engineering (SRE). Although we talk about them individually ...
