This chapter is for leaders in organizations that are not practicing Site Reliability Engineering (SRE). It’s for the IT managers who are facing disruption from the cloud. It’s for operations directors who are dealing with more and more complexity every day. It’s for CTOs who are building new technical capabilities. You might be thinking about building an SRE team. Is it a good fit for your organization? I want to help you decide.
Since 2014, I’ve worked in Google Cloud Platform, and I’ve met at least a hundred leaders in this situation. My first job at Google was before SRE existed, and today I work with cloud customers who are applying SRE in their organizations. I’ve seen SRE teams grow from seeds in many environments. I’ve also seen teams that struggled to find energy and nutrition.
So, let’s assume that you want to build an SRE team. Why? I’ve heard a few themes come up over and over. These made-up quotes illustrate them:
“My friends think SRE is cool and it would look nifty on my résumé.”
“I got yelled at the last time we had an outage.”
“My organization wants more predictable reliability and is willing to pay for it.”
These quotes are cheeky distillations of more nuanced situations, but they might resonate with you. Each one points to several opportunities and pitfalls. Understanding them will help you figure out how SRE would fit into your organization. Let’s take a look at the real meaning of each ...