Chapter 17. Growing SRE in Your Org

In this chapter, we are going to talk about scaling. Not the scaling of infrastructure or services we usually discuss in SRE, but the scaling of people. We are going to talk about how SRE might go from zero (or part of a single person’s time) to a much larger presence in your organization.

I say might in the previous sentence because I think of SRE (and operations in general) as highly situational. I believe that the same seed can grow very differently in different soil. As a result, this chapter will be less about prescriptive advice and more about describing some of the more common patterns I have seen work for different organizations. The hope here is that you will be able to choose options from this menu that feel congruent with your existing organization.

How Do You Know When to Scale?

Before we get into actual numbers that are going to increase as the chapter goes on, I want to call out and question the implicit assumption that “scaling bigger is better.” It is very easy to read (and, quite frankly, to write) this chapter as if the ultimate goal is to grow an SRE org to the maximum size the budget will allow. Just as there are “appropriate levels of reliability,” which we discuss throughout this book, there are also appropriate levels of scaling SRE.

For example, it may be tempting to grow or split a team based primarily on the load on that team demonstrated by a rise in the number of tickets or pages that team is expected to handle, but that ...
