Chapter 6. How to Apply SRE Principles Without Dedicated SRE Teams

Mid-sized organizations often find themselves in a position where a relatively small number of engineers must develop and run a relatively large number of diverse features.

SoundCloud has grown into exactly that situation. With each new feature added to the original monolithic Ruby on Rails code base, adding the next feature became more difficult. So around 2012, we began a gradual move to a microservices architecture. SoundCloud engineers have talked a lot about the various challenges that needed to be tackled for such a move to succeed.1 In this chapter, we explore lessons learned from reliably running hundreds of services at SoundCloud with a much smaller number of engineers.

SREs to the Rescue! (and How They Failed)

In 2012, SoundCloud happened to hire a couple of former Google SREs. Although dramatically smaller in scale, SoundCloud was moving toward technological patterns not so different from what larger internet companies had been doing for a while. By extension, it was an obvious move to also run those systems in the same way Google does. We tried “SRE by the book,” except that back then there was no actual book.

A Matter of Scale in Terms of Headcount

What is the smallest reasonable size of an SRE team? Because SREs ought to be on-call, the team needs to be large enough for at least one on-call rotation. Following the best practices for on-call ...
