Chapter 88. That 50% Thing
Tanya Reilly
The traditional model for operations was that software engineers would “throw services over the wall” to a dedicated team that would make them work in production. Systems administrators used heroics to keep their sites up while they automated away the jagged edges. Firefighting was just part of the job.
Site reliability brought us a new model. With reliability as a first-class feature, the teams running production expected the same status—and the same salary—as the teams creating the features that ran there. One manifestation of that was the rule that SREs spend no more than 50% of their time on ops work. When I began my first SRE role in 2006, that meant every SRE should spend 50% of their time coding.
However, when you’re running services in production, there’s always ops work to be done. Something is close to its scaling limits. Something is having mysterious, ephemeral outages. Something is a monster to deploy. SREs who weren’t drawn to coding, or who were motivated by solving problems (a common ops personality type), struggled to ignore the interrupts for long enough to ship meaningful coding projects.
Over time, “at least 50% code” became “at most, 50% ops.” And, honestly, that’s fine. As an industry, we’ve often over-emphasized (and over-interviewed for) code. It’s mature to evolve “50% code” into “50% deliberate project ...