Chapter 9. Establishing a Relationship to Toil

Given how often the subject of toil comes up in the SRE context, it is remarkable how murky the topic remains, how imprecise we are in our writing and conversations on the topic, and how disconnected it is from our development and operations practices. Some of the best writing on the subject can be found in Vivek Rau’s Chapter 5, “Eliminating Toil,” in Site Reliability Engineering, and in Chapter 6 of The Site Reliability Workbook (O’Reilly, 2018) by a larger cast of authors. If you have not read both of those chapters yet (both are free online for all to access), I strongly encourage you to do so before proceeding with this one.

In this chapter, we are going to avoid rehashing those chapters, as most articles on toil do, and instead focus on ways SREs can establish a nuanced and healthy relationship to toil once they have read those fundamentals. To do that, I’ll quickly quote the definition and then use it as a springboard to start our exploration.

The first step in this process is increasing the precision of how we describe toil when discussing it. There’s a good reason why some cultures have a substantial vocabulary around the weather patterns we file under the heading of snow. Different manifestations of snow can necessitate different responses or at least produce different experiences for people. Being able to speak about toil in a nuanced way gives us more options for how to respond to it.

Defining Toil with More ...

