Chapter 30. Against On-Call: A Polemic

On-call, as we know it, must end. It is damaging to the people who do it,1 and inefficient as a means of keeping systems running. It is especially galling to see it continuing today given the real potential we have at this historical moment to eliminate it. The time for a reevaluation of what we do when we are on call and, more important, why we do it, is long overdue.

How long overdue? I can find evidence of on-call-style activities more than 75 years ago,2 and in truth there have been people tending computers in emergencies for as long as there have been both computers and emergencies. Yet, though there have been huge improvements in computing systems generally since then,3 the practice of out-of-hours, often interrupt-driven support—more generally called on-call—has continued essentially unaltered from the beginning of computing right through to today. Ultimately, however, whether the continuity is literally from the dawn of computing or whether it is merely from the last few decades, we still have fundamental questions to ask about on-call, the most important of which is why? Why are we still doing this? Furthermore, is it good that we are? Finally, is there a genuine alternative to doing this work in this way? Our profession derives a great deal of its sense of mission, urgency, and, frankly, individual self-worth, from incident response and resolving production problems. It is rare to hear us ask if we ...
