Chapter 28. SRE Cognitive Work

…irony that the more advanced a control system is, so the more crucial may be the contribution of the human operator.

Bainbridge, 1983.


The modern “system” is a constantly changing melange of hardware and software embedded in a variable world. Together, the hyperdistribution, fluctuant composition, constantly varying workload, and continuous modification of modern technology assemblies pose a unique challenge to those who design, maintain, diagnose, and repair them. We are involved in exploring this challenge and trying to understand how people are able to keep our systems working and, in particular, how they make sense of what is happening around them. What we find is both inspiring and worrisome. Inspiring because the studies reveal highly refined expertise in people and groups, along with novel mechanisms for bringing that expertise to bear. Worrisome because the technology and organization are so often poorly configured to make this expertise effective.

Together with our colleagues, we have studied people doing SRE work, the problems they face, the approaches they take, and the issues that arise along the way. From a distance, this work is often imagined as narrowly technical, even mundane. Examining the work as it is actually done, in contrast, reveals that SRE work is often stormy and sometimes dangerous.

This chapter gives a brief overview of what we think we now know about modern ...