Chapter 44. Integrating Empathy into SRE Tools
Daniella Niyonkuru
Site reliability engineering includes best practices such as building self-healing services, implementing automated systems, and monitoring the quality and quantity of on-call shifts. Yet we have few tools for site reliability engineers that promote self-healing from operational exhaustion, relieve incident-related stress, and track on-call rotations.
Compassionate empathy can help us reach this objective by acting on the elements that make burnout more likely to occur. Building compassionate empathy into software requires understanding (and sometimes collecting) the elements that are often at the center of SRE distress, and then encoding the related alleviation measures.
These steps support integrating an empathetic approach:
Understand the source.
Find the right metrics (SLIs).
Fix an acceptable range (SLOs).
Draw the consequences (SLAs).
Implement tooling to track SLIs, check SLOs, and enforce SLAs.
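The last step can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the chapter's own tooling: the metrics (on-call days and pages per engineer), the thresholds, the sample data, and every name in the code (Shift, on_call_slis, slo_breaches, enforce_sla) are hypothetical, and the consequence chosen for a breach (skipping the next rotation) is just one possible agreement.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Shift:
    engineer: str
    start: date
    days: int   # length of the on-call shift in days
    pages: int  # alerts that paged the engineer during the shift


# Hypothetical SLOs: acceptable totals per engineer over the review window.
MAX_ONCALL_DAYS = 7
MAX_PAGES = 10


def on_call_slis(shifts):
    """SLIs: total on-call days and total pages per engineer."""
    totals = {}
    for shift in shifts:
        days, pages = totals.get(shift.engineer, (0, 0))
        totals[shift.engineer] = (days + shift.days, pages + shift.pages)
    return totals


def slo_breaches(slis):
    """Check SLOs: map each engineer outside the range to the reasons."""
    breaches = {}
    for engineer, (days, pages) in slis.items():
        reasons = []
        if days > MAX_ONCALL_DAYS:
            reasons.append(f"on call {days} days (limit {MAX_ONCALL_DAYS})")
        if pages > MAX_PAGES:
            reasons.append(f"paged {pages} times (limit {MAX_PAGES})")
        if reasons:
            breaches[engineer] = reasons
    return breaches


def enforce_sla(breaches):
    """Assumed SLA consequence: these engineers skip the next rotation."""
    return sorted(breaches)


# Made-up sample data for illustration only.
shifts = [
    Shift("shuri", date(2024, 1, 1), days=5, pages=8),
    Shift("okoye", date(2024, 1, 8), days=5, pages=3),
    Shift("shuri", date(2024, 1, 15), days=5, pages=6),
]

slis = on_call_slis(shifts)
breaches = slo_breaches(slis)
skip_next_rotation = enforce_sla(breaches)
```

Keeping the three functions separate mirrors the steps: the indicators can be collected and reviewed on their own, the acceptable range can be renegotiated by changing two constants, and the consequence can be swapped without touching the measurement.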
Let’s illustrate these steps with an example. Shuri is an SRE at SuperSonicSystems, and a year ago her team was revamped along with their on-call rotation. This resulted in her taking stress leave. Let’s apply our approach to ensure that this does not happen to the rest of the SRE team.
To understand the source, an investigative survey was sent to Shuri. The results show that she was on call more often and encountered ...