6

Operational Framework – Managing Infrastructure and Systems

There is some confusion about the operational nature of site reliability engineering. For instance, we hear that site reliability engineers (SREs) work exclusively on automating toil, or that they only manage the available observability platforms. Such statements cannot be true, as they defeat the very reason we need SREs. SREs need to do operational work to handle system weaknesses, single points of failure, technical debt, performance issues, and risks. Furthermore, once they get to know these issues, they also fix them through operational work. Gene Brown, a distinguished engineer and global site reliability engineering leader at Kyndryl, once said that “SREs need to ...

