Measurement, risk mitigation, crisis response, and long-term follow-up are well-accepted parts of SRE and, more broadly, of the discipline of software operations. We do design reviews to head off issues early, measure key service indicators, use incident management structures to manage complexity during outages, write postmortems, and guide our future work based on what we learn. Our focus on measurement and data enables us to better advocate for users.
The collaborative, multidisciplinary approach of SRE involves coordinating many different stakeholders, and requires individuals to work together to avoid becoming overwhelmed by complexity or burned out. Managing human factors is often the most important skill that SREs learn.
But our job as engineers does not end with adherence to Service-Level Objectives (SLOs). A service that does a reliable job of harming people, exacerbating injustices, or excluding marginalized groups is not a service worth building and maintaining. Technology is poised to change the world, for good or for ill, and engineers of all kinds share a responsibility to ensure that their work is “for the public good” and “does not diminish quality of life, diminish privacy, or harm the environment.”1
Therefore, we must turn our attention to how we ensure that our work is just and serves the public interest. Fortunately, successful activism for social change ...