A deep dive into SRE culture and practices at Google naturally leads to the question of how other industries manage their businesses for reliability. Compiling this book on Google SRE created an opportunity to speak to a number of Google’s engineers about their previous work experiences in a variety of other high-reliability fields in order to address the following comparative questions:
Are the principles used in Site Reliability Engineering also important outside of Google, or do other industries tackle the requirements of high reliability in markedly different ways?
If other industries also adhere to SRE principles, how are the principles manifested?
What are the similarities and differences in the implementation of these principles across industries?
What factors drive similarities and differences in implementation?
What can Google and the tech industry learn from these comparisons?
A number of principles fundamental to Site Reliability Engineering at Google are discussed throughout this text. To simplify our comparison of best practices in other industries, we distilled these concepts into four key themes:
Preparedness and Disaster Testing
Postmortem Culture
Automation and Reduced Operational Overhead
Structured and Rational Decision Making
This chapter introduces the industries that we profiled and the industry veterans we interviewed. We define key ...