Chapter 33. Lessons Learned from Other Industries

A deep dive into SRE culture and practices at Google naturally leads to the question of how other industries manage their businesses for reliability. Compiling this book on Google SRE gave us the opportunity to ask a number of Google’s engineers about their previous work in other high-reliability fields, in order to address the following comparative questions:

  • Are the principles used in Site Reliability Engineering also important outside of Google, or do other industries tackle the requirements of high reliability in markedly different ways?

  • If other industries also adhere to SRE principles, how are the principles manifested?

  • What are the similarities and differences in the implementation of these principles across industries?

  • What factors drive similarities and differences in implementation?

  • What can Google and the tech industry learn from these comparisons?

A number of principles fundamental to Site Reliability Engineering at Google are discussed throughout this text. To simplify our comparison of best practices in other industries, we distilled these concepts into four key themes:

  • Preparedness and Disaster Testing

  • Postmortem Culture

  • Automation and Reduced Operational Overhead

  • Structured and Rational Decision Making

This chapter introduces the industries that we profiled and the industry veterans we interviewed. We define key ...

