Chapter 13. Production Engineering at Facebook

David: What’s production engineering?

Pedro: Philosophically, production engineering stems from the belief that operational problems should be solved through software solutions and that the engineers who are actually building the software are the best people to operate that software in production.

In the early days of software, the developer who wrote the code also debugged and fixed it. Sometimes they even had to dive into hardware issues. Over the years, with the advent of remote software systems, the internet, and large data centers, this practice changed dramatically. Today it's still common to see software engineers write and develop applications, hand their code off to a QA team for testing, and then have QA hand it off to yet another team for deployment and debugging. In some environments, a release engineering team is responsible for deploying code, and an operations team keeps the system stable and responds to alerts. This works fairly well when QA and operations have the knowledge required to fix problems and when the feedback loops between the teams are healthy. When this isn't the case, production issues have to be routed back to the software engineers for debugging and fixing, and this round trip can significantly delay fixes. At Facebook, our production engineering (PE) team is simply bringing back the concept of integrating software engineering ...
