6

Recovering from Production Failures

We live in an imperfect world. First, we see bugs escape into our production environment. Then, as we start moving to DevOps practices, we may find gaps in our understanding that affect how we deliver to that environment. Even once those are fixed, we may encounter other problems that are outside our control. What can we possibly do?

In this chapter, we will examine how to mitigate and deal with failures that happen in production environments. We will look at the following topics:

  • The costs of errors in production environments
  • Preventing as many errors as we can
  • Practicing for failures using chaos engineering
  • Resolving incidents in production with an incident management process
  • Looking at ...
