Chapter 6. Improving Visibility

Even with transparent communication, you’re going to encounter roadblocks and issues during both the development and operational phases of every initiative. Issues can stem from inadequate processes, bugs, edge cases, or unforeseen user behavior. As the group responsible for managing applications and systems in production, your group should ask, “How can we minimize the impact of a mistake or an unforeseen event?”

In a previous chapter, I discussed the value of reducing Mean Time To Recovery (MTTR). The first step in that process is identifying that there is an issue that needs to be resolved. The second is quickly understanding the relevant information about the event. That’s where observability comes in. It gives an organization the ability to inspect the state of its systems at every step and detect changes, allowing you to assess and provide feedback on suboptimal processes and changes in behavior. Monitoring should be treated as a continuous feedback loop for tracking how each metric is performing at any given time.
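To make those two steps concrete, here is a minimal sketch of how detection and recovery times might be computed from incident records. The `Incident` fields and the sample timestamps are hypothetical and invented for illustration. The sketch also assumes recovery time is measured from when an issue starts to when it is resolved, which is one common convention but not the only one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record; the field names are assumptions for illustration.
    started: datetime    # when the issue began affecting the system
    detected: datetime   # when monitoring (or a person) noticed it
    resolved: datetime   # when normal service was restored


def mean_duration(durations: list[timedelta]) -> timedelta:
    """Average a non-empty list of timedeltas."""
    return sum(durations, timedelta()) / len(durations)


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """Average gap between an issue starting and someone noticing it."""
    return mean_duration([i.detected - i.started for i in incidents])


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    """Average gap between an issue starting and it being resolved."""
    return mean_duration([i.resolved - i.started for i in incidents])


# Made-up sample data: two incidents on consecutive days.
incidents = [
    Incident(
        started=datetime(2024, 1, 1, 10, 0),
        detected=datetime(2024, 1, 1, 10, 20),
        resolved=datetime(2024, 1, 1, 11, 0),
    ),
    Incident(
        started=datetime(2024, 1, 2, 14, 0),
        detected=datetime(2024, 1, 2, 14, 10),
        resolved=datetime(2024, 1, 2, 14, 30),
    ),
]

mttd = mean_time_to_detect(incidents)
mttr = mean_time_to_recovery(incidents)
print(f"Mean time to detect:   {mttd}")
print(f"Mean time to recovery: {mttr}")
```

With this sample data, detection accounts for a third of the total recovery time, which is the portion that better visibility aims to shrink.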

However, because a system’s purpose is to support business objectives, focusing solely on the system’s health is not enough to guarantee the business’s performance. While major issues such as outages that substantially disrupt user experience have a clear impact on business performance, more subtle issues can have an even larger impact on business operations despite the appearance of technical ...
