9

Managing Incidents Using Alerts

This chapter will explore the concepts of incident management. We will discuss how to build a world-class incident management process that treats those responding to incidents humanely and avoids burnout. The chapter will establish who is responsible for that process, from senior leadership teams to the engineers responding to the callout. It will introduce the important concepts of building an organization that can handle incidents and excel at providing customers with a stable experience. With the process established, we’ll explain how to examine a service and pick critical measures that show its current service level without being drowned out by noise.

This chapter will also explore the three ...
