January 2020
Intermediate to advanced
640 pages
16h 56m
English
Dealing with a large volume of alerts that fire concurrently is a daunting task from an SRE's point of view. To cut through the noise, Alertmanager allows operators to specify a set of rules for grouping alerts together based on the labels that Prometheus has assigned to each alert event.
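The idea of label-based grouping can be illustrated with a minimal Go sketch. This is not Alertmanager's actual implementation: the `Alert` type, the `GroupAlerts` helper, and the label names and values below are hypothetical and exist only for illustration. In a real deployment, the labels to group by are listed under the `group_by` setting of a route in the Alertmanager configuration file.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Alert is a simplified, hypothetical stand-in for an alert event. Real
// alerts carry more information (annotations, timestamps, and so on).
type Alert struct {
	Labels map[string]string
}

// groupKey builds a key from the values of the groupBy labels. Alerts that
// yield the same key end up in the same group. A label that is missing from
// an alert contributes an empty value.
func groupKey(a Alert, groupBy []string) string {
	parts := make([]string, 0, len(groupBy))
	for _, name := range groupBy {
		parts = append(parts, name+"="+a.Labels[name])
	}
	return strings.Join(parts, ",")
}

// GroupAlerts partitions alerts according to the values of the groupBy labels.
func GroupAlerts(alerts []Alert, groupBy []string) map[string][]Alert {
	groups := make(map[string][]Alert)
	for _, a := range alerts {
		key := groupKey(a, groupBy)
		groups[key] = append(groups[key], a)
	}
	return groups
}

func main() {
	// Hypothetical alerts; the alert and service names are made up.
	alerts := []Alert{
		{Labels: map[string]string{"alertname": "QueueUnavailable", "service": "orders"}},
		{Labels: map[string]string{"alertname": "QueueUnavailable", "service": "billing"}},
		{Labels: map[string]string{"alertname": "DiskAlmostFull", "service": "billing"}},
	}

	// Group by alert name only, so the "service" label is ignored.
	groups := GroupAlerts(alerts, []string{"alertname"})

	// Sort the keys so that the output order is deterministic.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d alert(s)\n", k, len(groups[k]))
	}
}
```

Grouping by `alertname` alone places both `QueueUnavailable` alerts in one group regardless of which service raised them, so a single notification could cover the whole group. Adding `service` to the list would split them into separate groups again.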
To understand how alert grouping works, let's picture a scenario where 100 microservices are all trying to connect to a Kafka queue that is currently unavailable. Each of the services fires a high-priority alert, which, in turn, causes a new page notification to be sent to the SRE who is currently on call. As a result, the SRE will get swamped with hundreds of page notifications about exactly ...