Recall that one of our goals for this book is to help you actually get anomaly detection running in production and solving monitoring problems you have with your current systems.
Typical goals for adding anomaly detection probably include:

- To avoid setting or changing thresholds per server, because machines differ from each other
- To avoid modifying thresholds when servers, features, and workloads change over time
- To avoid static thresholds that throw false alerts at some times of the day or week, and miss problems at other times
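To make the contrast with static thresholds concrete, here is a minimal sketch of one common alternative: alerting when a metric strays more than a few standard deviations from its own recent rolling mean. The function name `adaptive_alerts` and the `window` and `k` parameters are illustrative choices, not anything prescribed by a particular monitoring tool.

```python
# Sketch: an adaptive threshold based on a rolling mean and standard
# deviation, instead of one fixed static threshold for every server.
from collections import deque
import math

def adaptive_alerts(values, window=10, k=3.0):
    """Return indices of points more than k rolling standard deviations
    from the rolling mean of the previous `window` points."""
    recent = deque(maxlen=window)
    alerts = []
    for i, v in enumerate(values):
        if len(recent) == window:
            mean = sum(recent) / window
            var = sum((x - mean) ** 2 for x in recent) / window
            std = math.sqrt(var)
            # A flat history (std == 0) means any deviation is anomalous.
            if abs(v - mean) > k * std:
                alerts.append(i)
        recent.append(v)  # the current point becomes part of the history
    return alerts
```

Because the threshold is derived from each metric's own recent behavior, the same code can watch machines with very different baselines, and it adapts as workloads drift, which is exactly what the goals above ask for.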
In general, you can probably sum up these goals as “just make Nagios a little better for some checks.”
Another goal might be to find all abnormal metrics without generating alerts, for use in diagnosing problems. We consider this a pretty hard problem because it is so general; at this point in the book, you probably understand why. We won’t focus on this goal in this chapter, although you can easily apply the discussion here to that approach on a case-by-case basis.
The best place to begin is often where you experience the most painful monitoring problem right now. Take a look at your alert history or outages. What is the source of the most noise, or where do problems occur most often without an alert to notify you?
Not all of the alerting problems you’ll find are solvable with anomaly detection. Some come from alerting ...