Chapter 3. Using t-Digest for Threshold Automation

The most common form of anomaly detector in use today is a threshold alarm, set by hand, that sends an alert when a possible anomaly occurs. The input to such an alarm is a numerical measurement of some kind. The basic idea is that an alarm sounds whenever this measurement exceeds a threshold that you have set, possibly only after it has stayed above the threshold for a certain amount of time.
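As a rough illustration of the alarm just described, here is a minimal sketch in Python. The function name threshold_alarm, the min_consecutive parameter, and the sample readings are all hypothetical, chosen only for this example. The sketch also assumes evenly spaced samples, so that "a certain amount of time" can be expressed as a count of consecutive readings.

```python
from typing import Iterable, List


def threshold_alarm(values: Iterable[float],
                    threshold: float,
                    min_consecutive: int = 1) -> List[int]:
    """Return the indices at which an alarm fires.

    An alarm fires once per excursion, at the moment the measurement
    has been above `threshold` for `min_consecutive` samples in a row.
    """
    alarms = []
    run = 0  # length of the current run of samples above the threshold
    for i, v in enumerate(values):
        if v > threshold:
            run += 1
            if run == min_consecutive:
                alarms.append(i)
        else:
            run = 0  # the excursion ended, so start counting again
    return alarms


# Hypothetical readings: one brief spike, then a sustained excursion.
readings = [10, 12, 55, 11, 60, 62, 65, 12]

print(threshold_alarm(readings, threshold=50))                     # [2, 4]
print(threshold_alarm(readings, threshold=50, min_consecutive=3))  # [6]
```

Requiring three consecutive readings suppresses the alert for the brief spike, at the cost of reporting the sustained excursion two samples later.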

This simple approach can work fairly well if the system being observed has a simple pattern of well-understood measurements and the number of different kinds of measurements is not enormous. But it becomes quite difficult to carry out effectively if you have a large number of measurements whose behavior you do not understand very well. As it turns out, that situation (a large number of measurements in a system that is either unpredictable or otherwise not well defined) is common in real-world settings of interest. That’s one reason we need some new ways to approach anomaly detection.

A good first step in improving these systems is to change the way that the threshold is set. Let’s think about the goal for a threshold and how it can be optimized. Any particular value for a threshold will detect some fraction of the anomalies that you are trying to find, and if you have chosen the threshold well, that fraction will be large. At the same time, the threshold will most likely trigger some false alarms, in cases in which normal noise ...
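The trade-off a threshold makes between detecting anomalies and raising false alarms can be made concrete with a small sketch. It assumes you already have a set of measurements labeled as normal or anomalous, which is rarely true in practice. The function alarm_rates and the numbers below are hypothetical and serve only to illustrate the idea.

```python
def alarm_rates(normal, anomalous, threshold):
    """Return (detection_rate, false_alarm_rate) for one threshold.

    detection_rate:   fraction of anomalous values above the threshold
    false_alarm_rate: fraction of normal values above the threshold
    """
    detected = sum(1 for v in anomalous if v > threshold)
    false_alarms = sum(1 for v in normal if v > threshold)
    return detected / len(anomalous), false_alarms / len(normal)


# Hypothetical labeled measurements, for illustration only.
normal = [9, 10, 11, 12, 13, 14, 15, 16, 18, 21]
anomalous = [17, 22, 25, 30]

for t in (12, 16, 20, 24):
    detection, false_alarm = alarm_rates(normal, anomalous, t)
    print(f"threshold={t}: detects {detection:.0%}, "
          f"false alarms {false_alarm:.0%}")
```

On this made-up data, raising the threshold from 12 to 24 reduces false alarms from 60% of normal readings to none, but the fraction of anomalies detected falls from 100% to 50%.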
