Chapter 5. Anomalies in Sporadic Events

The input signals in the examples discussed in previous chapters have all been values sampled at uniform intervals. Such signals make it easy to talk about a reconstructed value computed by a model and about the difference between that value and the original input, the reconstruction error.

In practice, however, there are other forms of data that are important to process for anomaly detection. One important class of such data is known as an event stream and is usually derived from log files of one sort or another. A key characteristic of these log files is that they record events that occur at irregular intervals.

It is also fairly common for these events to be associated with symbolic values, such as the visitor's IP address and the URL of the page visited if page views are the input of interest. Another input might be stock trades, for which the symbolic values could include the stock symbol, combined with each trade's price and number of shares. Other examples of this type of input are e-commerce purchases or Internet packets. In each of these cases, we want to be able to detect anomalous activity in these event streams, such as changes in the rate or geolocation of web traffic, or perhaps in the number of stock trades in particular time periods. Sometimes the anomaly of interest is the absence of activity during a particular time interval, and that can be a challenge for anomaly detection models to handle.
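To make the idea of a rate change or a silent interval concrete, here is a minimal sketch that turns irregular event timestamps into counts per fixed-width window. It is not the method developed in this chapter, only an illustration. The function name `counts_per_window`, the window width, and the sample timestamps are all assumptions made for the example.

```python
import math

def counts_per_window(timestamps, start, end, width):
    """Count events in each fixed-width window covering [start, end).

    Windows with no events are kept as explicit zeros, so an absence
    of activity shows up in the output instead of disappearing.
    """
    n_windows = math.ceil((end - start) / width)
    counts = [0] * n_windows
    for t in timestamps:
        if start <= t < end:
            counts[int((t - start) // width)] += 1
    return counts

# Hypothetical event times, in seconds since the start of observation.
events = [3, 7, 8, 21, 22, 23, 24, 58]

counts = counts_per_window(events, start=0, end=60, width=10)

# Indices of windows in which nothing happened at all.
silent = [i for i, c in enumerate(counts) if c == 0]

print(counts)
print(silent)
```

Keeping the empty windows is the design choice that matters here: a log file contains no record for a period in which nothing happened, so the silence has to be reconstructed from the gaps between the events that were recorded.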

Because these events occur ...
