Chapter 1. Looking Toward the Future
Everyone loves a mystery, and at the heart of it, that’s what anomaly detection is—spotting the unusual, catching the fraud, discovering the strange activity. Anomaly detection has a wide range of useful applications, from banking security to natural sciences to medicine to marketing. Anomaly detection carried out by a machine-learning program is actually a form of artificial intelligence. With the ever-increasing volume of data and the new types of data, such as sensor data from an increasingly large variety of objects that needs to be considered, it’s no surprise that there also is a growing interest in being able to handle more decisions automatically via machine-learning applications. But in the case of anomaly detection, at least some of the appeal is the excitement of the chase itself.
When are anomaly-detection methods a good choice? Unlike fictional detective stories, in anomaly detection, you may not have a clear suspect to search for, and you may not even know what the “crime” is. In fact, one way to think about when to turn to anomaly detection is this: Anomaly detection is about finding what you don’t know to look for.
You are searching for anomalies, but you don’t know what their characteristics will be. If you did, you could use a different form of machine learning, called classification, or you would just write specific rules to find the anomalies. But that’s not generally where you start.
Classification is a form of supervised learning where you have examples of each kind of thing you are looking for. You apply a learning algorithm to these examples to build a model that can use features of new data to classify them into categories that represent each kind of data of interest. When you have examples of normal and some number of abnormal situations, classifiers can help you mark new situations as normal or abnormal. Even when you know about some kinds of anomalies, it is always good to keep an eye out for new kinds that you don’t know about. That is where anomaly detection is applied.
So you use the unsupervised-learning approach of anomaly detection when you don’t know exactly what you are looking for. Anomaly detection is a discovery process to help you figure out what is going on and what you need to look for. The anomaly-detection program must discover interesting patterns or connections in the data itself, and the detector does this by first identifying the most important aspect of anomaly detection: finding what is normal. Once your model does that, your machine-learning program can then spot outliers, in other words, data that falls outside of what is normal.
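To make the two steps concrete, here is a minimal sketch in Python of modeling what is normal and then flagging what falls outside it. It is only an illustration under stated assumptions: the readings are made up, the function names fit_normal and find_outliers are hypothetical, and the choice of a median plus median absolute deviation (MAD) with a cutoff of 5 is ours for this example rather than a method described in this report.

```python
import statistics

def fit_normal(values):
    """Summarize 'normal' as a center (median) and a spread (MAD)."""
    center = statistics.median(values)
    # Median absolute deviation: a spread estimate that a few extreme values barely move.
    mad = statistics.median([abs(v - center) for v in values])
    return center, mad

def find_outliers(values, center, mad, threshold=5.0):
    """Return the values that lie more than threshold * MAD from the center."""
    if mad == 0:
        # Degenerate case: all history was identical, so any difference is unusual.
        return [v for v in values if v != center]
    return [v for v in values if abs(v - center) / mad > threshold]

# Assumed example data: past sensor readings taken to represent normal behavior.
history = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7]
center, mad = fit_normal(history)

# New observations are judged only against the learned picture of normal.
new_readings = [20.0, 19.9, 35.4, 20.2]
outliers = find_outliers(new_readings, center, mad)
print(outliers)
```

Notice that nothing in the sketch describes what an anomaly looks like. The only thing learned from the data is the shape of normal, and an outlier is whatever lands far enough away from it.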
Anomalies are defined not by their own characteristics, but in contrast to what is normal. You may not know what the anomalies will look like, but you can build a system to detect them in contrast to what you’ve discovered and defined as being a normal pattern. Note that normal in this context includes all of the anomalies that you already know about and have accounted for using a classifier. The outliers are only those events that don’t match what you already know. Consider this way to think about the problem: anomaly in this context just means different from what is expected; it says nothing about whether the event is desirable or undesirable. You may know of certain types of events that are somewhat unusual and require attention, perhaps certain failures in a system. If these occur sufficiently often to be well characterized, you can use a classifier to catalog them as problems of a particular type. That’s a somewhat different goal than true anomaly detection, where you are looking for events that are rare relative to what is expected and that often are surprising, or at least undefined ahead of time.
Together, anomaly detection and classification make for a useful pair when it comes to finding a solution to real-world problems. Anomaly detection is used first—in a discovery phase—to help you figure out what is going on and what you need to look for. You could use the anomaly-detection model to spot outliers, then set up an efficient classification model to assign new examples to the categories you’ve already identified. You then update the anomaly detector to consider these new examples as normal and repeat the process. This idea is shown in Figure 1-2 as one way to use anomaly detection.
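The discover, classify, and update cycle described here can be sketched in a few lines of Python. This is a toy illustration, not an implementation from this report: fixed value ranges stand in for a trained classifier, the names make_range_rule and triage are hypothetical, and the events and the "overheating" category are invented for the example.

```python
def make_range_rule(low, high):
    """Hypothetical stand-in for one learned category: a simple value range."""
    return lambda value: low <= value <= high

def triage(events, known_categories):
    """Label events that match a known category; everything else is an outlier."""
    labeled, outliers = [], []
    for event in events:
        label = next(
            (name for name, rule in known_categories.items() if rule(event)),
            None,
        )
        if label is None:
            outliers.append(event)
        else:
            labeled.append((event, label))
    return labeled, outliers

# Start with only a notion of normal.
known = {"normal": make_range_rule(19.0, 21.0)}
events = [20.1, 35.4, 19.8, 35.9, 20.3]

# Discovery phase: the unexplained events surface as outliers.
_, first_pass = triage(events, known)

# After investigating, catalog the new kind of event as a known category.
known["overheating"] = make_range_rule(35.0, 37.0)

# On the next pass those events are classified, so they are no longer outliers.
labeled, second_pass = triage(events, known)
print(first_pass, second_pass)
```

Once a kind of event has been cataloged, it counts as part of what is known, and the outlier list is again reserved for events nobody has explained yet.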
Anomaly detection, like classification, is not new, but recently there has been an increased interest in using it. Fortunately, there also are new approaches to carrying it out effectively in practical settings; much more accurate and sophisticated methods are now available. Some of the biggest changes have to do with being able to handle anomaly detection at huge scale, in real time. We will describe some approaches that can help, especially when using a real-time distributed file system. We will focus particularly on approaches whose implementations are proven, practical, and simple.
The move from specialized academic research to methods that are useful for practical machine learning is happening in response to more than just an increase in the volume of available data—there is also a great increase in new types of data. For example, many new forms of sensors are being deployed. Smart meters monitor energy usage in businesses and residential settings, reporting back every few minutes. This information can be used individually or looked at as a group from a particular geographical location.
Industrial equipment such as drilling rigs and manufacturing tools use sensors to report on a wide range of parameters. The advances in medical device sensors are astounding. Radio-frequency identification (RFID) tags are also commonplace on merchandise in retail stores, in warehouses, or even on your cat. Data provided by these sensors and other sources range from simple identification signals to complex measurements of temperature, pressure, vibrations, and more.
How can reporting from all these interconnected objects be used? Collectively, these objects begin to make up the Internet of Things (IoT). Relationships between objects and people, between objects and other objects, conditions in the present, and histories of their condition over time can be monitored and stored for future analysis, but doing so is quite a challenge. However, the rewards are also potentially enormous. That’s where machine learning and anomaly detection can provide a huge benefit.
Analysts predict that the number of interconnected devices in the Internet of Things will reach the tens of billions less than a decade from this writing. Machine-learning techniques will be critical to our understanding of what the signals from devices are telling us.
As we collect and analyze more data from sensors, we achieve a more granular view of how our systems are functioning, which in turn gives us the opportunity for a greater awareness of when things change for better or for worse. Not only is there a growing need for more accurate anomaly detection, there is also a growing desire for new and more efficient ways to “cut to the chase” in order to be able to put anomaly detection to work in practical, real-world settings. Practical anomaly detection is more than just selecting the right algorithm and having the technical expertise to build the system—it also means finding solutions that take into account realistic limitations on resources, scheduling demands including time-to-value to make the projects cost effective, and correct understanding of business goals.
In this report, we show you the underlying ideas of why anomaly detection works and what it’s good for. We explore how to find what is normal, how to measure how far an observation lies from normal, and how far that must be for the observation to be considered an outlier (Chapters 2 and 3). We provide a new method to do this (t-digest) and look at how it can be applied in very simple systems (Chapter 3) and also in more complex systems (Chapters 4 and 5).
Throughout this report, we strongly recommend the use of adaptive, probabilistic models for predicting what is normal and how to contrast that to what is observed. One of our topics in Chapter 4 dabbles in deep learning with a time-series example, or at least dips its toe into the shallow end of that pool. Although this is an advanced concept, the execution of it in our example is surprisingly simple—no advanced math required.
Chapter 5 provides some very practical ways to model a system with sporadic events, such as website traffic or e-commerce purchases. In Chapter 6, we provide a practical illustration of many of the basic concepts in the form of detecting a phishing attack on a secure website. Let’s see how all this works.