12.5 Clustering-Based Approaches
The notion of outliers is closely related to that of clusters. Clustering-based approaches detect outliers by examining the relationship between objects and clusters. Intuitively, an outlier is an object that belongs to a small and remote cluster, or does not belong to any cluster.
This leads to three general approaches to clustering-based outlier detection. Consider an object.
■ Does the object belong to any cluster? If not, then it is identified as an outlier.
■ Is there a large distance between the object and the cluster to which it is closest? If yes, it is an outlier.
■ Is the object part of a small or sparse cluster? If yes, then all the objects in that cluster are outliers.
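The three questions above can be sketched as a single pass over data that has already been clustered. This is a minimal illustration, not the book's algorithm: the function name `flag_outliers`, the `NOISE` label of -1 for unclustered objects, the use of cluster centroids to measure distance to the closest cluster, and the thresholds `max_dist` and `min_size` are all assumptions made for the example.

```python
import math
from collections import Counter

NOISE = -1  # assumed label meaning "belongs to no cluster"

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def flag_outliers(points, labels, max_dist, min_size):
    """Return, for each point, a reason string if it is flagged, else None.

    max_dist and min_size are hypothetical thresholds chosen by the user.
    """
    sizes = Counter(l for l in labels if l != NOISE)
    members = {}
    for p, l in zip(points, labels):
        if l != NOISE:
            members.setdefault(l, []).append(p)
    centers = {l: centroid(ps) for l, ps in members.items()}

    reasons = []
    for p, l in zip(points, labels):
        if l == NOISE:
            # Question 1: the object belongs to no cluster.
            reasons.append("no cluster")
        elif sizes[l] < min_size:
            # Question 3: every member of a small cluster is flagged.
            reasons.append("small cluster")
        elif min(math.dist(p, c) for c in centers.values()) > max_dist:
            # Question 2: far even from the closest cluster centroid.
            reasons.append("far from closest cluster")
        else:
            reasons.append(None)
    return reasons

# Toy data: one large cluster with a straggler, one tiny cluster, one unclustered point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (6, 6), (10, 10), (10, 11), (20, 0)]
labels = [0, 0, 0, 0, 0, 1, 1, NOISE]
for p, reason in zip(points, flag_outliers(points, labels, max_dist=3.0, min_size=3)):
    print(p, reason)
```

In this sketch the small-cluster check runs before the distance check, so members of a tiny cluster are reported as such regardless of how close they are to a centroid. Only cluster size is tested here; detecting a sparse cluster would need an additional density measure.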
Let’s look at examples of each ...