12.5 Clustering-Based Approaches

The notion of outliers is closely related to that of clusters. Clustering-based approaches detect outliers by examining the relationship between objects and clusters. Intuitively, an outlier is an object that belongs to a small and remote cluster, or does not belong to any cluster.

This leads to three general approaches to clustering-based outlier detection. Consider an object.

■ Does the object belong to any cluster? If not, then it is identified as an outlier.

■ Is there a large distance between the object and the cluster to which it is closest? If yes, it is an outlier.

■ Is the object part of a small or sparse cluster? If yes, then all the objects in that cluster are outliers.
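The three questions above can be sketched in code. The following is a minimal illustration, not a method taken from the text. It assumes a clustering has already been computed and is given as one label per object, with `None` marking objects that were assigned to no cluster. The function name `clustering_based_outliers`, the centroid-based distance, and the thresholds `max_dist` and `min_cluster_size` are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def clustering_based_outliers(points, labels, max_dist, min_cluster_size):
    """Map the index of each flagged object to the reason it was flagged.

    labels[i] is the cluster label of points[i], or None if the object
    belongs to no cluster (an assumed convention for this sketch).
    max_dist and min_cluster_size are illustrative thresholds.
    """
    # Group object indices by cluster label.
    members = defaultdict(list)
    for i, label in enumerate(labels):
        if label is not None:
            members[label].append(i)

    # Represent each cluster by its centroid (one possible choice).
    centroids = {
        label: centroid([points[i] for i in idx])
        for label, idx in members.items()
    }

    reasons = {}
    for i, (p, label) in enumerate(zip(points, labels)):
        if label is None:
            # Question 1: the object belongs to no cluster.
            reasons[i] = "no cluster"
        elif len(members[label]) < min_cluster_size:
            # Question 3: the object is part of a small cluster.
            reasons[i] = "small cluster"
        elif min(math.dist(p, c) for c in centroids.values()) > max_dist:
            # Question 2: the object is far from its closest cluster.
            reasons[i] = "far from closest cluster"
    return reasons


points = [(0, 0), (0, 1), (1, 0), (1, 1), (4, 4),  # cluster 0
          (10, 10), (10, 11),                      # cluster 1 (small)
          (20, 0)]                                 # unclustered
labels = [0, 0, 0, 0, 0, 1, 1, None]

reasons = clustering_based_outliers(points, labels,
                                    max_dist=3.0, min_cluster_size=3)
for index, reason in sorted(reasons.items()):
    print(index, points[index], reason)
```

With these toy values, the object at (4, 4) is flagged because it lies more than 3.0 units from the nearest centroid, both members of the two-object cluster are flagged together, and the unclustered object at (20, 0) is flagged for belonging to no cluster. In practice, the thresholds and the distance to a cluster would depend on the clustering method and the data.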

Let’s look at examples of each ...
