In the previous chapter, we showed that K-means is generally a good choice when the geometry of the clusters is convex. However, the algorithm has two main limitations: the metric is always Euclidean, and it is not very robust to outliers. The first limitation is built into the algorithm, which minimizes squared Euclidean distances, while the second is a direct consequence of the nature of the centroids. K-means computes each centroid as the arithmetic mean of the points assigned to a cluster, so the centroid generally does not coincide with any point of the dataset. Hence, when a cluster contains some outliers, the mean is pulled toward them, and the pull grows with their number and their distance. The following diagram shows an example where the presence of a few outliers forces the centroid to a position outside the dense region:

Example of centroid selection (left) and medoid selection ...
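
The same effect can be reproduced numerically. The following is a minimal sketch, not code from the book: the synthetic data, the random seed, and the helper name `centroid_and_medoid` are all assumptions made for illustration. It compares the arithmetic mean of a cluster with its medoid, taken here as the dataset point with the smallest total Euclidean distance to all other points.

```python
import numpy as np


def centroid_and_medoid(X):
    """Return the mean of X and the sample of X minimizing the total distance to the others.

    Hypothetical helper written for this illustration only.
    """
    centroid = X.mean(axis=0)

    # Pairwise Euclidean distance matrix, shape (n_samples, n_samples)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # The medoid is an actual row of X, unlike the centroid
    medoid = X[dists.sum(axis=1).argmin()]
    return centroid, medoid


# Assumed toy data: a dense blob around the origin plus three distant outliers
rng = np.random.default_rng(1000)
dense = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
outliers = np.array([[8.0, 8.0], [9.0, 7.5], [8.5, 9.0]])
X = np.vstack([dense, outliers])

centroid, medoid = centroid_and_medoid(X)
dense_center = dense.mean(axis=0)

print("Centroid:", centroid)
print("Medoid:  ", medoid)
print("Centroid distance from dense center:", np.linalg.norm(centroid - dense_center))
print("Medoid distance from dense center:  ", np.linalg.norm(medoid - dense_center))
```

With this data, the three outliers drag the mean away from the dense blob, while the medoid remains one of the original samples and stays much closer to the dense region. The pairwise distance matrix requires memory that is quadratic in the number of samples, so this direct approach is only reasonable for small datasets.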
