In the previous chapter, we showed that K-means is generally a good choice when the clusters are convex. However, the algorithm has two main limitations: the metric is always Euclidean, and it is not very robust to outliers. The first limitation is obvious, while the second is a direct consequence of how the centroids are defined. K-means computes each centroid as the arithmetic mean of the points assigned to its cluster, so a centroid is generally not a member of the dataset. Hence, when a cluster contains some outliers, the mean is pulled toward them, and the shift grows with their distance from the rest of the cluster. The following diagram shows an example where the presence of a few outliers forces the centroid to a position outside the dense region:

Example of centroid selection (left) and medoid selection ...
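The same effect can be reproduced numerically. The following is a minimal sketch, not the book's own code: the helper name `centroid_and_medoid` and the toy dataset are assumptions made for illustration, and the medoid is taken here as the sample that minimizes the sum of Euclidean distances to all other samples.

```python
import numpy as np

def centroid_and_medoid(points):
    """Return the mean and the medoid of a set of points (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    # Centroid as used by K-means: the arithmetic mean of the samples.
    centroid = points.mean(axis=0)
    # Pairwise Euclidean distance matrix, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Medoid: the actual sample with the smallest total distance to the others.
    medoid = points[dists.sum(axis=1).argmin()]
    return centroid, medoid

# Toy data (an assumption): five dense samples inside the unit square
# plus two distant outliers.
dense = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
outliers = np.array([[10.0, 10.0], [12.0, 11.0]])
data = np.vstack([dense, outliers])

centroid, medoid = centroid_and_medoid(data)
print("centroid:", centroid)
print("medoid:  ", medoid)
```

With this data, the mean lands at roughly (3.5, 3.36), which is outside the unit square that contains the dense samples, while the medoid remains one of the dense samples.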
