How it works

In the k-means algorithm, every feature contributes to the distance calculation with equal weight. In essence, all features are assumed to be on the same scale. We saw the problems with not scaling features in Chapter 2, Classification with scikit-learn Estimators. The result is that k-means looks for roughly circular clusters (spherical in more than two dimensions), visualized here:

Oval-shaped clusters can also be discovered by k-means. The separation usually isn't quite so clean, but feature scaling can make it easier. An example of a cluster with this shape is as follows:
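To make the effect of scale concrete, here is a minimal sketch of the assignment step of k-means, which places a point in the cluster whose centroid is nearest by Euclidean distance. The data, the centroids, and the helper names (`nearest_centroid`, `fit_scaler`, `transform`) are hypothetical and are not taken from the book's code. Standardizing each feature to zero mean and unit variance is assumed here as one common way to scale; other scalers would serve the same purpose.

```python
import math
from statistics import mean, pstdev


def euclidean(a, b):
    """Plain Euclidean distance: every feature counts equally."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: euclidean(point, centroids[i]))


def fit_scaler(rows):
    """Compute the per-feature mean and standard deviation."""
    columns = list(zip(*rows))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def transform(row, means, stds):
    """Standardize one row using the fitted statistics."""
    return tuple((v - m) / s for v, m, s in zip(row, means, stds))


# Hypothetical samples: (income in dollars, age in years).
data = [(30000, 25), (30200, 27), (30600, 27),
        (31000, 60), (30800, 58), (31200, 62)]
# Hypothetical centroids: a "young" group and an "older" group.
centroids = [(30000, 25), (31000, 60)]
point = (30600, 27)

# Unscaled: the income difference (hundreds of dollars) swamps the
# age difference (tens of years), so the point joins the older group.
raw_choice = nearest_centroid(point, centroids)

# Scaled: both features vary over a comparable range, so the
# 33-year age gap now matters and the point joins the young group.
means, stds = fit_scaler(data)
scaled_centroids = [transform(c, means, stds) for c in centroids]
scaled_choice = nearest_centroid(transform(point, means, stds),
                                 scaled_centroids)

print(raw_choice, scaled_choice)
```

With these made-up numbers, the same point is assigned to a different centroid before and after scaling, which is the sense in which unscaled features distort the clusters k-means finds.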

As ...
