July 2017
Intermediate to advanced
382 pages
9h 13m
English
The k-means algorithm is based on a simple assumption, namely that each point is closer to its own cluster center than to any other center. Consequently, k-means always assumes linear boundaries between clusters, which means it will fail whenever the geometry of the clusters is more complicated than that.
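Why are the boundaries linear? For two centers c0 and c1, the test "x is closer to c1 than to c0" expands to ||x - c0||² > ||x - c1||², which simplifies to 2(c1 - c0)·x > ||c1||² - ||c0||². That is a linear inequality in x, so the boundary is a straight line (a hyperplane in higher dimensions). With more centers, each cluster is an intersection of such half-spaces. The minimal NumPy sketch below checks this numerically. The helper names `nearest_center` and `linear_side` and the center coordinates are hypothetical and chosen only for illustration.

```python
import numpy as np

def nearest_center(X, centers):
    """k-means assignment step: index of the closest center for each point."""
    # Squared Euclidean distance from every point to every center, shape (n_points, n_centers)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def linear_side(X, c0, c1):
    """Equivalent linear test for two centers: 1 where x lies on c1's side of the bisector."""
    # ||x - c0||^2 > ||x - c1||^2  <=>  2 (c1 - c0) . x > ||c1||^2 - ||c0||^2
    w = 2 * (c1 - c0)
    b = c1 @ c1 - c0 @ c0
    return (X @ w > b).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
centers = np.array([[-1.0, 0.5], [1.5, -0.5]])  # hypothetical centers

same = np.array_equal(nearest_center(X, centers), linear_side(X, centers[0], centers[1]))
print(same)  # True: nearest-center assignment is a linear split
```

Because the two functions agree point for point, any cluster shape that cannot be carved out by straight cuts, such as a curved band, is out of reach for the k-means assignment rule.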
We can see this limitation for ourselves by generating a slightly more complicated dataset. Instead of generating data points from Gaussian blobs, we want to organize the data into two interleaving half circles. We can do this using scikit-learn's make_moons. Here, we generate 200 data points on two half circles and add some Gaussian noise:
In [14]: from sklearn.datasets import make_moons ...
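To see the failure end to end, here is a minimal sketch that generates the moons and asks k-means for two clusters. The values `noise=0.05` and both `random_state` seeds are illustrative assumptions, not necessarily the settings of the original listing. Because k-means numbers its clusters arbitrarily, agreement with the true moon labels is scored under both possible label matchings.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans

# 200 points on two interleaving half circles plus Gaussian noise
# (noise=0.05 and random_state=0 are assumptions for illustration)
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

# Request two clusters, one per half circle
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Cluster IDs are arbitrary, so take the better of the two label matchings
agreement = max(np.mean(labels == y), np.mean(labels != y))
print(f"fraction of points grouped with their own moon: {agreement:.2f}")
```

The agreement stays below 1.0. The tip of the lower moon sits inside the bowl of the upper one, so no straight cut, and therefore no two-center k-means solution, can separate the two half circles cleanly.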