Comparison between mini-batch K-means and BIRCH

In this example, we want to compare the performance of the two algorithms on a bidimensional dataset containing 2,000 samples split into 8 blobs (as the purpose is analytic, we also keep the ground truth labels), as follows:

from sklearn.datasets import make_blobs

nb_clusters = 8
nb_samples = 2000

X, Y = make_blobs(n_samples=nb_samples, n_features=2, centers=nb_clusters,
                  cluster_std=0.25, center_box=[-1.5, 1.5], shuffle=True, random_state=100)

The dataset (which is already shuffled, so that the order in which the samples are streamed is not correlated with their cluster) is shown in the following plot:

Bidimensional dataset ...
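
One way to carry out the comparison on this dataset is to feed it to both models in small batches through partial_fit and, after each batch, score the predictions on the samples seen so far against the ground truth. The following is only a minimal sketch under stated assumptions: the batch size of 50, the BIRCH threshold of 0.2 and branching factor of 350, the random_state of the K-means model, and the choice of the adjusted Rand score as the metric are all illustrative values that are not given in the text above.

```python
from sklearn.cluster import Birch, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

nb_clusters = 8
nb_samples = 2000
batch_size = 50  # assumption: the batch size is not specified in the text

# Same dataset as above
X, Y = make_blobs(n_samples=nb_samples, n_features=2, centers=nb_clusters,
                  cluster_std=0.25, center_box=[-1.5, 1.5], shuffle=True,
                  random_state=100)

# Assumption: these hyperparameters are illustrative, not tuned
mbkm = MiniBatchKMeans(n_clusters=nb_clusters, batch_size=batch_size,
                       random_state=1000)
birch = Birch(n_clusters=nb_clusters, threshold=0.2, branching_factor=350)

scores_mbkm = []
scores_birch = []

# Simulate a stream: each model sees one batch at a time
for start in range(0, nb_samples, batch_size):
    X_batch = X[start:start + batch_size]
    mbkm.partial_fit(X_batch)
    birch.partial_fit(X_batch)

    # Evaluate on all samples streamed so far, using the ground truth
    seen = min(start + batch_size, nb_samples)
    scores_mbkm.append(adjusted_rand_score(Y[:seen], mbkm.predict(X[:seen])))
    scores_birch.append(adjusted_rand_score(Y[:seen], birch.predict(X[:seen])))

print('Final adjusted Rand score, mini-batch K-means: {:.3f}'.format(scores_mbkm[-1]))
print('Final adjusted Rand score, BIRCH: {:.3f}'.format(scores_birch[-1]))
```

Plotting scores_mbkm and scores_birch against the number of processed batches shows how quickly each model stabilizes as more of the stream is observed. The adjusted Rand score is used here because it is invariant to label permutations, so the arbitrary cluster indices assigned by each algorithm do not affect the comparison.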
