In this example, we want to compare the performance of both algorithms on a two-dimensional dataset containing 2,000 samples split into 8 blobs (as the purpose is analytic, we also keep the ground-truth labels), as follows:
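from sklearn.datasets import make_blobs

nb_clusters = 8
nb_samples = 2000

# X holds the 2D points, Y the ground-truth blob index of each point.
# center_box is passed as a tuple, which recent scikit-learn versions require.
X, Y = make_blobs(n_samples=nb_samples, n_features=2, centers=nb_clusters,
                  cluster_std=0.25, center_box=(-1.5, 1.5),
                  shuffle=True, random_state=100)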
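Before moving on, it can be useful to check that the generated dataset has the expected structure and to see how it could be cut into chunks for a streaming scenario. The following is only a minimal sketch: the batch size of 50 and the helper name iter_batches are assumptions introduced for illustration, not values taken from the example above.

```python
import numpy as np
from sklearn.datasets import make_blobs

nb_clusters = 8
nb_samples = 2000
batch_size = 50  # assumption: illustrative chunk size for streaming


def iter_batches(data, size):
    """Yield consecutive, non-overlapping chunks of `data` (hypothetical helper)."""
    for start in range(0, data.shape[0], size):
        yield data[start:start + size]


X, Y = make_blobs(n_samples=nb_samples, n_features=2, centers=nb_clusters,
                  cluster_std=0.25, center_box=(-1.5, 1.5),
                  shuffle=True, random_state=100)

# Number of samples assigned to each ground-truth blob
label_counts = np.bincount(Y, minlength=nb_clusters)

# Chunks that a streaming algorithm would receive one at a time
batches = list(iter_batches(X, batch_size))

print("X shape:", X.shape)
print("Samples per blob:", label_counts)
print("Number of batches:", len(batches))
```

Since the samples are divided evenly among the centers, every blob should contain the same number of points, and each batch is a random mixture of blobs because the dataset is shuffled.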
The dataset (which is already shuffled, so that consecutive samples in the streaming process are not correlated) is shown in the following screenshot: