First, let's generate a 2D dataset containing four distinct blobs. To emphasize that this is an unsupervised approach, we will leave the labels out of the visualization:
- We will continue using matplotlib for all of our visualization purposes:
In [1]: import matplotlib.pyplot as plt
...     %matplotlib inline
...     plt.style.use('ggplot')
- Following the same recipe from earlier chapters, we will create a total of 300 data points (n_samples=300) belonging to four distinct clusters (centers=4):
In [2]: from sklearn.datasets import make_blobs
...     X, y_true = make_blobs(n_samples=300, centers=4,
...                            cluster_std=1.0, random_state=10)
...     plt.scatter(X[:, 0], X[:, 1], s=100);
This will generate the following ...
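Before moving on, it can help to confirm what make_blobs actually returned. The following is a minimal sketch, assuming a recent scikit-learn release in which make_blobs is imported from sklearn.datasets; the variable name counts is introduced here purely for illustration. It checks the shape of the data and shows that the ground-truth labels exist, even though we deliberately leave them out of the plot:

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same call as in the text: 300 points spread over four blobs.
X, y_true = make_blobs(n_samples=300, centers=4,
                       cluster_std=1.0, random_state=10)

# X holds one row per data point and one column per feature (2D by default).
print(X.shape)

# y_true holds the blob index of every point. An unsupervised algorithm
# never sees these labels; they are only useful for checking results later.
print(np.unique(y_true))

# Number of points assigned to each blob (counts is an illustrative name).
counts = np.bincount(y_true)
print(counts)
```

With an integer n_samples, the points are divided evenly among the centers, so each of the four blobs should contain 75 points.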