How to do it...

  1. To get started, we'll create several blobs that can be used to simulate clusters of data:
    from sklearn.datasets import make_blobs
    import numpy as np
    blobs, classes = make_blobs(500, centers=3)
    from sklearn.cluster import KMeans
    kmean = KMeans(n_clusters=3)
    kmean.fit(blobs)
    KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
        n_clusters=3, n_init=10, n_jobs=1, precompute_distances='auto',
        random_state=None, tol=0.0001, verbose=0)
  2. First, we'll look at the silhouette distance. For each point, the silhouette distance is the difference between the closest out-of-cluster dissimilarity and the in-cluster dissimilarity, divided by the maximum of these two values. It can be thought of as a measure of how well separated the clusters are. ...
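
The definition above can be checked with a minimal sketch along the following lines. It assumes scikit-learn's `silhouette_samples` and `silhouette_score` from `sklearn.metrics`; the fixed `random_state=0`, the explicit `n_init=10`, and the choice of point index `i = 0` are illustrative assumptions, not part of the recipe. The manual calculation for one point is only there to show how the in-cluster and closest out-of-cluster dissimilarities combine.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances, silhouette_samples, silhouette_score

# Same setup as step 1, with a fixed seed (an assumption) so runs are repeatable.
blobs, classes = make_blobs(500, centers=3, random_state=0)
kmean = KMeans(n_clusters=3, n_init=10, random_state=0).fit(blobs)
labels = kmean.labels_

# One silhouette value per point, and the mean over all points.
samples = silhouette_samples(blobs, labels)
score = silhouette_score(blobs, labels)

# Manual calculation for a single point (index chosen arbitrarily).
i = 0
dists = pairwise_distances(blobs[i:i + 1], blobs)[0]

# a: mean distance to the other points in the same cluster.
same = labels == labels[i]
same[i] = False  # exclude the point itself
a = dists[same].mean()

# b: smallest mean distance to the points of any other cluster.
b = min(dists[labels == k].mean() for k in np.unique(labels) if k != labels[i])

s_manual = (b - a) / max(a, b)
print(s_manual, samples[i], score)
```

Each per-point value lies between -1 and 1; values near 1 suggest that a point sits well inside its own cluster and far from the others, while values near 0 or below suggest overlapping or misassigned points.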
