Silhouette score

One of the most common methods for assessing the performance of a clustering algorithm without knowledge of the ground truth is the silhouette score. It provides both a per-sample index and a global graphical representation that shows the level of internal coherence and separation of the clusters. To compute the score, we need to introduce two auxiliary measures. The first one is the average intra-cluster distance of a sample x_i ∈ K_j, where n(j) = |K_j| denotes the cardinality of the cluster:
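
Taking the average over all n(j) members of the cluster, as the sentence above suggests, the measure can be written as follows. The name a(x_i) is an assumed label, and d(•) stands for the chosen distance function:

\[
a(x_i) = \frac{1}{n(j)} \sum_{x_t \in K_j} d(x_i, x_t)
\]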

For K-means, the distance is assumed to be Euclidean, but there are no specific limitations. Of course, d(•) must be the same distance function employed in ...
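
The average intra-cluster distance described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the distance is Euclidean, the sum is divided by n(j), and the function names and the toy data are hypothetical rather than taken from the book.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_intra_cluster_distance(x_i, cluster, d=euclidean):
    """Mean distance from x_i to every member of its own cluster K_j.

    Assumption: the sum is divided by n(j) = len(cluster), so the zero
    distance from x_i to itself is part of the average.
    """
    n_j = len(cluster)
    return sum(d(x_i, x_t) for x_t in cluster) / n_j


# Toy cluster K_j with three 2-D samples (illustrative data only)
K_j = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
a_first = average_intra_cluster_distance(K_j[0], K_j)
a_middle = average_intra_cluster_distance(K_j[1], K_j)
print(a_first, a_middle)
```

Passing a different callable as d swaps the metric without touching the rest of the function. Note that many silhouette implementations exclude the sample itself and divide by n(j) − 1 instead, so values computed this way can differ slightly from library output.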
