Internal clustering evaluation

If we do not have a gold standard set of labels to compare our clusters against, we have to evaluate how well our clustering technique performs using internal criteria. In other words, we can still evaluate our clustering by measuring similarity and dissimilarity among the clustered data points themselves.

The first internal metric that we will present here is the silhouette coefficient. The silhouette coefficient can be calculated for each clustered data point as follows:

s = (b - a) / max(a, b)

Here, a is the mean distance between a data point and all other points in the same cluster (the Euclidean distance, ...
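To make the calculation concrete, here is a minimal sketch of a per-point silhouette computation using only the Go standard library. The text above is cut off before the second quantity is defined, so this sketch assumes the usual definition: b is the mean distance from a point to the points of the nearest other cluster. The names euclidean and silhouette, and the choice to score singleton clusters as 0, are illustrative assumptions rather than anything taken from the book.

```go
package main

import (
	"fmt"
	"math"
)

// euclidean returns the Euclidean distance between two points of equal dimension.
func euclidean(p, q []float64) float64 {
	var sum float64
	for i := range p {
		d := p[i] - q[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// silhouette returns one silhouette coefficient per point, where labels[i]
// is the cluster ID assigned to points[i].
// Assumed definition: s = (b - a) / max(a, b), with
//   a = mean distance to the other points in the same cluster,
//   b = mean distance to the points of the nearest other cluster.
func silhouette(points [][]float64, labels []int) []float64 {
	scores := make([]float64, len(points))
	for i, p := range points {
		// Accumulate distance sums and point counts per cluster,
		// excluding the point itself.
		sums := map[int]float64{}
		counts := map[int]int{}
		for j, q := range points {
			if i == j {
				continue
			}
			sums[labels[j]] += euclidean(p, q)
			counts[labels[j]]++
		}

		own := labels[i]
		if counts[own] == 0 {
			// Singleton cluster: a is undefined, so the score stays 0
			// (an assumed convention).
			continue
		}
		a := sums[own] / float64(counts[own])

		// b is the smallest mean distance to any other cluster.
		b := math.Inf(1)
		for c, n := range counts {
			if c == own {
				continue
			}
			if m := sums[c] / float64(n); m < b {
				b = m
			}
		}
		if math.IsInf(b, 1) {
			// Only one cluster exists, so there is nothing to compare against.
			continue
		}

		if denom := math.Max(a, b); denom > 0 {
			scores[i] = (b - a) / denom
		}
	}
	return scores
}

func main() {
	// Two tight, well-separated clusters on a line.
	points := [][]float64{{0, 0}, {1, 0}, {10, 0}, {11, 0}}
	labels := []int{0, 0, 1, 1}

	for i, s := range silhouette(points, labels) {
		fmt.Printf("point %d (cluster %d): silhouette = %.3f\n", i, labels[i], s)
	}
}
```

Scores near 1 indicate a point that sits much closer to its own cluster than to any other, scores near 0 indicate a point on the boundary between clusters, and negative scores suggest the point may belong to a different cluster. Averaging the per-point scores gives a single figure for the whole clustering.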
