Assessing cluster correctness

We talked a little bit about assessing clusters when the ground truth is not known. However, we have not yet talked about assessing KMeans when the cluster is known. In a lot of cases, this isn't knowable; however, if there is outside annotation, we will know the ground truth, or at least the proxy, sometimes.

Getting ready

So, let's assume a world where we have some outside agent supplying us with the ground truth.

We'll create a simple dataset, evaluate the measures of correctness against the ground truth in several ways, and then discuss them:

>>> from sklearn import datasets
>>> from sklearn import cluster
>>> blobs, ground_truth = datasets.make_blobs(1000, centers=3, 
                                              cluster_std=1.75)

How to do it...

Before we ...

Get scikit-learn : Machine Learning Simplified now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.