Assessing cluster correctness
We talked a little bit about assessing clusters when the ground truth is not known. However, we have not yet talked about assessing KMeans when the ground truth is known. In a lot of cases, the ground truth isn't knowable; however, if there is outside annotation, we will sometimes know the ground truth, or at least a proxy for it.
Getting ready
So, let's assume a world where we have some outside agent supplying us with the ground truth.
We'll create a simple dataset, measure the correctness of the clustering against the ground truth in several ways, and then discuss the results:
>>> from sklearn import datasets
>>> from sklearn import cluster
>>> blobs, ground_truth = datasets.make_blobs(1000, centers=3, cluster_std=1.75)
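As a rough preview of comparing a clustering against the ground truth, the following minimal sketch fits KMeans on the same kind of blobs and scores the predicted labels with a few label-agreement metrics from sklearn.metrics. The choice of metrics, the `random_state` and `n_init` values, and the `scores` dictionary are assumptions made for illustration, not necessarily the measures the recipe itself goes on to use.

```python
from sklearn import cluster, datasets, metrics

# random_state is an assumption added here so the sketch is reproducible.
blobs, ground_truth = datasets.make_blobs(
    1000, centers=3, cluster_std=1.75, random_state=0
)

# Fit KMeans with the same number of clusters as the generated blobs.
kmeans = cluster.KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(blobs)

# These metrics compare two labelings and ignore how the cluster ids are
# numbered, so KMeans calling a blob "2" where the ground truth says "0"
# is not penalized.
scores = {
    "adjusted_rand": metrics.adjusted_rand_score(ground_truth, labels),
    "homogeneity": metrics.homogeneity_score(ground_truth, labels),
    "completeness": metrics.completeness_score(ground_truth, labels),
    "v_measure": metrics.v_measure_score(ground_truth, labels),
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Values close to 1 indicate that the predicted clusters agree closely with the ground truth. Because the blobs here are fairly wide (cluster_std=1.75), some overlap between clusters, and therefore scores below 1, would not be surprising.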
How to do it...
Before we ...