Fine-tuning the clustering
Choosing the optimal value of K is one of the harder parts of performing k-means clustering. There are a few methods that can help with this.
The elbow method
We discussed earlier that a good cluster is defined by how compact it is, that is, by how close its observations lie to one another. This compactness is quantified by the intra-cluster distance. The intra-cluster distance for a cluster is essentially the sum of the pairwise distances between all possible pairs of points in that cluster.
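The definition above can be illustrated with a minimal sketch. It assumes Euclidean distance and counts each unordered pair of points once; the text fixes neither choice. The function name `intra_cluster_distance` and the two toy clusters are hypothetical, introduced only for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def intra_cluster_distance(points):
    """Sum of Euclidean distances over every unordered pair of points."""
    return sum(dist(p, q) for p, q in combinations(points, 2))


# Hypothetical toy clusters: same shape, but the second is ten times larger.
tight_cluster = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0)]
loose_cluster = [(0.0, 0.0), (0.0, 30.0), (40.0, 0.0)]

w_tight = intra_cluster_distance(tight_cluster)  # 3 + 4 + 5
w_loose = intra_cluster_distance(loose_cluster)  # 30 + 40 + 50

print(w_tight, w_loose)
```

The more spread-out cluster has the larger value, which is why a smaller intra-cluster distance indicates a more compact cluster.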
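If we denote the intra-cluster distance by W, then for a cluster k the intra-cluster distance can be written as:

$$W_k = \sum_{\substack{i, j \in C_k \\ i < j}} d(x_i, x_j)$$

Here C_k is the set of observations assigned to cluster k, and d(x_i, x_j) is the distance between observations x_i and x_j.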
Generally, the normalized ...