Learning by doing – partition clustering with kmeans()

Perhaps the most widely used family of clustering algorithms is k-means. In this section, we will examine how it works and how to assess the quality of a clustering solution.

K-means is a partitioning algorithm that produces k clusters (k is chosen by the user) of cases that are more similar to each other than to cases outside the cluster. K-means starts by randomly initializing the centroid of each cluster (its value on each of the dimensions considered). From this point on, the process is iterative and aims at creating homogeneous clusters until a final solution is found. For each case, the distance to the centroid of each cluster is computed, and the case is assigned to the closest cluster. After this step, k-means ...
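The initialization and assignment steps described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the built-in iris data, k = 3, and the helper name assign_to_closest are choices made for this example and do not come from the text. The last lines call kmeans() from the stats package, which runs the full iterative procedure, so that the hand-made first assignment can be compared with a finished solution.

```r
# Assumptions (illustrative only): iris data, k = 3, helper name assign_to_closest.
set.seed(42)                         # starting centroids are random, so fix the seed
x <- scale(iris[, 1:4])              # standardize the four numeric dimensions
k <- 3

# Random initialization: use k randomly chosen cases as starting centroids
centroids <- x[sample(nrow(x), k), , drop = FALSE]

# Assignment step: squared Euclidean distance from each case to each centroid,
# then pick the closest centroid for every case
assign_to_closest <- function(x, centroids) {
  d <- sapply(seq_len(nrow(centroids)), function(j) {
    rowSums(sweep(x, 2, centroids[j, ])^2)
  })
  apply(d, 1, which.min)
}

first_assignment <- assign_to_closest(x, centroids)
table(first_assignment)              # cluster sizes after one assignment step

# Full algorithm with the built-in function; nstart = 25 tries 25 random starts
fit <- kmeans(x, centers = k, nstart = 25)
fit$centers                          # final centroid of each cluster
fit$size                             # number of cases per cluster
fit$tot.withinss                     # total within-cluster sum of squares
```

A lower tot.withinss means cases sit closer to their own centroid. Because the result depends on the random start, a run without set.seed() or with a small nstart may give a different solution.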
