Learning by doing – partition clustering with kmeans()
K-means is perhaps the most widely used family of clustering algorithms. In this section, we will examine how it works and how to assess the quality of a clustering solution.
K-means is a partitioning algorithm that produces k clusters (k is a number chosen by the user) of cases that are more similar to each other than to cases outside the cluster. K-means starts by randomly initializing the centroid of each cluster (its value on each of the considered dimensions). From this point on, the process is iterative: it aims at creating homogeneous clusters and repeats until a final solution is found. For each case, the distance from the centroid of each cluster is computed, and the case is assigned to the closest cluster. After this step, k-means ...
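The initialization and assignment steps described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the toy data are made up, the names `cases`, `centroids`, `distances`, and `assignment` are ours, Euclidean distance is assumed, and picking k random cases as starting centroids is just one common way of initializing. The last line calls the built-in `kmeans()`, which runs the full iterative procedure.

```r
# Toy data (made up for illustration): six cases measured on two dimensions
cases <- matrix(c(1.0, 1.2,
                  0.8, 1.0,
                  1.1, 0.9,
                  5.0, 5.2,
                  5.1, 4.8,
                  4.9, 5.0),
                ncol = 2, byrow = TRUE)

k <- 2          # user-defined number of clusters
set.seed(42)    # make the random initialization reproducible

# Random initialization (assumption: use k randomly chosen cases as centroids)
centroids <- cases[sample(nrow(cases), k), , drop = FALSE]

# Euclidean distance from every case (rows) to every centroid (columns)
distances <- sapply(seq_len(k), function(j) {
  sqrt(rowSums(sweep(cases, 2, centroids[j, ])^2))
})

# Assignment step: each case goes to the cluster with the closest centroid
assignment <- apply(distances, 1, which.min)

# The built-in function runs the whole iterative procedure for us
fit <- kmeans(cases, centers = k, nstart = 10)
```

Here `assignment` holds the cluster index of each case after a single assignment pass, whereas `fit$cluster` holds the memberships of the final solution and `fit$centers` the final centroids.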