How does k-means clustering work?

The goal of the k-means algorithm is to partition the data into k groups based on feature similarities. K is a predefined property of a k-means clustering model. Each of the k clusters are specified by a centroid (center of a cluster) and each data sample belongs to the cluster with the nearest centroid. During training, the algorithm iteratively updates the k centroids based on the data provided. Specifically, it involves the following steps:

  1. Specifying k: The algorithm needs to know how many clusters to generate as an end result.
  2. Initializing centroids: The algorithm starts with randomly selecting k samples from the dataset as centroids.
  3. Assigning clusters: Now that we have k centroids, samples that share ...

Get Python Machine Learning By Example - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.