The k-means clustering algorithm

This section covers the k-means clustering algorithm in depth. k-means is a partitional clustering algorithm: it divides the data into non-overlapping groups, so each data point belongs to exactly one cluster.

Let the set of data points (or instances) be as follows:

D = {x_1, x_2, …, x_n}, where

x_i = (x_i1, x_i2, …, x_ir) is a vector in a real-valued space X ⊆ R^r, and r is the number of attributes in the data.
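
To make the notation concrete, here is a minimal sketch of how such a data set might be held in Python. The four points and their values are hypothetical, chosen only for illustration; each tuple is one data point, and each position in the tuple is one attribute.

```python
# Hypothetical toy data set: n = 4 data points, r = 2 attributes each.
D = [
    (1.0, 2.0),
    (1.5, 1.8),
    (8.0, 8.0),
    (9.0, 9.5),
]

n = len(D)       # number of data points
r = len(D[0])    # number of attributes per data point

# Every data point must live in the same r-dimensional space.
assert all(len(x) == r for x in D)
```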

The k-means algorithm partitions the given data into k clusters. Each cluster has a center, called a centroid.

The number of clusters, k, is specified by the user.

Given k, the k-means algorithm works as follows:

Algorithm k-means (k, D)

  1. Choose k data points as the initial centroids (cluster centers).
  2. Repeat:
  3.   For each data point x ∈ D, do:
  4.     Compute the distance from x to each centroid.
  5.     Assign x to the closest centroid ...
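
The steps listed above can be sketched in Python as follows. The listing breaks off after the assignment step, so only initialization, distance computation, and assignment come from the text. The rest is assumed: the centroid update (move each centroid to the mean of its members) and the stopping rule (stop when no point changes cluster) follow the commonly taught form of k-means, and Euclidean distance is also an assumption. The function name `k_means`, the parameters `max_iter` and `seed`, and the sample data are hypothetical.

```python
import math
import random


def k_means(k, D, max_iter=100, seed=0):
    """Sketch of k-means; D is a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick k of the data points as the initial centroids.
    centroids = rng.sample(D, k)
    assignment = None

    for _ in range(max_iter):  # Step 2: repeat
        # Steps 3-5: for each point, compute the distance to every
        # centroid and assign the point to the closest one
        # (Euclidean distance is an assumption).
        new_assignment = [
            min(range(k), key=lambda j: math.dist(x, centroids[j]))
            for x in D
        ]
        # Assumed stopping rule: no point changed cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment

        # Assumed update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [x for x, a in zip(D, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )

    return centroids, assignment


# Hypothetical usage: two visibly separated groups of points.
points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignment = k_means(2, points)
```

Here `assignment[i]` is the index of the cluster that `points[i]` ends up in. Because the initial centroids are chosen at random, the cluster labels (0 or 1) can swap between seeds even when the grouping itself is the same.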
