The k-means clustering algorithm

In this section, we will cover the k-means clustering algorithm in depth. The k-means is a partitional clustering algorithm.

Let the set of data points (or instances) be as follows:

D = {x₁, x₂, …, x_n}, where

xi = (xi₁, xi₂, …, xi_r), is a vector in a real-valued space X ⊆ R_r, and r is the number of attributes in the data.

The k-means algorithm partitions the given data into k clusters with each cluster having a center called a centroid.

k is specified by the user.

Given k, the k-means algorithm works as follows:

Algorithm k-means (k, D)

Get Practical Machine Learning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Practical Machine Learning by Sunila Gollapudi