K-means

The first method that we'll introduce is named K-means, the most commonly used clustering algorithm despite its inevitable shortcomings. In signal processing, K-means is the equivalent of a vectorial quantization, that is, the selection of the best codeword (from a given codebook) that better approximates the input observation (or a word).

You must provide the algorithm with the K parameter, which is the number of clusters. Sometimes, this might be a limitation because you have to investigate first which is the right K for the current dataset.

K-means iterates an EM (expectation/maximization) approach. During the first phase, it assigns each training point to the closest cluster centroid; during the second phase, it moves the cluster ...

Get Python Data Science Essentials - Third Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.