The first method we'll introduce is K-means, the most commonly used clustering algorithm despite its inevitable shortcomings. In signal processing, K-means is the equivalent of vector quantization, that is, the selection of the codeword (from a given codebook) that best approximates the input observation (or word).
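To make the signal-processing analogy concrete, here is a minimal sketch of the quantization step using NumPy. The `quantize` function, the codebook, and the observations are all made up for illustration; in K-means the codebook would be the set of learned cluster centroids.

```python
import numpy as np

def quantize(observations, codebook):
    """Return, for each observation, the index of the nearest codeword.

    Hypothetical helper for illustration: nearness is measured with the
    Euclidean distance, as in standard K-means.
    """
    # Pairwise distances, shape (n_observations, n_codewords)
    distances = np.linalg.norm(
        observations[:, None, :] - codebook[None, :, :], axis=2
    )
    # Index of the closest codeword for every observation
    return distances.argmin(axis=1)

# Illustrative codebook of three 2-D codewords (assumed values)
codebook = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
observations = np.array([[0.5, -0.2], [4.0, 6.0], [9.0, 1.0]])

codes = quantize(observations, codebook)
print(codes)
```

Each observation is replaced by the index of its codeword, which is exactly what a cluster label is once the centroids are known.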

You must provide the algorithm with the parameter K, the number of clusters. This can be a limitation, because you first have to investigate which K is right for the current dataset.
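One common way to investigate K is to fit the model for several candidate values and compare the within-cluster sum of squares (exposed by scikit-learn as `inertia_`). The following is a sketch under assumptions: the synthetic dataset from `make_blobs`, the range of candidate values, and the choice of inertia as the criterion are illustrative and not prescribed by the text.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three groups (an assumption made for illustration)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

# Fit K-means for each candidate K and record the inertia
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"K={k}: inertia={value:.1f}")
```

Inertia tends to keep shrinking as K grows, so the usual heuristic is to look for the value of K after which the improvement flattens out (the "elbow"), rather than for the minimum.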

K-means iterates an expectation-maximization (EM) style procedure. During the first phase, it assigns each training point to the closest cluster centroid; during the second phase, it moves the cluster ...
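The two-phase iteration can be sketched in plain NumPy. This is the textbook formulation of K-means (Lloyd's iteration), in which the second phase moves each centroid to the mean of the points assigned to it; it is not necessarily the exact procedure the text goes on to describe. The function name `kmeans_sketch`, the random initialization from training points, and the stopping rule are assumptions made for illustration.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Illustrative K-means loop; not a replacement for a library version."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: pick k distinct training points as centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Phase 1: assign every point to its closest centroid
        distances = np.linalg.norm(
            X[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Phase 2: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two small, well-separated groups (made-up data)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [20.0, 5.0], [20.0, 6.0], [21.0, 5.0]])
centroids, labels = kmeans_sketch(X, k=2)
print(labels)
print(centroids)
```

In practice you would normally rely on a library implementation such as scikit-learn's `KMeans`, which adds better initialization and multiple restarts; the loop above is only meant to show the alternation between assignment and update.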
