Chapter 16. Clustering by finding centers with k-means

This chapter covers

  • Understanding the need for clustering
  • Understanding over- and underfitting for clustering
  • Validating the performance of a clustering algorithm

Our first stop in clustering is a widely used technique: k-means clustering. I’ve used the word technique rather than algorithm because k-means describes a general approach to clustering that several different algorithms follow. I’ll talk about these individual algorithms later in the chapter.

Note

Don’t confuse k-means with k-nearest neighbors! K-means is an unsupervised technique that groups unlabeled data into clusters, whereas k-nearest neighbors is a supervised algorithm, most often used for classification, that predicts from labeled examples.
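To make the contrast in the note concrete, here is a minimal sketch in plain Python (not the R tooling this book uses). It assumes Lloyd-style iteration as one example of an algorithm that follows the k-means approach. The function names `kmeans` and `knn_predict`, the toy points, and the starting centers are all hypothetical, chosen only for illustration. The thing to notice is the inputs: the k-means function never sees a label, while the k-nearest neighbors function cannot work without them.

```python
import math


def kmeans(points, centers, iterations=10):
    """Lloyd-style k-means sketch: uses only the points, never any labels."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its assigned points.
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers


def knn_predict(train_points, train_labels, query, k=3):
    """k-nearest neighbors sketch: needs a label for every training point."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: math.dist(pair[0], query))
    votes = [label for _, label in ranked[:k]]
    # Majority vote among the k closest labeled points.
    return max(set(votes), key=votes.count)


# Hypothetical toy data: two well-separated groups in two dimensions.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = ["a", "a", "a", "b", "b", "b"]  # used by k-NN only

# Unsupervised: k-means receives the points and two starting centers.
centers = kmeans(points, centers=[points[0], points[3]])

# Supervised: k-NN receives the points together with their labels.
prediction = knn_predict(points, labels, query=(8.2, 8.4))
```

Here `centers` holds the two learned cluster centers, and `prediction` is the label that k-nearest neighbors assigns to the query point. The starting centers are picked by hand in this sketch; how they are chosen in practice is a separate question.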

K-means clustering attempts to learn ...
