Chapter 16: K-Means and DBSCAN Clustering

Data clustering lets us organize unlabeled data into groups whose members have more in common with one another than with observations outside the group. Clustering has a surprisingly large number of applications, either as the final model of a machine learning pipeline or as input to another model, including market research, image processing, and document classification. We sometimes also use clustering to improve exploratory data analysis or to create more meaningful visualizations.

K-means clustering and density-based spatial clustering of applications with noise (DBSCAN), like principal component analysis (PCA), are unsupervised learning algorithms. There are ...
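Because both algorithms are unsupervised, neither one is given a target variable; each assigns cluster labels from the feature values alone. The following is a minimal sketch, assuming scikit-learn and NumPy are installed. The synthetic two-blob data and the `eps` and `min_samples` values are illustrative assumptions, not settings from this chapter.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.preprocessing import StandardScaler

# Illustrative data only: two well-separated blobs of unlabeled 2-D points
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])  # no labels are kept or passed to either model

# Scale features so that both distance-based methods weight each feature equally
X_scaled = StandardScaler().fit_transform(X)

# K-means: the number of clusters must be chosen in advance
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X_scaled)

# DBSCAN: no cluster count; eps is the neighborhood radius and
# min_samples the number of neighbors needed for a dense (core) point.
# Points that fit no dense region are labeled -1 (noise).
dbscan = DBSCAN(eps=0.3, min_samples=5)
dbscan_labels = dbscan.fit_predict(X_scaled)

print("k-means labels found:", sorted(set(kmeans_labels)))
print("DBSCAN labels found:", sorted(set(dbscan_labels)))
```

The contrast in the constructor arguments reflects the design difference: k-means needs the number of clusters up front, while DBSCAN infers clusters from point density and can leave sparse points unassigned as noise.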
