Chapter 7

Clustering

Abstract

Clustering is an unsupervised data science technique that organizes the records in a dataset into logical groupings. The data are grouped so that records within the same group are more similar to one another than to records outside the group. Clustering has a wide variety of applications, including market and customer segmentation, electoral grouping, web analytics, and outlier detection. It is also used as a data compression technique and as a preprocessing step for supervised tasks. Many different approaches are available for clustering data; they are based on proximity between the records, density in the dataset, or novel applications of neural networks.
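To make the proximity-based idea concrete, the following is a minimal sketch and not the chapter's own implementation. The abstract does not single out an algorithm; k-means is used here only as one familiar proximity-based example. The function names (`assign`, `update`, `kmeans`), the toy points, and the choice of initial centroids are all assumptions made for illustration.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of its members; keep it in place if it has none."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return labels, centroids

# Hypothetical toy data: two visibly separated groups of 2-D records.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Assumption: one starting centroid is picked by hand from each region.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

In this sketch, records that end up with the same label are closer to their shared centroid than to the other one, which is the "more similar inside the group than outside" property described above. Density-based and neural-network-based approaches would replace the distance-to-centroid rule with a different grouping criterion.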
