Chapter 12

Clustering

DOI: 10.1201/9781003158745-12

In the previous chapters, the data were considered homogeneous: all the observations were distributed according to a common model. Such an assumption is valid for data coming from small-scale controlled experiments, but it is highly unrealistic in the era of “Big Data”, where data come from multiple sources. A recipe for dealing with such inhomogeneous data is to consider them as an assemblage of several homogeneous data sets, corresponding to homogeneous “subpopulations”. Then each subpopulation can be treated either independently or jointly. The main hurdle in this approach is to recover the unknown subpopulations, which is the main goal of clustering algorithms.

Clustering algorithms ...
