February 2019
Intermediate to advanced
386 pages
English
As we explained in Chapter 1, Getting Started with Unsupervised Learning, the main goal of cluster analysis is to group the elements of a dataset according to a similarity measure or a proximity criterion. In the first part of this chapter, we focus on the former approach, while in the second part and in the next chapter, we analyze more generic methods that exploit other geometric features of the dataset.
Let's take a data-generating process $p_{data}(x)$ and draw $N$ samples from it:

$$X = \{x_1, x_2, \ldots, x_N\}, \quad x_i \sim p_{data}(x)$$
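As a minimal sketch of this sampling step, the snippet below draws N points from a stand-in data-generating process. The real process is unknown in practice, so the two-component Gaussian mixture used here, along with the names `means`, `components`, and `X` and the chosen values of `N` and the standard deviation, are assumptions made only for illustration.

```python
import numpy as np

# Fixed seed so the draw is reproducible
rng = np.random.default_rng(seed=1000)

N = 500  # number of samples (illustrative value)

# Hypothetical p_data: an equal-weight mixture of two 2D Gaussians.
# In a real problem this process is unknown; only the samples are observed.
means = np.array([[-2.0, 0.0], [2.0, 0.0]])

# Pick a mixture component for every sample, then add isotropic Gaussian noise
components = rng.integers(0, 2, size=N)
X = means[components] + rng.normal(0.0, 0.5, size=(N, 2))

print(X.shape)
```

Each row of `X` plays the role of one sample drawn from the process, and the array as a whole is the dataset that a clustering algorithm would receive, without access to `components`.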
It's possible to assume that the probability space of $p_{data}(x)$ is partitionable into (potentially infinite) configurations ...