February 2020
Intermediate to advanced
432 pages
10h 50m
English
Clustering is a scenario where we have many items and want to group them by similarity. In this case, the items are unlabeled and we ask the algorithm to do two things: group similar items together and assign a label to each group.
As an example, think of a collection of texts on many topics, where you want the algorithm to group similar texts and identify the main topic of each group, that is, to label them: history, science, literature, philosophy, and so on. One of the classical algorithms for this scenario is the nearest neighbor method, where you define a metric, calculate it for each pair of items, and group together those pairs that are close enough (based on the defined ...
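The pairwise grouping idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the book's implementation: the metric is assumed to be Euclidean distance, the `threshold` value and the sample `points` are hypothetical, and pairs that are close enough are merged transitively (the text does not specify how linked pairs combine into groups).

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def group_by_threshold(points, threshold):
    """Group points whose pairwise Euclidean distance is <= threshold.

    Close pairs are linked, and linked pairs are merged transitively.
    The transitive merge is an assumption made for this sketch.
    """
    parent = list(range(len(points)))  # union-find forest: one root per group

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Calculate the metric for every pair and link the pairs that are close.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    # Collect point indices by their group root.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical 2D items; real text data would first need a numeric representation.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0)]
clusters = group_by_threshold(points, threshold=1.0)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

The result lists the indices of the items in each group. Note that this sketch only covers the grouping step; assigning a topic label to each group would need a separate step.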