Chapter 5. Clustering

In Chapter 3, we introduced the most important dimensionality reduction algorithms in unsupervised learning and highlighted their ability to densely capture information. In Chapter 4, we used the dimensionality reduction algorithms to build an anomaly detection system. Specifically, we applied these algorithms to detect credit card fraud without using any labels. These algorithms learned the underlying structure in the credit card transactions. Then, we separated the normal transactions from the rare, potentially fraudulent ones based on the reconstruction error.

In this chapter, we will build on these unsupervised learning concepts by introducing clustering, which attempts to group objects together based on similarity. Clustering achieves this without using any labels: it compares how similar the data for one observation is to the data for other observations and for groups of observations.
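To make the idea of grouping by similarity concrete, here is a minimal sketch of one common approach, k-means, written in plain Python. The function name `kmeans`, the toy `points`, and the choice to seed the centroids with the first `k` points are all assumptions made for illustration; nothing here is taken from the chapter's own code, and a real project would more likely rely on a library implementation.

```python
import math


def kmeans(points, k, n_iter=20):
    """Tiny k-means sketch: returns (centroids, assignments)."""
    # Assumption: seed the centroids with the first k points.
    # This is deterministic and simple, but not a robust initialization.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        # (similarity is measured here as Euclidean distance).
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignments


# Hypothetical toy data: two visually obvious groups, no labels involved.
points = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.2, 0.8), (4.8, 5.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)  # → [0, 1, 0, 1, 0, 1]
```

Note that the cluster IDs (0 and 1) are arbitrary: the algorithm only says which observations belong together, not what each group means.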

Clustering has many applications. For example, in credit card fraud detection, clustering can group fraudulent transactions together, separating them from normal transactions. Or, if we had only a few labels for the observations in our dataset, we could use clustering to group the observations first (without using labels). Then, we could transfer the labels of the few labeled observations to the rest of the observations within the same group. This is a form of transfer learning, a rapidly growing field in machine learning.
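The label-transfer idea described above can be sketched as follows. This is only an illustration under stated assumptions: the helper `propagate_labels`, the hard-coded `assignments` (standing in for the output of some clustering step), and the majority-vote rule for picking each cluster's label are all hypothetical choices, not the book's method.

```python
from collections import Counter


def propagate_labels(assignments, known_labels):
    """Give every observation the majority known label of its cluster.

    assignments  -- cluster ID for each observation (from any clustering step)
    known_labels -- dict mapping observation index -> label for the few
                    observations that happen to be labeled
    """
    # Tally the known labels that fall inside each cluster.
    votes = {}
    for idx, label in known_labels.items():
        votes.setdefault(assignments[idx], Counter())[label] += 1
    # Assumption: a cluster takes its most common known label.
    cluster_label = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    # Clusters with no labeled member stay unlabeled (None).
    return [cluster_label.get(c) for c in assignments]


# Hypothetical inputs: six observations in two clusters, only two labeled.
assignments = [0, 1, 0, 1, 0, 1]
known_labels = {0: "normal", 3: "fraud"}
labels = propagate_labels(assignments, known_labels)
print(labels)
# → ['normal', 'fraud', 'normal', 'fraud', 'normal', 'fraud']
```

The quality of the transferred labels depends entirely on how well the clusters line up with the true classes, so in practice the result would need to be validated against whatever labeled data is available.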

In areas such as online and retail shopping, marketing, social media, recommender ...
