Chapter 5. Clustering
In Chapter 3, we introduced the most important dimensionality reduction algorithms in unsupervised learning and highlighted their ability to densely capture information. In Chapter 4, we used the dimensionality reduction algorithms to build an anomaly detection system. Specifically, we applied these algorithms to detect credit card fraud without using any labels. These algorithms learned the underlying structure in the credit card transactions. Then, we separated the normal transactions from the rare, potentially fraudulent ones based on the reconstruction error.
In this chapter, we will build on these unsupervised learning concepts by introducing clustering, which attempts to group objects together based on similarity. Clustering achieves this without using any labels, by comparing how similar each observation is to the other observations and to the groups formed so far.
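To make the idea of grouping by similarity concrete, here is a minimal sketch of one common clustering approach, k-means, written in plain Python. The toy points, the function name `kmeans`, the choice of Euclidean distance, and the decision to seed the centroids with the first k points are all assumptions made for illustration; they are not taken from this chapter.

```python
import math


def kmeans(points, k, n_iter=10):
    """Tiny k-means sketch: group points by Euclidean distance, using no labels."""
    # Assumption: seed the centroids with the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignments, centroids


# Hypothetical toy data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.1, 7.9), (7.8, 8.2)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
```

Points that lie close together end up with the same cluster index, even though the function never sees a label. In practice a library implementation would likely be preferable to a hand-rolled loop like this one.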
Clustering has many applications. For example, in credit card fraud detection, clustering can group fraudulent transactions together, separating them from normal transactions. Or, if we had only a few labels for the observations in our dataset, we could use clustering to group the observations first (without using labels). Then, we could transfer the labels of the few labeled observations to the rest of the observations within the same group. This is a form of semi-supervised learning, a rapidly growing field in machine learning.
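The label transfer described above can be sketched in a few lines. This is an illustrative assumption rather than the chapter's own method: the function name `propagate_labels`, the majority-vote rule, and the toy cluster assignments are all hypothetical, and clusters with no labeled member are simply left unlabeled.

```python
from collections import Counter


def propagate_labels(cluster_ids, known_labels):
    """Spread a few known labels to every observation in the same cluster.

    cluster_ids: cluster index for each observation (from any clustering step).
    known_labels: dict mapping observation index -> label, for the labeled few.
    """
    # Count the known labels that fall inside each cluster.
    votes = {}
    for idx, label in known_labels.items():
        votes.setdefault(cluster_ids[idx], Counter())[label] += 1
    # Assumption: each cluster takes the majority label among its labeled members.
    cluster_label = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    # Clusters without any labeled member map to None.
    return [cluster_label.get(c) for c in cluster_ids]


# Hypothetical example: seven observations in three clusters, only two labeled.
cluster_ids = [0, 1, 0, 0, 1, 1, 2]
known_labels = {0: "normal", 4: "fraud"}
labels = propagate_labels(cluster_ids, known_labels)
print(labels)
```

The quality of the transferred labels depends entirely on how well the clusters line up with the true classes, so a result like this should be treated as a starting point to validate rather than as ground truth.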
In areas such as online and retail shopping, marketing, social media, recommender ...