Chapter 3. Dimensionality Reduction

In this chapter, we will focus on one of the major challenges in building successful applied machine learning solutions: the curse of dimensionality. Unsupervised learning offers an effective counter to it: dimensionality reduction. We will introduce this concept and build from there so that you can develop an intuition for how it all works.

In Chapter 4, we will build our own unsupervised learning solution based on dimensionality reduction: specifically, a credit card fraud detection system that relies on unsupervised learning (as opposed to the supervised system we built in Chapter 2). This type of unsupervised fraud detection is known as anomaly detection, a rapidly growing area in the field of applied unsupervised learning.

But before we build an anomaly detection system, let’s cover dimensionality reduction in this chapter.

The Motivation for Dimensionality Reduction

As mentioned in Chapter 1, dimensionality reduction helps counteract one of the most commonly occurring problems in machine learning—the curse of dimensionality—in which algorithms cannot effectively and efficiently train on the data because of the sheer size of the feature space.

Dimensionality reduction algorithms project high-dimensional data to a low-dimensional space, retaining as much of the salient information as possible while removing redundant information. Once the data is in the low-dimensional space, machine learning algorithms are able to identify interesting patterns ...
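To make the idea of projecting to a low-dimensional space concrete, here is a minimal sketch that uses principal component analysis (PCA), implemented directly with NumPy's singular value decomposition. PCA is chosen here only as one familiar example of a dimensionality reduction algorithm. The synthetic dataset, the variable names, and the choice of two components are all assumptions made for illustration and are not taken from the text above.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 samples with 5 features
# that are noisy linear combinations of only 2 hidden factors, so most of the
# 5 features carry redundant information.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

# PCA via SVD: center the data, then decompose it.
feature_means = X.mean(axis=0)
X_centered = X - feature_means
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep the top 2 directions of variance and project onto them.
n_components = 2
components = Vt[:n_components]          # shape (2, 5)
X_reduced = X_centered @ components.T   # shape (200, 2)

# Fraction of the total variance retained by the kept components.
explained = S ** 2 / np.sum(S ** 2)
retained = explained[:n_components].sum()

# Map back to the original 5-dimensional space to see how much was lost.
X_restored = X_reduced @ components + feature_means
reconstruction_error = np.mean((X - X_restored) ** 2)

print(X.shape, "->", X_reduced.shape)
print(f"variance retained: {retained:.4f}")
print(f"mean squared reconstruction error: {reconstruction_error:.6f}")
```

Because the five features in this toy dataset are driven by only two underlying factors, two components are enough to retain nearly all of the variance. Real datasets are rarely this clean, so the number of components to keep is usually a judgment call.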
