The Data Science Handbook by Field Cady

Chapter 10 Unsupervised Learning: Clustering and Dimensionality Reduction

This chapter is about techniques for studying the latent structure of your data in situations where you don't know a priori what it should look like. These techniques are often called “unsupervised” learning because, unlike in classification and regression, the “right answers” are not known going in. There are two primary ways of studying a dataset's structure: clustering and dimensionality reduction.

Clustering is an attempt to group the data points into distinct “clusters.” Typically, this is done in the hopes that the different clusters correspond to different underlying phenomena. For example, if you plotted people's height on the x-axis and their weight on the y-axis, you would see two more-or-less clear blobs, corresponding to men and women. An alien who knew nothing else about human biology might hypothesize that we come in two distinct types.
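To make the height-and-weight picture concrete, here is a minimal sketch using k-means, one common clustering algorithm that this passage does not itself name. The two synthetic blobs, their means and spreads, and the `kmeans` helper are all made-up assumptions for illustration, not real measurements or a reference implementation.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on a list of (x, y) tuples (illustrative helper)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


# Synthetic (height in cm, weight in kg) data: two overlapping blobs.
# The means and standard deviations are invented for this sketch.
rng = random.Random(42)
group_a = [(rng.gauss(178, 7), rng.gauss(82, 10)) for _ in range(200)]
group_b = [(rng.gauss(164, 6), rng.gauss(68, 9)) for _ in range(200)]
points = group_a + group_b

# The algorithm never sees which group a point came from.
labels, centroids = kmeans(points, k=2)
for j, (h, w) in enumerate(centroids):
    print(f"cluster {j}: {labels.count(j)} points, "
          f"centroid height={h:.1f} cm, weight={w:.1f} kg")
```

Because the blobs overlap, the recovered clusters will only roughly match the groups that generated the data. In practice, features measured in different units are often rescaled before clustering so that neither dominates the distance.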

In dimensionality reduction, the goal isn't to look for distinct categories in the data. Instead, the premise is that the different fields are largely redundant, and we want to extract the real, underlying variability in the data. Put another way, your data is d-dimensional, but the points actually lie on a k-dimensional subset of the space (with k < d), plus some d-dimensional noise. For example, in 3D data, your points could lie mostly along a single line or perhaps on a circle. Real situations, of course, are usually not so clean-cut. It's more useful ...
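The "3D points along a single line" case (d = 3, k = 1) can be sketched with principal component analysis, a linear technique that this passage does not itself name. The line direction, noise level, and `top_eigenvector` helper below are assumptions chosen for illustration; the sketch covers only the straight-line case, not the circle.

```python
import math
import random

rng = random.Random(0)
direction = (2 / 3, 1 / 3, 2 / 3)  # assumed unit vector along the hidden line

# Synthetic data: a 1-D position t along the line, plus small 3-D noise.
points = []
for _ in range(500):
    t = rng.uniform(-5, 5)
    points.append(tuple(t * d + rng.gauss(0, 0.1) for d in direction))

# Center the data and build the 3x3 sample covariance matrix.
n = len(points)
mean = [sum(p[i] for p in points) / n for i in range(3)]
centered = [[p[i] - mean[i] for i in range(3)] for p in points]
cov = [
    [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(3)]
    for i in range(3)
]


def top_eigenvector(matrix, n_iter=200):
    """Power iteration: repeatedly apply the matrix and renormalize."""
    v = [1.0, 0.0, 0.0]
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


axis = top_eigenvector(cov)

# Variance captured along the recovered axis, as a share of total variance.
top_var = sum(axis[i] * cov[i][j] * axis[j] for i in range(3) for j in range(3))
explained = top_var / sum(cov[i][i] for i in range(3))

# The reduced, 1-D representation: each point's coordinate along the axis.
scores = [sum(c[i] * axis[i] for i in range(3)) for c in centered]

print("recovered axis:", [round(x, 3) for x in axis])
print("share of variance explained:", round(explained, 4))
```

If the recovered axis is close to the assumed direction and explains nearly all of the variance, a single number per point (its entry in `scores`) carries almost everything the three original fields did.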
