Spark Cookbook by Rishi Yadav

Dimensionality reduction with principal component analysis

Dimensionality reduction is the process of reducing the number of dimensions, or features, in a dataset. Real-world data often has a very large number of features; thousands are not uncommon. The goal is to narrow these down to the features that matter.

Dimensionality reduction serves several purposes, such as:

  • Data compression
  • Visualization

Reducing the number of dimensions shrinks both the disk footprint and the memory footprint of the data. It also lets algorithms run much faster, and it can collapse highly correlated dimensions into one.
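To make the last point concrete, here is a minimal sketch in plain Python of how two perfectly correlated features can be collapsed into a single one by projecting onto the first principal component. It is an illustration only and does not use Spark; the function name `pca_2d_to_1d` and the sample data are hypothetical, and the closed-form eigenvalue solution assumes exactly two features.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Hypothetical helper for illustration; handles exactly two features.
    Returns (projected values, unit direction, explained variance ratio).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)

    # Largest eigenvalue of the symmetric 2x2 covariance matrix.
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    gap = math.sqrt(max(trace * trace / 4.0 - det, 0.0))
    largest = trace / 2.0 + gap

    # Matching eigenvector, with a fallback when the features are uncorrelated.
    if abs(sxy) > 1e-12:
        vx, vy = largest - syy, sxy
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # One coordinate per point instead of two.
    projected = [cx * vx + cy * vy for cx, cy in centered]
    explained = largest / trace if trace > 0 else 0.0
    return projected, (vx, vy), explained


# Made-up data: the second feature is always twice the first.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
projected, direction, explained = pca_2d_to_1d(points)
```

Because the second feature here is an exact multiple of the first, the single projected coordinate retains essentially all of the variance (`explained` is approximately 1), so the second dimension can be dropped without losing information. With real, noisy data the ratio would be lower, and how much loss is acceptable is a judgment call.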

Humans can only visualize three dimensions, but data can have a much higher number of dimensions. Visualization can help find hidden patterns in the ...
