Principal component analysis

Principal component analysis (PCA) is one technique for dimensionality reduction. Dimensionality reduction is the process of reducing the number of features in a dataset, typically by discarding or combining features that add little predictive value to a model. It also improves computational efficiency: with fewer features, an algorithm has less data to process and can detect patterns more quickly.
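To make the idea concrete, here is a minimal sketch using scikit-learn's `PCA` class on hypothetical toy data. It assumes three features, where the third is almost an exact linear combination of the first two, so it carries almost no new information. Passing a float to `n_components` asks PCA to keep just enough components to explain that fraction of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical toy data: two independent features plus a third that is
# almost an exact linear combination of them (a redundant feature).
n_samples = 500
x1 = rng.normal(size=n_samples)
x2 = rng.normal(size=n_samples)
x3 = 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.01, size=n_samples)
X = np.column_stack([x1, x2, x3])

# Keep only as many principal components as are needed to explain 99% of
# the variance; the "full" solver supports a fractional n_components.
pca = PCA(n_components=0.99, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

Because the third feature is redundant, two components capture nearly all of the variance, and the reduced dataset has one fewer column for a downstream model to process.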

The first step in PCA is called decorrelation. Features that are highly correlated with each other are largely redundant, so they add little value to a predictive model. Therefore, in the decorrelation step, PCA takes two highly correlated features and shifts and rotates their data points so that they are aligned with the axes, ...
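The decorrelation step can be illustrated with a small sketch, again assuming scikit-learn's `PCA` and a hypothetical pair of strongly correlated features (`x1`, `x2`). Keeping both components means no information is discarded: PCA only centers the data and rotates it so the new axes follow the directions of greatest variance, and the resulting features are uncorrelated.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Hypothetical pair of strongly correlated features.
n_samples = 1000
x1 = rng.normal(loc=170, scale=10, size=n_samples)
x2 = 0.9 * x1 + rng.normal(scale=3, size=n_samples)
X = np.column_stack([x1, x2])

print("correlation before PCA:", np.corrcoef(X, rowvar=False)[0, 1])

# Keep both components: PCA centers the data and rotates it onto the
# principal axes, so nothing is dropped, but the axes change.
pca = PCA(n_components=2)
X_decorrelated = pca.fit_transform(X)

# The transformed columns are orthogonal, so their correlation is ~0.
print("correlation after PCA:", np.corrcoef(X_decorrelated, rowvar=False)[0, 1])
```

The correlation drops from roughly 0.95 to essentially zero (up to floating-point error), which is what "decorrelation" refers to.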
