Principal component analysis

Principal component analysis (PCA) is a dimensionality reduction technique. Dimensionality reduction is the process of reducing the number of features in a dataset, typically by removing or combining features that add little predictive value to a model. It also improves computational efficiency, because an algorithm can generally detect patterns faster in a dataset with fewer features.
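As a rough illustration of reducing the number of features, the sketch below applies scikit-learn's PCA class to a small built-in dataset. The choice of the iris dataset and of two components is an assumption made purely for demonstration and does not come from the text.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: the iris dataset (150 samples, 4 features) stands in for
# whatever dataset you are actually working with.
X, _ = load_iris(return_X_y=True)

# Assumption: keep 2 components; in practice this number is a tuning choice.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# The sample count is unchanged; only the number of features shrinks.
print(X.shape, "->", X_reduced.shape)

# Fraction of the total variance captured by each retained component.
print(pca.explained_variance_ratio_)
```

Inspecting explained_variance_ratio_ is one common way to judge how much information the reduced feature set retains.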

The first step in PCA is called decorrelation. Features that are highly correlated with each other are largely redundant, so they add little value to the predictive model. Therefore, in the decorrelation step, PCA takes two highly correlated features and spreads their data points such that they are aligned along the axis, ...
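To make the idea of decorrelation concrete, here is a minimal sketch for the two-feature case, using only the Python standard library. It centers two correlated features and rotates them onto the principal axes of their covariance matrix, which is what PCA does in two dimensions. The helper names (decorrelate_2d, correlation) and the synthetic data are hypothetical and chosen for illustration only.

```python
import math
import random


def decorrelate_2d(xs, ys):
    """Center two features and rotate them onto their principal axes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cx = [x - mean_x for x in xs]
    cy = [y - mean_y for y in ys]

    # Entries of the 2x2 sample covariance matrix.
    var_x = sum(a * a for a in cx) / (n - 1)
    var_y = sum(b * b for b in cy) / (n - 1)
    cov_xy = sum(a * b for a, b in zip(cx, cy)) / (n - 1)

    # Angle of the first principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Project the centered points onto the rotated axes.
    pc1 = [a * cos_t + b * sin_t for a, b in zip(cx, cy)]
    pc2 = [-a * sin_t + b * cos_t for a, b in zip(cx, cy)]
    return pc1, pc2


def correlation(us, vs):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(us)
    mu = sum(us) / n
    mv = sum(vs) / n
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    su = math.sqrt(sum((u - mu) ** 2 for u in us))
    sv = math.sqrt(sum((v - mv) ** 2 for v in vs))
    return cov / (su * sv)


# Synthetic data (an assumption): ys is roughly a multiple of xs plus noise.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
ys = [2.0 * x + random.gauss(0.0, 0.3) for x in xs]

pc1, pc2 = decorrelate_2d(xs, ys)
print("correlation before:", correlation(xs, ys))
print("correlation after: ", correlation(pc1, pc2))
```

Before the rotation the two features are strongly correlated; afterwards the correlation between the rotated features is zero up to floating-point error, and most of the variance lies along the first rotated axis.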
