PCA for big data – RandomizedPCA

The main limitation of PCA is the computational cost of the singular value decomposition (SVD) that performs the reduction, which makes the whole process difficult to scale to large datasets. Scikit-learn offers a faster alternative based on randomized SVD, a lighter but approximate iterative decomposition method. With randomized SVD, the full-rank reconstruction is not perfect, and the basis vectors are refined locally at every iteration. On the other hand, only a few iterations are needed to reach a good approximation, which is why randomized SVD is much faster than classical SVD algorithms. This makes it a strong choice when the training dataset is large. In the following ...
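The trade-off described above can be checked with a minimal sketch that fits an exact and a randomized PCA to the same data. Note that the standalone `RandomizedPCA` class named in the heading was deprecated and later removed from Scikit-learn (in version 0.20). Current releases expose the same algorithm through `PCA(svd_solver='randomized')`. The synthetic low-rank dataset, its size, the random seed, and `iterated_power=3` are illustrative assumptions, not values taken from the text.

```python
import time

import numpy as np
from sklearn.decomposition import PCA

# Illustrative synthetic data (assumption): 10,000 samples x 300 features
# generated from 10 latent factors plus a small amount of Gaussian noise.
rng = np.random.RandomState(101)
n_samples, n_features, n_latent = 10_000, 300, 10
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

n_components = 10

# Exact decomposition: full SVD of the centered data matrix.
start = time.perf_counter()
pca_full = PCA(n_components=n_components, svd_solver="full").fit(X)
t_full = time.perf_counter() - start

# Approximate decomposition: randomized SVD with a few power iterations.
start = time.perf_counter()
pca_rand = PCA(n_components=n_components, svd_solver="randomized",
               iterated_power=3, random_state=101).fit(X)
t_rand = time.perf_counter() - start

print(f"full SVD:       {t_full:.3f} s")
print(f"randomized SVD: {t_rand:.3f} s")

# Compare how much variance each solution captures with the same 10 components.
var_full = pca_full.explained_variance_ratio_.sum()
var_rand = pca_rand.explained_variance_ratio_.sum()
print(f"explained variance (full):       {var_full:.4f}")
print(f"explained variance (randomized): {var_rand:.4f}")

# Project the data onto the approximate principal components.
X_reduced = pca_rand.transform(X)
print(X_reduced.shape)  # (10000, 10)
```

Timings depend on the machine and on the data shape. The gap in favor of the randomized solver widens as the number of samples and features grows relative to `n_components`. On data with a clear low-rank structure like this, both solvers should capture nearly the same share of variance.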
