PCA with the MNIST dataset

Now, let's apply PCA to reduce the dimensionality of the MNIST dataset. We are going to use the compressed version provided by scikit-learn (1,797 grayscale images of 8 × 8 pixels), but none of our considerations will be affected by this choice. Let's start by loading and normalizing the dataset:

import numpy as np
from sklearn.datasets import load_digits

# Load the 8 x 8 digits and scale the pixel values into [0, 1]
digits = load_digits()
X = digits['data'] / np.max(digits['data'])

From the theoretical discussion, we know that the magnitude of the eigenvalues of the covariance matrix is proportional to the relative importance (that is, the explained variance, and therefore the informative content) of the corresponding principal component. Therefore, if they are sorted in descending order, it's possible ...
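The link between eigenvalues and explained variance can be checked numerically. The following is a minimal sketch, assuming NumPy and scikit-learn are available. It computes the covariance matrix of the normalized digits, sorts its eigenvalues in descending order, turns them into explained-variance ratios, and cross-checks them against scikit-learn's `PCA.explained_variance_`. The 90% threshold and variable names such as `explained_ratio` and `n_components` are illustrative choices, not taken from the text.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Load the digits dataset and scale pixel values into [0, 1]
digits = load_digits()
X = digits['data'] / np.max(digits['data'])

# Covariance matrix of the centered data (64 x 64, one row/column per pixel);
# np.cov uses the unbiased (n - 1) denominator by default
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

# Eigen-decomposition of the symmetric covariance matrix.
# np.linalg.eigh returns eigenvalues in ascending order, so reverse the order
eigenvalues, eigenvectors = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Explained variance ratio: each eigenvalue divided by the total variance
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

# Smallest number of components that retains at least 90% of the variance
# (the 0.90 threshold is a hypothetical choice for illustration)
n_components = int(np.argmax(cumulative >= 0.90) + 1)
print("Components needed for 90% of the variance:", n_components)

# Cross-check: scikit-learn's PCA should report the same leading eigenvalues
pca = PCA(n_components=n_components, svd_solver='full')
pca.fit(X)
print("Matches scikit-learn:",
      np.allclose(pca.explained_variance_, eigenvalues[:n_components]))
```

Because scikit-learn's `explained_variance_` also divides by `n - 1`, the two sets of values should agree up to floating-point error; sorting in descending order is what makes the cumulative sum meaningful as a criterion for choosing how many components to keep.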
