We can now work through an example of FA with Scikit-Learn using the MNIST handwritten digits dataset (70,000 28 × 28 grayscale images), both in its original version and with added heteroscedastic noise (each ωi randomly selected from [0, 0.75]).
The first step is to load and zero-center the original dataset (I'm using the functions defined in Chapter 1, Machine Learning Model Fundamentals):
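import numpy as np

from sklearn.datasets import fetch_mldata

# Load MNIST (fetch_mldata was removed in Scikit-Learn 0.22; fetch_openml('mnist_784') is the current alternative)
digits = fetch_mldata('MNIST original')

# Zero-center the features (zero_center is defined in Chapter 1) and shuffle the samples in place
X = zero_center(digits['data'].astype(np.float64))
np.random.shuffle(X)

# Draw one noise scale per pixel from [0, 0.75] and add heteroscedastic Gaussian noise
Omega = np.random.uniform(0.0, 0.75, size=X.shape[1])
Xh = X + np.random.normal(0.0, Omega, size=X.shape)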
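As a quick sanity check of the noise model used above, the following minimal sketch repeats the same two operations on a small synthetic array instead of MNIST, so it runs without downloading anything. The `zero_center` function here is only an assumed stand-in for the Chapter 1 helper (it subtracts the per-feature mean), and the array size, value range, and seed are arbitrary choices made for illustration.

```python
import numpy as np


# Assumed stand-in for the Chapter 1 helper: subtract the per-feature mean
def zero_center(X):
    return X - np.mean(X, axis=0)


rng = np.random.default_rng(1000)

# Synthetic stand-in for MNIST: 20,000 samples with 8 "pixels" in [0, 255]
X = zero_center(rng.uniform(0.0, 255.0, size=(20000, 8)))

# One noise scale (standard deviation) per feature, drawn from [0, 0.75]
Omega = rng.uniform(0.0, 0.75, size=X.shape[1])

# Omega broadcasts across the rows, so column j receives noise with std Omega[j]
Xh = X + rng.normal(0.0, Omega, size=X.shape)

# The empirical per-feature noise std should be close to Omega
noise_std = np.std(Xh - X, axis=0)
print(np.round(Omega, 3))
print(np.round(noise_std, 3))
```

The two printed vectors should roughly agree, which shows that each feature gets its own noise level. That per-feature variation is what makes the noise heteroscedastic rather than isotropic.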
After this step, the X variable ...