Let's assume that our dataset, X, has N images and that each image is flattened into a vector of D pixels, so X has shape (N, D). The following three processing steps are usually performed on X:
- Mean subtraction: In this step, we compute a mean image across the whole dataset and subtract it from each image. This centers the data around the origin along each feature dimension. To implement this step in Python:
    import numpy as np
    mean_X = np.mean(X, axis=0)
    centered_X = X - mean_X
- Normalization: The mean subtraction step is often followed by a normalization step, which brings every feature dimension to the same scale. This is done by dividing each feature column by its standard deviation. ...
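As a rough illustration of mean subtraction followed by normalization, here is a minimal sketch on a small made-up array. The toy values of `X`, the names `std_X` and `normalized_X`, and the small constant `eps` are assumptions added for this example. `eps` only guards against division by zero when a pixel is constant across the dataset.

```python
import numpy as np

# Hypothetical toy data standing in for X: N = 4 images, D = 3 pixels each.
X = np.array([[0.0, 10.0, 5.0],
              [2.0, 20.0, 5.0],
              [4.0, 30.0, 5.0],
              [6.0, 40.0, 5.0]])

# Mean subtraction: one mean per pixel (column), broadcast over all images.
mean_X = np.mean(X, axis=0)
centered_X = X - mean_X

# Normalization: divide each pixel column by its standard deviation.
std_X = np.std(centered_X, axis=0)
eps = 1e-8  # assumption: avoids 0/0 for constant columns such as the third
normalized_X = centered_X / (std_X + eps)
```

After this, each non-constant column of `normalized_X` has approximately zero mean and unit standard deviation, while the constant third column stays at zero.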