Standardizing features
Standardization refers to the process of scaling the data to have zero mean and unit variance. This is a common requirement for a wide range of machine learning algorithms, which might behave badly if individual features do not fulfill this requirement. We could manually standardize our data by subtracting from every data point the mean value (μ) of that feature and dividing by the feature's standard deviation (σ); that is, for every feature x, we would compute (x - μ) / σ.
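As a quick sketch of the manual computation, here is the (x - μ) / σ step applied to a single feature vector with NumPy; the values are chosen arbitrarily for illustration:

```python
import numpy as np

# Hypothetical feature values, chosen arbitrarily for illustration
x = np.array([1.0, 3.0, 0.0])

# Standardize manually: subtract the mean, divide by the standard deviation
x_std = (x - x.mean()) / x.std()

print(x_std.mean())  # approximately 0
print(x_std.std())   # approximately 1
```

After this transformation, the feature has zero mean and unit variance, which is exactly what standardization promises.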
Alternatively, scikit-learn offers a straightforward implementation of this process in its preprocessing module. Let's consider a 3 x 3 data matrix X, standing for three data points (rows) with three arbitrarily chosen feature values each (columns): ...
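A minimal sketch of this, using scikit-learn's `preprocessing.scale` function on a 3 x 3 matrix; the specific feature values here are chosen arbitrarily for illustration:

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical 3 x 3 data matrix: three data points (rows), three
# arbitrarily chosen feature values each (columns)
X = np.array([[1., -2.,  2.],
              [3.,  0.,  0.],
              [0.,  1., -1.]])

# scale standardizes every column (feature) to zero mean and unit variance
X_scaled = preprocessing.scale(X)

print(X_scaled.mean(axis=0))  # each feature mean is approximately 0
print(X_scaled.std(axis=0))   # each feature std is approximately 1
```

Checking the per-column mean and standard deviation of `X_scaled` confirms that each feature has been standardized independently.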