July 2019
Beginner to intermediate
740 pages
16h 52m
English
We've seen that our dataframes have columns with very different scales. Any model that calculates a distance metric is sensitive to this, because columns with larger values dominate the distance, so we will need to scale the data before using one. Examples include k-means, which we will discuss in this chapter, and k-nearest neighbors (k-NN), which we will discuss briefly in Chapter 10, Making Better Predictions – Optimizing Models. As we discussed back in Chapter 1, Introduction to Data Analysis, we have quite a few options for scaling. Scikit-learn's preprocessing module provides options for standardizing (scaling by calculating Z-scores) and min-max scaling (normalizing the data to the range [0, 1]), among others.
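As a minimal sketch of the two options named above, the following applies scikit-learn's StandardScaler and MinMaxScaler to a small array. The array X and its values are hypothetical and chosen only to show two columns on very different scales; they are not taken from the book's datasets.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data (an assumption for illustration):
# two columns whose values differ by orders of magnitude.
X = np.array([
    [1.0, 1000.0],
    [2.0, 3000.0],
    [3.0, 5000.0],
])

# Standardizing: subtract each column's mean and divide by its
# standard deviation, giving Z-scores (mean 0, standard deviation 1).
standardized = StandardScaler().fit_transform(X)

# Min-max scaling: map each column's minimum to 0 and its maximum to 1.
normalized = MinMaxScaler().fit_transform(X)

print(standardized)
print(normalized)
```

After either transformation, both columns are on a comparable scale, so neither dominates a distance calculation. When a separate test set is involved, the scaler would typically be fit on the training data only and then reused to transform the test data.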