January 2020
Beginner to intermediate
372 pages
10h
English
In this recipe, we scaled the numerical variables of the Boston House Prices dataset from scikit-learn to unit norm, using either the Manhattan (l1) or the Euclidean (l2) distance. First, we loaded the dataset and divided it into train and test sets using the train_test_split() function from scikit-learn. To scale the data, we created an instance of Normalizer() from scikit-learn and set norm to l1 for the Manhattan distance or to l2 for the Euclidean distance. Then, we applied the fit() method, although Normalizer() has no parameters to learn because each observation is scaled independently of the others. Finally, with the transform() method, the scaler divided each observation's feature vector by its norm and returned a NumPy array with the scaled dataset.
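The steps described above can be sketched roughly as follows. This is a minimal illustration, not the recipe's exact code: the small array `X` is a hypothetical stand-in for the numerical variables of the dataset (the Boston loader is no longer available in recent scikit-learn releases), and the variable names and split settings are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import Normalizer

# Hypothetical stand-in data: 6 observations, 3 numerical features.
X = np.array([
    [1.0, 2.0, 2.0],
    [3.0, 4.0, 0.0],
    [0.5, 0.5, 1.0],
    [2.0, 6.0, 3.0],
    [4.0, 0.0, 3.0],
    [1.0, 1.0, 1.0],
])

# Divide the data into train and test sets (split settings are assumed).
X_train, X_test = train_test_split(X, test_size=0.33, random_state=0)

# Manhattan distance: each row is divided by the sum of its absolute values.
scaler_l1 = Normalizer(norm="l1")
scaler_l1.fit(X_train)  # learns nothing; called for API consistency
X_train_l1 = scaler_l1.transform(X_train)
X_test_l1 = scaler_l1.transform(X_test)

# Euclidean distance: each row is divided by the square root of its sum of squares.
scaler_l2 = Normalizer(norm="l2")
scaler_l2.fit(X_train)
X_train_l2 = scaler_l2.transform(X_train)
X_test_l2 = scaler_l2.transform(X_test)

# After scaling, every observation (row) has a norm of 1.
l1_norms = np.linalg.norm(X_train_l1, ord=1, axis=1)
l2_norms = np.linalg.norm(X_train_l2, ord=2, axis=1)
print(l1_norms)
print(l2_norms)
```

Because the scaling is applied row by row, transforming the test set does not depend on anything seen in the train set; calling fit() on the train set simply keeps the workflow consistent with other scikit-learn transformers.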