Chapter 4. Handling Numerical Data
4.0 Introduction
Quantitative data is the measurement of something—whether class size, monthly sales, or student scores. The natural way to represent these quantities is numerically (e.g., 29 students, $529,392 in sales). In this chapter, we will cover numerous strategies for transforming raw numerical data into features purpose-built for machine learning algorithms.
4.1 Rescaling a Feature
Problem
You need to rescale the values of a numerical feature to be between two values.
Solution
Use scikit-learn’s MinMaxScaler to rescale a feature array:
# Load libraries
import numpy as np
from sklearn import preprocessing

# Create feature
feature = np.array([[-500.5],
                    [-100.1],
                    [0],
                    [100.1],
                    [900.9]])

# Create scaler
minmax_scale = preprocessing.MinMaxScaler(feature_range=(0, 1))

# Scale feature
scaled_feature = minmax_scale.fit_transform(feature)

# Show feature
scaled_feature
array([[ 0.        ],
       [ 0.28571429],
       [ 0.35714286],
       [ 0.42857143],
       [ 1.        ]])
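As a cross-check, the following minimal sketch reproduces the scaler's output with plain NumPy, assuming the usual min-max definition (subtract the feature's minimum, then divide by its range). The variable names (manual, scaled, manual_wide, scaled_wide) and the second target range of (-1, 1) are illustrative choices, not part of the recipe above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same feature as in the solution
feature = np.array([[-500.5], [-100.1], [0], [100.1], [900.9]])

# Manual min-max scaling: subtract the column minimum, divide by the column range
col_min = feature.min(axis=0)
col_max = feature.max(axis=0)
manual = (feature - col_min) / (col_max - col_min)

# The same computation through scikit-learn
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(feature)

# Rescale to a different target range, here (-1, 1) (illustrative choice):
# stretch the [0, 1] values to the new width, then shift to the new lower bound
low, high = -1, 1
manual_wide = manual * (high - low) + low
scaled_wide = MinMaxScaler(feature_range=(low, high)).fit_transform(feature)

# Both comparisons should report agreement between the manual and library results
print(np.allclose(manual, scaled))
print(np.allclose(manual_wide, scaled_wide))
```

Computing the minimum and maximum along axis=0 keeps the sketch valid for arrays with several feature columns, since each column is then scaled independently, which is also how MinMaxScaler treats its input.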
Discussion
Rescaling is a common preprocessing task in machine learning. Many of the algorithms described later in this book assume that all features are on the same scale, typically 0 to 1 or -1 to 1. There are a number of rescaling techniques, but one of the simplest is min-max scaling, which uses the minimum and maximum values of a feature to rescale its values to within a range. Specifically, min-max scaling calculates:

$$x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$
where x is the feature vector, x