Chapter 14. Regression
Regression is a supervised machine learning process. It is similar to classification, but rather than predicting a label, we try to predict a continuous value. If you are trying to predict a number, then use regression.
It turns out that sklearn supports many of the same classification models for regression problems. In fact, the API is the same, calling .fit, .score, and .predict. This is also true for the next-generation boosting libraries, XGBoost and LightGBM.
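As a minimal sketch of that shared API, the snippet below fits a scikit-learn regressor and calls the same three methods. The synthetic data and the choice of LinearRegression are assumptions made for illustration; they are not taken from the chapter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (an assumption for illustration):
# y = 3*x0 - 2*x1 + 5
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

model = LinearRegression()
model.fit(X, y)               # same call as for a classifier
r2 = model.score(X, y)        # R^2 for regressors (accuracy for classifiers)
preds = model.predict(X[:3])  # continuous values rather than labels
```

Note that .score means something different here: for scikit-learn regressors it returns the coefficient of determination (R²), whereas for classifiers it returns accuracy.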
Though there are similarities with the classification models and hyperparameters, the evaluation metrics are different for regression. This chapter will review many of the types of regression models. We will use the Boston housing dataset to explore them.
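To make the point about metrics concrete, here is a small sketch that computes three commonly used regression metrics with sklearn.metrics. The toy values and the choice of these particular metrics are assumptions for illustration, not a summary of what the chapter covers.

```python
from sklearn import metrics

# Toy targets and predictions (hypothetical values for illustration).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

mae = metrics.mean_absolute_error(y_true, y_pred)  # mean of |error|
mse = metrics.mean_squared_error(y_true, y_pred)   # mean of error squared
r2 = metrics.r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot
```

Unlike accuracy or precision, these metrics measure how far the predicted numbers are from the true ones, not how often a label matches.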
Here we load the data, create a split version for training and testing, and create another split version with standardized data:
>>> import pandas as pd
>>> from sklearn.datasets import load_boston
>>> from sklearn import (
...     model_selection,
...     preprocessing,
... )
>>> b = load_boston()
>>> bos_X = pd.DataFrame(
...     b.data, columns=b.feature_names
... )
>>> bos_y = b.target
>>> bos_X_train, bos_X_test, bos_y_train, bos_y_test = model_selection.train_test_split(
...     bos_X,
...     bos_y,
...     test_size=0.3,
...     random_state=42,
... )
>>> bos_sX = preprocessing.StandardScaler().fit_transform(
...     bos_X
... )
>>> bos_sX_train, bos_sX_test, bos_sy_train, bos_sy_test = model_selection.train_test_split(
...     bos_sX,
...     bos_y,
...     test_size ...
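Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the listing above only runs on older versions. The following sketch walks through the same load, split, and standardize steps on a synthetic dataset so that it runs on current releases. The make_regression data, the column names, and the arguments to the second split (which mirror the first, since the original listing is cut off) are all assumptions, not the book's code.

```python
import pandas as pd
from sklearn import datasets, model_selection, preprocessing

# Synthetic stand-in for the housing data (an assumption, not the book's dataset).
X_arr, y = datasets.make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=42
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])

# Plain train/test split: 70% train, 30% test.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Standardized copy: each column rescaled to zero mean and unit variance.
sX = preprocessing.StandardScaler().fit_transform(X)

# Second split on the standardized features; the arguments are assumed to
# mirror the first split, so the same rows land in train and test.
sX_train, sX_test, sy_train, sy_test = model_selection.train_test_split(
    sX, y, test_size=0.3, random_state=42
)
```

Because both splits use the same random_state, the targets in y_train and sy_train are identical; only the feature scaling differs between the two versions.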