Chapter 15. Metrics and Regression Evaluation
This chapter will evaluate the results of a random forest regressor trained on the Boston housing data:
>>> rfr = RandomForestRegressor(
...     random_state=42, n_estimators=100
... )
>>> rfr.fit(bos_X_train, bos_y_train)
Metrics
The sklearn.metrics module includes metrics to evaluate regression models. Metric functions ending in loss or error should be minimized. Functions ending in score should be maximized.
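As a small illustration of that naming convention, the sketch below calls two error functions and one score function on made-up values. The numbers are hypothetical and are not drawn from the Boston housing data.

```python
from sklearn import metrics

# Hypothetical targets and predictions, chosen only for illustration
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]

# Names ending in "error": lower is better
mse = metrics.mean_squared_error(y_true, y_pred)
mae = metrics.mean_absolute_error(y_true, y_pred)

# Name ending in "score": higher is better
r2 = metrics.r2_score(y_true, y_pred)

print(mse, mae, r2)
```

Here the squared errors are 4, 4, 9, and 1, so the mean squared error is 4.5 and the mean absolute error is 2.0, while the score comes out to roughly .964.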
The coefficient of determination (r²) is a common regression metric. This value is typically between 0 and 1, although it can be negative when a model fits worse than always predicting the mean of the target. It represents the proportion of the variance in the target that the features explain. Higher values are better, but in general it is difficult to evaluate the model from this metric alone. Does a .7 mean it is a good score? It depends. For a given dataset, .5 might be a good score, while for another dataset, a .9 may be a bad score. Typically we use this number in combination with other metrics or visualizations to evaluate a model.
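To make the definition concrete, here is a minimal sketch that computes r² by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The helper name r_squared and the sample values are assumptions for illustration; in practice sklearn.metrics.r2_score computes the same quantity.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]  # hypothetical targets

close = r_squared(y_true, [2.5, 5.5, 7.0, 9.5])     # good predictions
baseline = r_squared(y_true, [6.0, 6.0, 6.0, 6.0])  # always predict the mean
worse = r_squared(y_true, [9.0, 7.0, 5.0, 3.0])     # predictions reversed

print(close, baseline, worse)
```

Predicting the mean scores exactly 0, the close predictions score about .96, and the reversed predictions score -3, which shows why the value is only "typically" between 0 and 1.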
For example, it is easy to make a model that predicts stock prices for the next day with an r² of .99. But I wouldn’t trade my own money with that model. It might be slightly low or high, which can wreak havoc on trades.
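One way to see this is a naive model that predicts tomorrow's price to be today's price. The sketch below uses a synthetic, deterministic price series (an assumption for illustration, not real market data) and compares the r² of that naive forecast with its average error.

```python
from sklearn import metrics

# Hypothetical prices: a slow upward drift plus a daily +/-0.5 wiggle
prices = [100 + 0.1 * day + (0.5 if day % 2 else -0.5) for day in range(500)]

actual = prices[1:]      # the price on each following day
predicted = prices[:-1]  # naive forecast: tomorrow equals today

r2 = metrics.r2_score(actual, predicted)
mae = metrics.mean_absolute_error(actual, predicted)

# Average size of the day-to-day move a trader would be trying to capture
avg_move = sum(abs(b - a) for a, b in zip(prices, prices[1:])) / len(actual)

print(r2, mae, avg_move)
```

The r² is above .99 because the forecast tracks the overall price level, yet the average error is exactly as large as the average daily move, so the forecast says nothing useful about which way to trade.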
The r² metric is the default metric used during grid search for regression models, because grid search falls back to the estimator's .score method. You can specify other metrics using the scoring parameter.
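A minimal sketch of overriding the default follows. It assumes synthetic data from make_regression and a small, arbitrary parameter grid rather than the Boston housing split used in this chapter. Note that scikit-learn negates error metrics in its scoring names, so that a higher value is always better.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption; not the Boston housing data)
X, y = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=42
)

grid = GridSearchCV(
    RandomForestRegressor(random_state=42, n_estimators=20),
    param_grid={"max_depth": [2, 5]},   # arbitrary illustrative grid
    scoring="neg_mean_absolute_error",  # replaces the default r² scoring
    cv=3,
)
grid.fit(X, y)

# best_score_ is the negated mean absolute error, so it is at most 0
print(grid.best_params_, grid.best_score_)
```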
The .score method calculates r² for regression models:
>>> from sklearn import metrics
>>> rfr.score(bos_X_test, bos_y_test)
0.8721182042634867
>>> metrics ...