Chapter 16. Explaining Regression Models
Most of the techniques used to explain classification models apply to regression models. In this chapter, I will show how to use the SHAP library to interpret regression models.
We will interpret an XGBoost model for the Boston housing dataset:
>>> import xgboost as xgb
>>> xgr = xgb.XGBRegressor(
...     random_state=42, base_score=0.5
... )
>>> xgr.fit(bos_X_train, bos_y_train)
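The bos_X_train and bos_y_train variables are built earlier in the book from the Boston housing data. If you are working through this chapter on its own, a rough sketch to recreate them might look like the following; the OpenML fetch and the split parameters are assumptions (load_boston has been removed from recent scikit-learn releases):

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Assumed reconstruction of the book's setup, not the author's exact code.
bos = fetch_openml("boston", version=1, as_frame=True)
bos_X = bos.data.apply(pd.to_numeric)  # coerce any categorical columns
bos_y = bos.target
bos_X_train, bos_X_test, bos_y_train, bos_y_test = train_test_split(
    bos_X, bos_y, test_size=0.3, random_state=42
)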
Shapley
I’m a big fan of SHAP because it is model agnostic. The library gives us global insight into our model and also helps explain individual predictions. If you have a black-box model, I find it very useful.
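Because SHAP is model agnostic, it is not limited to tree models: the library’s KernelExplainer needs only a predict function and some background data. Here is a minimal sketch (the 100-row background sample is an assumption, and KernelExplainer is much slower than the TreeExplainer used below):

import shap

# KernelExplainer treats the model as a black box; it only calls
# xgr.predict, so the same code works for any regressor.
background = shap.sample(bos_X_train, 100)  # subsample to keep it fast
kexp = shap.KernelExplainer(xgr.predict, background)
k_vals = kexp.shap_values(bos_X.iloc[[5]])  # SHAP values for one row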
We will first look at the prediction for index 5. Our model predicts the value to be 27.26:
>>> sample_idx = 5
>>> xgr.predict(bos_X.iloc[[sample_idx]])
array([27.269186], dtype=float32)
To explain the model, we create a TreeExplainer from it and estimate the SHAP values for our samples. If we want an interactive interface in Jupyter, we also need to call the initjs function:
>>> import shap
>>> shap.initjs()
>>> exp = shap.TreeExplainer(xgr)
>>> vals = exp.shap_values(bos_X)
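A useful property of SHAP values is additivity: the expected value plus a row’s SHAP values recovers the model’s prediction for that row. A quick sketch to verify this for our sample (using the variables defined above):

import numpy as np

# Base value plus per-feature contributions should equal the prediction.
reconstructed = exp.expected_value + vals[sample_idx].sum()
predicted = xgr.predict(bos_X.iloc[[sample_idx]])[0]
print(reconstructed, predicted)  # both ~27.27
assert np.isclose(reconstructed, predicted, atol=1e-3)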
With the explainer and the SHAP values, we can create a force plot to explain the prediction (see Figure 16-1). This informs us that the base prediction is 23, and that the percentage of lower-status population (LSTAT) and the property tax rate (TAX) push the price up, while the number of rooms (RM) pushes the price down:
>>> shap.force_plot(
...     exp.expected_value,
...     vals[sample_idx, :],
...     bos_X.iloc[sample_idx, :],
... )
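Outside Jupyter, the interactive JavaScript plot will not render; force_plot accepts a matplotlib flag that draws a static version of a single-sample plot instead. A sketch:

# Static force plot, useful in scripts or when exporting an image.
shap.force_plot(
    exp.expected_value,
    vals[sample_idx, :],
    bos_X.iloc[sample_idx, :],
    matplotlib=True,
)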