Chapter 16. Explaining Regression Models

Most of the techniques used to explain classification models apply to regression models. In this chapter, I will show how to use the SHAP library to interpret regression models.

We will interpret an XGBoost model for the Boston housing dataset:

>>> import xgboost as xgb
>>> xgr = xgb.XGBRegressor(
...     random_state=42, base_score=0.5
... )
>>> xgr.fit(bos_X_train, bos_y_train)
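
The bos_X, bos_X_train, and bos_y_train variables come from earlier chapters. Here is a minimal sketch of one way to prepare them (assuming scikit-learn's load_boston, which is deprecated and removed in recent versions, and an illustrative 70/30 split):

>>> import pandas as pd
>>> from sklearn import datasets, model_selection
>>> # load_boston is removed in recent scikit-learn releases;
>>> # shown only to illustrate where the variables come from
>>> b = datasets.load_boston()
>>> bos_X = pd.DataFrame(b.data, columns=b.feature_names)
>>> bos_y = b.target
>>> bos_X_train, bos_X_test, bos_y_train, bos_y_test = (
...     model_selection.train_test_split(
...         bos_X, bos_y, test_size=0.3, random_state=42
...     )
... )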

Shapley

I’m a big fan of Shapley values because they are model agnostic. The SHAP library, which computes them, gives us global insight into our model and also helps explain individual predictions. If you have a black-box model, I find it very useful.

We will first look at the prediction for index 5. Our model predicts the value to be 27.26:

>>> sample_idx = 5
>>> xgr.predict(bos_X.iloc[[sample_idx]])
array([27.269186], dtype=float32)

To use the library, we create a TreeExplainer from our model and estimate the SHAP values for our samples. If we want an interactive interface in Jupyter, we also need to call the initjs function:

>>> import shap
>>> shap.initjs()

>>> exp = shap.TreeExplainer(xgr)
>>> vals = exp.shap_values(bos_X)
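
Before plotting, a quick sanity check (a sketch reusing the variables above): TreeExplainer's SHAP values are additive, so the explainer's expected value plus the sum of a row's SHAP values should reproduce the model's prediction for that row:

>>> import numpy as np
>>> # expected_value + per-feature contributions should match
>>> # the model's prediction (up to float32 precision)
>>> np.isclose(
...     exp.expected_value + vals[sample_idx].sum(),
...     xgr.predict(bos_X.iloc[[sample_idx]])[0],
...     atol=1e-4,
... )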

With the explainer and the SHAP values, we can create a force plot to explain the prediction (see Figure 16-1). This informs us that the base prediction is 23, and that the percentage of lower-status population (LSTAT) and the property tax rate (TAX) push the price up, while the number of rooms (RM) pushes the price down:

>>> shap.force_plot(
...     exp.expected_value,
...     vals[sample_idx, :],
...     bos_X.iloc[sample_idx, :],
... )
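
The same SHAP values also provide the global insight mentioned earlier. A summary plot (a sketch reusing vals and bos_X from above) ranks features by the magnitude and direction of their impact across all of the samples:

>>> # each dot is one sample; color encodes the feature's value
>>> shap.summary_plot(vals, bos_X)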
