Chapter 20. Ensembles of Models

A model ensemble, where the predictions of multiple single learners are aggregated to make one prediction, can produce a high-performance final model. The most popular methods for creating ensemble models are bagging (Breiman 1996a), random forest (Ho 1995; Breiman 2001a), and boosting (Freund and Schapire 1997). Each of these methods combines the predictions from multiple versions of the same type of model (e.g., classification trees). However, one of the earliest methods for creating ensembles is model stacking (Wolpert 1992; Breiman 1996b).

Note

Model stacking combines the predictions from multiple models of any type. For example, a logistic regression, a classification tree, and a support vector machine can be included in a stacking ensemble.

This chapter shows how to stack predictive models using the stacks package. We’ll reuse the results from Chapter 15 where multiple models were evaluated to predict the compressive strength of concrete mixtures.

The process of building a stacked ensemble is:

  1. Assemble the training set of holdout predictions (produced via resampling).

  2. Create a model to blend these predictions.

  3. For each member of the ensemble, fit the model on the original training set.
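The three steps above map onto the main verbs of the stacks package. As a rough sketch (the object names `model_res_1` and `model_res_2` are placeholders for resampling or tuning results that were saved with `save_pred = TRUE` and `save_workflow = TRUE` in their control options):

```r
library(tidymodels)
library(stacks)

# Step 1: assemble the holdout predictions from each candidate's
# resampling results into one data stack.
concrete_stack <-
  stacks() %>%
  add_candidates(model_res_1) %>%   # placeholder resampling result
  add_candidates(model_res_2)       # placeholder resampling result

# Step 2: create the blending model that weights the candidates.
ensemble <- blend_predictions(concrete_stack)

# Step 3: refit each retained member on the full training set.
fitted_ensemble <- fit_members(ensemble)
```

Later sections of the chapter walk through each of these functions in detail with the concrete mixture models.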

In subsequent sections, we’ll describe this process. However, before proceeding, we’ll clarify some nomenclature for the variations of what “the model” can mean. This can quickly become an overloaded term when we are working on a complex ...
