Chapter 15

Model ensembles

15.1 Introduction

The idea of ensemble modeling is to create and combine multiple inductive models for the same domain, possibly obtaining better prediction quality than most or all of them. For this improvement to be possible, the strengths of individual models should be retained or reinforced and their weaknesses should be canceled out or reduced. It turns out that dozens or hundreds of models, even of rather mediocre quality, may produce top-notch predictions as a team.

Ensemble modeling is applicable to the two major predictive modeling tasks, classification and regression. In each case, it may yield substantial improvement over single models at the cost of investing considerably more computation time for multiple model creation and loosing overall human readability, even if each individual model is perfectly human readable. To exploit this potential for better predictive power, appropriate techniques for base model generation and aggregation are required, the most common of which will be discussed in this chapter. The former are mostly task independent and the task-specific aspects of the latter are sufficiently simple and isolated to make most of this discussion applicable both to the classification and regression tasks. It is the former, though, where model ensembles are most often and most successfully used, and some ensemble modeling techniques developed specifically for classification will also be discussed.

Get Data Mining Algorithms: Explained Using R now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.