6.1.1 The Origins: Bagging Predictors

Breiman introduced the term bagging as an acronym for Bootstrap AGGregatING [46]. The idea of bagging is simple and appealing: the ensemble is made of classifiers built on bootstrap replicates of the training set. The classifier outputs are combined by the plurality vote [47].

The diversity necessary to make the ensemble work is created by using different training sets. Ideally, the training sets should be generated randomly from the distribution of the problem. In practice, we can afford only one labeled training set, Z = {z1, …, zN}, and have to imitate the process of random generation of L training sets. We sample with replacement from the original training set (bootstrap sampling [115]) to create a new training set of size N. To make use of the variations of the training set, the base classifier should be unstable; that is, small changes in the training set should lead to large changes in the classifier output. Otherwise, the resultant ensemble will be a collection of nearly identical classifiers, and therefore unlikely to improve on the performance of a single classifier. Figure 6.1 shows the training and operation of bagging.


Training: Given is a labeled data set Z = {z1, …, zN}.

  1. Choose the ensemble size L and the base classifier model.
  2. Take L bootstrap samples from Z and train classifiers D1, …, DL, one classifier on each sample.

Operation: For each new object

  1. Classify the object by each of the classifiers D1, …, DL.
  2. Assign to the object the class label with the largest number of votes (plurality vote).
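The training and operation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: the base classifier here is assumed to be a decision stump (a single-feature threshold), chosen because stumps are unstable in the sense described above; the function names `train_stump`, `bagging_train`, and `bagging_classify` are invented for this sketch.

```python
import random
from collections import Counter

def train_stump(sample):
    """Train a decision stump (one feature, one threshold) on a bootstrap sample.

    Stumps are unstable: a small change in the sample can move the split,
    which is what gives the bagged ensemble its diversity."""
    best = None
    n_features = len(sample[0][0])
    for f in range(n_features):
        for x, _ in sample:
            t = x[f]
            left = [y for xx, y in sample if xx[f] <= t]
            right = [y for xx, y in sample if xx[f] > t]
            if not left or not right:
                continue
            ly = Counter(left).most_common(1)[0][0]    # majority label, left side
            ry = Counter(right).most_common(1)[0][0]   # majority label, right side
            err = sum(y != ly for y in left) + sum(y != ry for y in right)
            if best is None or err < best[0]:
                best = (err, f, t, ly, ry)
    if best is None:  # degenerate sample (no valid split): predict the majority class
        majority = Counter(y for _, y in sample).most_common(1)[0][0]
        return lambda x: majority
    _, f, t, ly, ry = best
    return lambda x: ly if x[f] <= t else ry

def bagging_train(Z, L, rng):
    """Training step 2: take L bootstrap samples of size N from Z,
    and train one classifier on each sample."""
    N = len(Z)
    return [train_stump([Z[rng.randrange(N)] for _ in range(N)]) for _ in range(L)]

def bagging_classify(ensemble, x):
    """Operation: classify x with each D_i and combine by plurality vote."""
    votes = Counter(D(x) for D in ensemble)
    return votes.most_common(1)[0][0]

# Toy labeled data set Z = {z1, ..., zN}, each z_j a (feature vector, label) pair
Z = [((0.0, 0.0), "a"), ((1.0, 0.5), "a"), ((5.0, 5.0), "b"), ((6.0, 5.5), "b")]
ensemble = bagging_train(Z, L=11, rng=random.Random(0))
print(bagging_classify(ensemble, (-1.0, -1.0)))  # "a"
print(bagging_classify(ensemble, (9.0, 9.0)))    # "b"
```

Because sampling is with replacement, some bootstrap samples may miss a class entirely; the fallback in `train_stump` handles that case, and the plurality vote over L = 11 stumps absorbs the few classifiers trained on such degenerate samples.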
