Bootstrap aggregating or bagging is an algorithm introduced by Leo Breiman in 1994, which applies bootstrapping to machine learning problems. Bootstrapping is a statistical procedure that creates datasets from existing data by sampling with replacement. Bootstrapping can be used to analyze the possible values that arithmetic mean, variance, or other quantity can assume.
The algorithm aims to reduce the chance of overfitting with the following steps:
- We generate new training sets from input train data by sampling with replacement
- For each generated training set, we fit a new model
- We combine the results of the models by averaging or majority voting
The following diagram illustrates the steps for bagging, using classification as an example: ...