Pasting by random samples

Pasting is the first type of averaging ensembling we will discuss. In pasting, a certain number of estimators are built using small samples taken from the data (using sampling without replacement). Finally, the results are pooled and the estimate is obtained by averaging the results, in the case of regression, or by taking the most voted class when dealing with classification. Pasting is very useful when dealing with very large data (such as the case where it cannot fit into the memory) because it allows dealing with only those portions of data manageable by the available RAM and computational resources of your computer.

As a method, Leo Breiman, the creator of the RandomForest algorithm, first devised this strategy. ...

Get Python Data Science Essentials - Third Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.