Ensembling techniques
Ensemble learning, or ensembling, is the process of combining multiple predictive models to produce a supermodel that is typically more accurate than any of its individual members. How the predictions are combined depends on the task:
- Regression: Take the average of the predictions from each model
- Classification: Take a vote and use the most common prediction, or take the average of the predicted probabilities
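The two combination rules above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function names (`average_predictions`, `majority_vote`, `soft_vote`), the 0.5 threshold, and the sample numbers are all hypothetical, and only the Python standard library is used.

```python
from collections import Counter
from statistics import mean


def average_predictions(predictions_per_model):
    """Regression: average the models' predictions for each observation."""
    return [mean(obs) for obs in zip(*predictions_per_model)]


def majority_vote(labels_per_model):
    """Classification (hard voting): most common label per observation.

    With an odd number of models and two classes there are no ties.
    """
    return [Counter(obs).most_common(1)[0][0] for obs in zip(*labels_per_model)]


def soft_vote(probs_per_model, threshold=0.5):
    """Classification (soft voting): average P(class 1), then threshold."""
    return [int(mean(obs) >= threshold) for obs in zip(*probs_per_model)]


# Hypothetical outputs of three models on three observations
# (one inner list per model).
regression_preds = [[10.0, 20.0, 30.0], [12.0, 18.0, 33.0], [14.0, 22.0, 27.0]]
class_labels = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
class_probs = [[0.9, 0.4, 0.6], [0.8, 0.6, 0.3], [0.4, 0.2, 0.9]]

avg = average_predictions(regression_preds)
hard = majority_vote(class_labels)
soft = soft_vote(class_probs)

print(avg, hard, soft)
```

Hard voting only looks at the predicted labels, while soft voting also uses how confident each model is, so the two rules can disagree when one model is very confident and the others are not.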
Imagine that we are working on a binary classification problem (predicting either 0 or 1):
# ENSEMBLING
import numpy as np

# set a seed for reproducibility
np.random.seed(12345)

# generate 2000 random numbers (between 0 and 1) for each model, representing 2000 observations
mod1 = np.random.rand(2000)
mod2 = np.random.rand(2000)
mod3 = np.random.rand(2000)
mod4 = np.random.rand(2000)
...
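To see why voting can help, here is a separate, self-contained sketch. It does not reproduce the rest of the truncated listing above. It rests on several assumptions that are not in the source: five simulated models, each correct with probability 0.7, errors that are fully independent, and a true label of 1 for every observation. It uses the standard library's `random` module rather than NumPy so that it runs without extra dependencies.

```python
import random

rng = random.Random(12345)  # seeded for reproducibility

# Assumed values, chosen only for illustration
n_obs, n_models, p_correct = 2000, 5, 0.7

# Assume the true label is always 1. Each simulated model predicts 1
# (the correct answer) independently with probability p_correct.
preds = [
    [1 if rng.random() < p_correct else 0 for _ in range(n_obs)]
    for _ in range(n_models)
]

# Because the true label is always 1, a model's accuracy is simply
# the share of observations for which it predicted 1.
individual_acc = [sum(p) / n_obs for p in preds]

# Majority vote: predict 1 when more than half of the models predict 1.
ensemble = [1 if sum(votes) > n_models / 2 else 0 for votes in zip(*preds)]
ensemble_acc = sum(ensemble) / n_obs

print("individual:", [round(a, 3) for a in individual_acc])
print("ensemble:  ", round(ensemble_acc, 3))
```

Under these assumptions, the majority vote is correct whenever at least three of the five models are correct, which happens with probability 0.7^5 + 5(0.7^4)(0.3) + 10(0.7^3)(0.3^2) ≈ 0.837, well above the 0.7 of any single model. Real models trained on the same data tend to make correlated errors, so the gain in practice is usually smaller.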