Ensembling techniques

Ensemble learning, or ensembling, is the process of combining multiple predictive models to produce a supermodel that is typically more accurate than any of the individual models on its own. How the predictions are combined depends on the type of problem:

  • Regression: Take the average of the predictions from each model
  • Classification: Take a vote and use the most common prediction, or take the average of the predicted probabilities
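
The two combination rules above can be sketched in a few lines of NumPy. This is only an illustration: the arrays `reg_preds` and `probs` hold made-up predictions from three hypothetical models, and the 0.5 cutoff used to turn probabilities into class labels is an assumption, not something the text prescribes.

```python
import numpy as np

# Hypothetical predictions from three regression models (rows)
# for four observations (columns); the values are made up.
reg_preds = np.array([
    [2.0, 3.5, 1.0, 4.0],
    [2.4, 3.1, 1.2, 3.8],
    [1.6, 3.9, 0.8, 4.2],
])

# Regression: average the predictions across models
reg_ensemble = reg_preds.mean(axis=0)

# Hypothetical predicted probabilities of class 1 from three classifiers
probs = np.array([
    [0.9, 0.4, 0.2, 0.95],
    [0.8, 0.6, 0.1, 0.45],
    [0.3, 0.7, 0.4, 0.45],
])

# Classification, option 1: each model votes with its own label
# (assumed cutoff of 0.5), and the most common label wins
labels = (probs >= 0.5).astype(int)
vote_ensemble = (labels.sum(axis=0) > labels.shape[0] / 2).astype(int)

# Classification, option 2: average the probabilities first,
# then apply the same assumed cutoff
avg_probs = probs.mean(axis=0)
soft_ensemble = (avg_probs >= 0.5).astype(int)

print(reg_ensemble)
print(vote_ensemble)
print(soft_ensemble)
```

In this toy data the two classification rules disagree on the last observation: only one model votes for class 1, but it is confident enough to pull the averaged probability above the cutoff. Which rule works better depends on how well calibrated the individual models' probabilities are.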

Imagine that we are working on a binary classification problem (predicting either 0 or 1):

    # ENSEMBLING
    import numpy as np

    # set a seed for reproducibility
    np.random.seed(12345)

    # generate 2000 random numbers (between 0 and 1) for each model,
    # representing 2000 observations
    mod1 = np.random.rand(2000)
    mod2 = np.random.rand(2000)
    mod3 = np.random.rand(2000)
    mod4 = np.random.rand(2000)
    ...
