CHAPTER 6 Ensemble Methods

6.1 Motivation

In this chapter we will discuss two of the most popular ML ensemble methods.¹ In the references and footnotes you will find books and articles that introduce these techniques. As everywhere else in this book, the assumption is that you have already used these approaches. The goal of this chapter is to explain what makes them effective, and how to avoid common errors that lead to their misuse in finance.

6.2 The Three Sources of Errors

ML models generally suffer from three errors:²

  1. Bias: This error is caused by unrealistic assumptions. When bias is high, the ML algorithm has failed to recognize important relations between features and outcomes. In this situation, the algorithm is said to be “underfit.”
  2. Variance: This error is caused by sensitivity to small changes in the training set. When variance is high, the algorithm has overfit the training set, and that is why even minimal changes in the training set can produce wildly different predictions. Rather than modelling the general patterns in the training set, the algorithm has mistaken noise for signal.
  3. Noise: This error is caused by the variance of the observed values, like unpredictable changes or measurement errors. This is the irreducible error, which cannot be explained by any model.
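
To make the three errors concrete, the sketch below estimates each of them by simulation. Everything in it is an assumption chosen for illustration and not taken from the text: the "true" relation `true_f` (a sine wave), the noise level `sigma`, the polynomial learner, and the helper name `bias_variance`. The idea is to refit the same model on many training sets that differ only in their noise draw, and then to measure how far the average prediction sits from the truth (squared bias) and how much individual predictions scatter around that average (variance).

```python
import numpy as np


def true_f(x):
    # Hypothetical "true" relation between feature and outcome.
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_train=30, n_sets=500, sigma=0.3, seed=0):
    """Estimate squared bias, variance and noise for a polynomial fit.

    The same polynomial model is refit on n_sets training sets that
    share the feature grid but carry independent noise draws.
    """
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_sets, x_test.size))
    for s in range(n_sets):
        # Outcome = true relation + white noise with zero mean.
        y_train = true_f(x_train) + rng.normal(0.0, sigma, n_train)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[s] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    # Squared bias: gap between the average prediction and the truth.
    bias_sq = np.mean((mean_pred - true_f(x_test)) ** 2)
    # Variance: scatter of predictions across training sets.
    variance = np.mean(preds.var(axis=0))
    # Noise: irreducible error, set by the variance of the outcomes.
    noise = sigma ** 2
    return bias_sq, variance, noise


for degree in (1, 9):
    b, v, n = bias_variance(degree)
    print(f"degree={degree}: bias^2={b:.4f} variance={v:.4f} noise={n:.4f}")
```

Under these assumptions, the straight line (degree 1) should show the larger squared bias, since it cannot follow the sine wave, while the degree-9 polynomial should show the larger variance, since it has enough flexibility to chase the noise in each training set. The noise term is the same for both, because no choice of model can reduce it.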

Consider a training set of observations $\{x_i\}_{i=1,\dots,n}$ and real-valued outcomes $\{y_i\}_{i=1,\dots,n}$. Suppose a function $f[x]$ exists, such that $y = f[x] + \epsilon$, where $\epsilon$ is white noise with $\mathrm{E}[\epsilon_i] = 0$ ...
