Skip to Content
Python Data Science Handbook, 2nd Edition
book

Python Data Science Handbook, 2nd Edition

by Jake VanderPlas
December 2022
Beginner to intermediate
588 pages
13h 43m
English
O'Reilly Media, Inc.
Content preview from Python Data Science Handbook, 2nd Edition

Chapter 44. In Depth: Decision Trees and Random Forests

Previously we have looked in depth at a simple generative classifier (naive Bayes; see Chapter 41) and a powerful discriminative classifier (support vector machines; see Chapter 43). Here we’ll take a look at another powerful algorithm: a nonparametric algorithm called random forests. Random forests are an example of an ensemble method, meaning one that relies on aggregating the results of a set of simpler estimators. The somewhat surprising result with such ensemble methods is that the sum can be greater than the parts: that is, the predictive accuracy of a majority vote among a number of estimators can end up being better than that of any of the individual estimators doing the voting! We will see examples of this in the following sections.

We begin with the standard imports:

In [1]: %matplotlib inline
        import numpy as np
        import matplotlib.pyplot as plt
        plt.style.use('seaborn-whitegrid')

Motivating Random Forests: Decision Trees

Random forests are an example of an ensemble learner built on decision trees. For this reason, we’ll start by discussing decision trees themselves.

Decision trees are extremely intuitive ways to classify or label objects: you simply ask a series of questions designed to zero in on the classification. For example, if you wanted to build a decision tree to classify animals you come across while on a hike, you might construct the one shown in Figure 44-1.

Figure 44-1. An example of a binary decision ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Python Data Science Handbook

Python Data Science Handbook

Jake VanderPlas

Publisher Resources

ISBN: 9781098121211Errata PageSupplemental Content