O'Reilly logo

Data Science from Scratch, 2nd Edition by Joel Grus

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 17. Decision Trees

A tree is an incomprehensible mystery.

Jim Woodring

DataSciencester’s VP of Talent has interviewed a number of job candidates from the site, with varying degrees of success. He’s collected a dataset consisting of several (qualitative) attributes of each candidate, as well as whether that candidate interviewed well or poorly. Could you, he asks, use this data to build a model identifying which candidates will interview well, so that he doesn’t have to waste time conducting interviews?

This seems like a good fit for a decision tree, another predictive modeling tool in the data scientist’s kit.

What Is a Decision Tree?

A decision tree uses a tree structure to represent a number of possible decision paths and an outcome for each path.

If you have ever played the game Twenty Questions, then you are familiar with decision trees. For example:

  • “I am thinking of an animal.”

  • “Does it have more than five legs?”

  • “No.”

  • “Is it delicious?”

  • “No.”

  • “Does it appear on the back of the Australian five-cent coin?”

  • “Yes.”

  • “Is it an echidna?”

  • “Yes, it is!”

This corresponds to the path:

“Not more than 5 legs” → “Not delicious” → “On the 5-cent coin” → “Echidna!”

in an idiosyncratic (and not very comprehensive) “guess the animal” decision tree (Figure 17-1).

Guess the animal.
Figure 17-1. A “guess the animal” decision tree

Decision trees have a lot to recommend them. They’re ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required