Data Science from Scratch
by Joel Grus
April 2015
Beginner
328 pages
English
O'Reilly Media, Inc.

Chapter 17. Decision Trees

A tree is an incomprehensible mystery.

Jim Woodring

DataSciencester’s VP of Talent has interviewed a number of job candidates from the site, with varying degrees of success. He’s collected a data set consisting of several (qualitative) attributes of each candidate, as well as whether that candidate interviewed well or poorly. Could you, he asks, use this data to build a model identifying which candidates will interview well, so that he doesn’t have to waste time conducting interviews?

This seems like a good fit for a decision tree, another predictive modeling tool in the data scientist’s kit.

What Is a Decision Tree?

A decision tree uses a tree structure to represent a number of possible decision paths and an outcome for each path.

If you have ever played the game Twenty Questions, then it turns out you are familiar with decision trees. For example:

  • “I am thinking of an animal.”

  • “Does it have more than five legs?”

  • “No.”

  • “Is it delicious?”

  • “No.”

  • “Does it appear on the back of the Australian five-cent coin?”

  • “Yes.”

  • “Is it an echidna?”

  • “Yes, it is!”

This corresponds to the path:

“Not more than 5 legs” → “Not delicious” → “On the 5-cent coin” → “Echidna!”

in an idiosyncratic (and not very comprehensive) “guess the animal” decision tree (Figure 17-1).

Figure 17-1. A “guess the animal” decision tree
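
To make the idea of a path through a tree concrete, here is a minimal sketch of the Twenty Questions exchange above as a nested Python structure. The representation (a tuple of a question and a dict of branches) and the names guess_the_animal and classify are assumptions made for illustration, not code from the chapter. Only the branch spelled out in the dialogue is filled in; every other branch is left as None rather than guessed.

```python
# A node is either a leaf (a string guess, or None where the excerpt does
# not say what the tree does) or a (question, branches) pair, where
# branches maps "yes"/"no" to a subtree. This layout is hypothetical.
guess_the_animal = (
    "Does it have more than five legs?",
    {
        "yes": None,  # not described in the excerpt
        "no": (
            "Is it delicious?",
            {
                "yes": None,  # not described in the excerpt
                "no": (
                    "Does it appear on the back of the Australian five-cent coin?",
                    {"yes": "Echidna!", "no": None},
                ),
            },
        ),
    },
)


def classify(tree, answers):
    """Follow one decision path, using a dict of question -> 'yes'/'no'."""
    while isinstance(tree, tuple):
        question, branches = tree
        tree = branches[answers[question]]
    return tree


answers = {
    "Does it have more than five legs?": "no",
    "Is it delicious?": "no",
    "Does it appear on the back of the Australian five-cent coin?": "yes",
}

print(classify(guess_the_animal, answers))  # prints: Echidna!
```

Each pass through the loop answers one question and descends one level, so the sequence of answers traces exactly one path from the root to a leaf.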

Decision trees have a lot to recommend ...

Publisher Resources

ISBN: 9781491901410