Chapter 8: Decision Trees

In Chapter 7, we introduced the naïve Bayes classifier as a machine learning approach that uses the probability of prior events to inform the likelihood of a future event. In this chapter, we introduce a different type of classifier known as a decision tree. Instead of using the probability of prior events to predict future events, the decision tree classifier uses a logical tree-like structure to represent the relationship between predictors and a target outcome.

Decision trees are constructed using a divide-and-conquer approach, in which the original dataset is split repeatedly into smaller subsets until each subset is as homogeneous as possible. We discuss this recursive partitioning approach at some length in the early part of the chapter. Later in the chapter, we discuss the process of paring back the size of a decision tree to make it useful across a wider set of use cases. We wrap up the chapter by training a decision tree model in R, discussing the strengths and weaknesses of the approach, and working through a use case.
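
To make the divide-and-conquer idea concrete before the fuller discussion, here is a minimal sketch of recursive partitioning in base R. It is an illustration under stated assumptions, not the implementation used later in the chapter: it handles a single numeric predictor, it uses Gini impurity as one common way to score how mixed a subset is, and the function names (`gini`, `best_split`, `grow_tree`), the `max_depth` parameter, and the toy data are all hypothetical.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2).
# It is 0 when every label in the subset is the same.
gini <- function(labels) {
  if (length(labels) == 0) return(0)
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Find the threshold on one numeric predictor that minimizes the
# size-weighted impurity of the two resulting subsets.
best_split <- function(x, y) {
  values <- sort(unique(x))
  if (length(values) < 2) return(NULL)  # nothing left to split on
  # Candidate thresholds: midpoints between consecutive distinct values.
  thresholds <- (head(values, -1) + tail(values, -1)) / 2
  scores <- sapply(thresholds, function(t) {
    left <- y[x <= t]
    right <- y[x > t]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  list(threshold = thresholds[which.min(scores)], impurity = min(scores))
}

# Recursively partition until a subset is pure, the depth limit
# (an assumed stopping rule) is reached, or no split is possible.
grow_tree <- function(x, y, depth = 0, max_depth = 3) {
  majority <- names(which.max(table(y)))
  split <- best_split(x, y)
  if (gini(y) == 0 || depth >= max_depth || is.null(split)) {
    return(list(leaf = TRUE, prediction = majority, n = length(y)))
  }
  go_left <- x <= split$threshold
  list(
    leaf = FALSE,
    threshold = split$threshold,
    left = grow_tree(x[go_left], y[go_left], depth + 1, max_depth),
    right = grow_tree(x[!go_left], y[!go_left], depth + 1, max_depth)
  )
}

# Toy data, made up for illustration: income (in thousands) and loan outcome.
income <- c(20, 25, 30, 60, 65, 70)
outcome <- c("default", "default", "default", "repaid", "repaid", "repaid")
tree <- grow_tree(income, outcome)
```

On this toy data the first split falls at an income of 45, which separates the two outcomes completely, so both branches become leaves and the recursion stops. Real datasets rarely separate this cleanly, which is why an unrestricted tree can keep growing until it is too specific to the training data.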

By the end of this chapter, you will have learned the following:

  • The basic components of a decision tree and how to interpret it
  • How decision trees are constructed through recursive partitioning, guided by a measure of impurity
  • Two of the most popular implementations of decision trees and how they differ in the way they measure impurity
  • Why and how decision trees are pruned
  • How to build a decision tree classifier in R and how ...
