
Data Mining and Predictive Analytics, 2nd Edition by Daniel T. Larose, Chantal D. Larose

Chapter 11 Decision Trees

11.1 What is a Decision Tree?

In this chapter, we continue our examination of classification methods for data mining. One attractive classification method involves the construction of a decision tree, a collection of decision nodes, connected by branches, extending downward from the root node until terminating in leaf nodes. Beginning at the root node, which by convention is placed at the top of the decision tree diagram, attributes are tested at the decision nodes, with each possible outcome resulting in a branch. Each branch then leads either to another decision node or to a terminating leaf node. Figure 11.1 provides an example of a simple decision tree.

Figure 11.1 Simple decision tree.
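
The structure described above (decision nodes that test an attribute, one branch per outcome, and terminating leaf nodes) can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the class names `Leaf` and `DecisionNode` and the function `classify` are hypothetical, and the tree built at the bottom only borrows attribute names from a credit-risk setting. Its layout below the root and all of its leaf labels are assumptions made for the example, not values read from Figure 11.1.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminating node: holds the predicted class label."""
    label: str


@dataclass
class DecisionNode:
    """Tests one attribute; each possible outcome maps to one branch."""
    attribute: str
    branches: Dict[str, Union["DecisionNode", Leaf]] = field(default_factory=dict)


def classify(node, record):
    """Start at the root and follow matching branches until a leaf is reached."""
    while isinstance(node, DecisionNode):
        outcome = record[node.attribute]   # value of the tested attribute
        node = node.branches[outcome]      # follow the branch for that outcome
    return node.label


# Hypothetical tree: the branch layout and leaf labels are illustrative
# assumptions, not taken from Figure 11.1.
tree = DecisionNode("savings", {
    "low": DecisionNode("assets", {"low": Leaf("bad"), "not low": Leaf("good")}),
    "medium": Leaf("good"),
    "high": DecisionNode("income", {"<=30000": Leaf("bad"), ">30000": Leaf("good")}),
})

record = {"savings": "low", "assets": "not low", "income": "<=30000"}
print(classify(tree, record))
```

In this sketch only the attributes along the path taken are ever inspected: the sample record has low savings, so its `income` value is never tested.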

The target variable for the decision tree in Figure 11.1 is credit risk, with potential customers being classified as either good or bad credit risks. The predictor variables are savings (low, medium, and high), assets (low or not low), and income (≤$30,000 or >$30,000). Here, the root node represents a decision node, testing whether each record has a low, medium, or high savings level (as defined by the analyst or domain expert). The data set is partitioned, or split, according to the values of this attribute. Those records with low savings are sent via the leftmost branch (savings = low) to another decision node. The records with high savings are sent via the ...
