Chapter 5
Build Decision Trees
About the Tasks That You Will Perform
Automatically Train and Prune a Decision Tree
Interactively Train a Decision Tree
Create a Gradient Boosting Model of the Data
About the Tasks That You Will Perform
Now that you have verified the input data, it is time to build predictive models. You
perform the following tasks to model the input data using nonparametric decision trees:
1. You enable SAS Enterprise Miner to automatically train a full decision tree and to
automatically prune the tree to an optimal size. When training the tree, SAS Enterprise
Miner selects, at each step, the split rule that maximizes the split decision logworth.
Split decision logworth is a statistic that measures how effectively a particular split
decision differentiates values of the target variable. It is computed as -log10(p),
where p is the p-value of a statistical test of the split, so larger values indicate
stronger splits. For more information about logworth, see the SAS Enterprise Miner Help.
2. You interactively train a decision tree. At each step, you select from a list of
candidate rules to define the split rule that you deem to be the best.
3. You use a Gradient Boosting node to generate a set of decision trees that form a
single predictive model. Gradient boosting is a boosting approach that resamples the
analysis data set several times and fits a sequence of trees, each of which attempts to
correct the errors of the trees before it. The final model is a weighted sum of these trees.
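To make the logworth statistic concrete, the following Python sketch scores a candidate split, assuming a binary target, a two-way split, and a Pearson chi-square test of the resulting 2 x 2 table of counts. The function names are hypothetical, and this is not SAS Enterprise Miner's implementation: the Decision Tree node may also adjust logworth (for example, for the number of candidate splits examined), and those adjustments are not modeled here.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2 x 2 table of counts.

    table[i][j] is the number of training observations that the candidate split
    sends to branch i and that have target level j.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each branch and each target level needs at least one observation")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def logworth(table):
    """Return -log10(p) for the chi-square test of a binary split on a binary target.

    Assumption: a simple unadjusted chi-square test; SAS Enterprise Miner may adjust this.
    """
    chi2 = chi_square_2x2(table)
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    if p_value == 0.0:
        return math.inf  # p-value underflowed: the split is overwhelmingly significant
    return -math.log10(p_value)


if __name__ == "__main__":
    strong_split = [[40, 10], [10, 40]]  # branches differ sharply in target level
    weak_split = [[26, 24], [24, 26]]    # branches look almost identical
    print(round(logworth(strong_split), 2))
    print(round(logworth(weak_split), 2))
```

The strong split scores far higher than the weak one, which is why an automatic search that maximizes logworth prefers it.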
Automatically Train and Prune a Decision Tree
Decision tree models are advantageous because they are conceptually easy to
understand, yet they readily accommodate nonlinear associations between input
variables and one or more target variables. They also handle missing values without the
need for imputation. Therefore, you decide to first model the data using decision trees.
You will compare decision tree models to other models later in the example.
However, before you add and run the Decision Tree node, you will add a Control Point
node. The Control Point node is used to simplify a process flow diagram by reducing the
number of connections between multiple interconnected nodes. By the end of this
example, you will have created five different models of the input data set and two
Control Point nodes that connect them. The first Control Point node, added here,
will distribute the input data to each of these models. The second Control Point node will
collect the models and send them to evaluation nodes.
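The connection savings come down to simple arithmetic: wiring m upstream nodes directly to n downstream nodes takes m * n connections, while routing them through one Control Point node takes m + n. The Python sketch below illustrates this count; the function names are hypothetical, and the two evaluation nodes in the usage example are an assumed number, not one stated in this example.

```python
def direct_connections(upstream: int, downstream: int) -> int:
    """Connections needed when every upstream node is wired to every downstream node."""
    return upstream * downstream


def control_point_connections(upstream: int, downstream: int) -> int:
    """Connections needed when all nodes are routed through one Control Point node:
    one connection from each upstream node into it, one from it to each downstream node."""
    return upstream + downstream


if __name__ == "__main__":
    # Hypothetical example: five model nodes feeding two evaluation nodes.
    models, evaluators = 5, 2
    print(direct_connections(models, evaluators))         # 10
    print(control_point_connections(models, evaluators))  # 7
```

With a single upstream node, as with the first Control Point node here, the count does not drop (1 + 5 versus 1 * 5); the larger saving comes where several nodes feed several others.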
To use the Control Point node:
1. Select the Utility tab on the Toolbar.
2. Select the Control Point node icon. Drag the node into the Diagram Workspace.
3. Connect the Replacement node to the Control Point node.
SAS Enterprise Miner enables you to build a decision tree in two ways: automatically
and interactively. You will begin by letting SAS Enterprise Miner automatically train and
prune a tree.
To use the Decision Tree node to automatically train and prune a decision tree:
1. Select the Model tab on the Toolbar.
2. Select the Decision Tree node icon. Drag the node into the Diagram Workspace.
3. Connect the Control Point node to the Decision Tree node.
4. Select the Decision Tree node. In the Properties Panel, scroll down to view the
Train properties:
Click the value of the Maximum Depth splitting rule property and enter 10.
This specification enables SAS Enterprise Miner to train a tree that includes up to
ten generations below the root node. The final tree in this example, however, will
have fewer generations due to pruning.
Click the value of the Leaf Size node property and enter 8. This specification
constrains the minimum number of training observations in any leaf to eight.
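The two properties act as stopping rules during training. The following Python sketch shows how such a check could be expressed, assuming that the root node is generation 0 and that a candidate split is rejected if any resulting branch would hold fewer than Leaf Size training observations. The function name is hypothetical, and the actual node also applies its split-search criterion and other properties that are not modeled here.

```python
MAX_DEPTH = 10  # Maximum Depth: deepest generation allowed (the root node is generation 0)
LEAF_SIZE = 8   # Leaf Size: minimum number of training observations in any leaf


def split_allowed(parent_generation, branch_counts,
                  max_depth=MAX_DEPTH, leaf_size=LEAF_SIZE):
    """Return True if splitting a node satisfies both constraints (hypothetical helper).

    parent_generation: generation of the node being split (root = 0).
    branch_counts: training observations that would land in each child node.
    """
    # The children of the split would be one generation below the parent.
    if parent_generation + 1 > max_depth:
        return False
    # Every child node must keep at least leaf_size training observations.
    return all(count >= leaf_size for count in branch_counts)


if __name__ == "__main__":
    print(split_allowed(0, [120, 80]))   # True: root split with large branches
    print(split_allowed(10, [50, 50]))   # False: children would be generation 11
    print(split_allowed(4, [7, 93]))     # False: one branch is below the leaf size
```

Pruning runs after training and can remove branches that these rules allowed, which is why the final tree has fewer than ten generations.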
Get Getting Started with SAS Enterprise Miner 14.1 now with O’Reilly online learning.
