9 CLASSIFICATION AND REGRESSION TREES

This chapter describes a flexible data‐driven method that can be used for both classification (called a classification tree) and prediction (called a regression tree). Among the data‐driven methods, trees (also known as decision trees) are the most transparent and the easiest to interpret. Trees are based on separating records into subgroups by creating splits on predictors. These splits create logical rules that are transparent and easily understandable, such as “IF Age < 55 AND Education > 12, THEN class = 1.” The resulting subgroups should be more homogeneous in terms of the outcome variable, thereby creating useful prediction or classification rules. We discuss the two key ideas underlying trees: recursive partitioning (for constructing the tree) and pruning (for cutting the tree back). In the context of tree construction, we also describe a few homogeneity metrics that are popular in tree algorithms for assessing the resulting subgroups of records. We explain that pruning is a useful strategy for avoiding overfitting and describe alternative strategies for avoiding overfitting. As with other data‐driven methods, trees require large amounts of data. However, once constructed, they are computationally cheap to deploy even on ...
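To make the idea of a split concrete, here is a minimal Python sketch of one partitioning step on a single numeric predictor. It assumes the Gini impurity as the homogeneity metric, which is one common choice and not necessarily the one the chapter emphasizes. The function names `gini` and `best_split` and the toy `ages` and `classes` data are hypothetical and used only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold t for the rule 'value < t' that minimizes weighted impurity.

    Returns (threshold, weighted_impurity). The threshold is None if no
    candidate split improves on the impurity of the unsplit group.
    """
    n = len(values)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical toy data: age of each record and its class label.
ages = [25, 32, 47, 51, 58, 62, 70, 44]
classes = [1, 1, 1, 1, 0, 0, 0, 1]

threshold, impurity = best_split(ages, classes)
print(f"Impurity before split: {gini(classes):.3f}")
print(f"Rule: IF Age < {threshold} THEN class = 1 (weighted impurity {impurity:.3f})")
```

Recursive partitioning would repeat this search inside each resulting subgroup and across all predictors, stopping when the subgroups are pure or a stopping rule applies. This sketch performs only one step on one predictor.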

