Chapter 6. Tree-Based Methods

Tree-based methods are among the more approachable tools in machine learning: they are relatively simple to explain and easy to visualize. Some trained models (notably complex neural networks) are effectively black boxes whose inner workings are too complex to explain simply. Tree-based models, by contrast, tend to be far more intuitive for the average user.

In this chapter, we start with decision trees to see how tree-based models work at a high level. We then dive into their basic mechanics and weigh their strengths and weaknesses. We also touch on other types of tree-based models, such as conditional inference trees and random forests. As a preview: decision trees are as simple as “if-then” statements about the data. Conditional inference trees work in a similar manner but with slightly different statistical underpinnings. Random forests can be complicated mathematically, but generally boil down to a collection of different tree models being asked to vote on a result. Each of these types can be used for regression modeling (regression trees) or classification modeling (classification trees). Methods that handle both are called classification and regression tree (CART) models.
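
To make the preview concrete, here is a minimal sketch in Python of the two ideas above: a decision tree written as plain “if-then” statements, and a forest that asks several trees to vote. Everything in it is an assumption made for illustration: the feature names (rain, temp_c, wind_kph), the thresholds, and the three hand-written trees are hypothetical. A real random forest learns its trees from data rather than having them written by hand, so treat this as a picture of the voting step only.

```python
from collections import Counter

# Each "tree" is just a set of nested if-then rules over one record.
# Feature names and thresholds are hypothetical, chosen only for illustration.

def tree_a(day):
    if day["rain"]:
        return "stay_home"
    if day["temp_c"] < 5:
        return "stay_home"
    return "ride"

def tree_b(day):
    if day["wind_kph"] > 30:
        return "stay_home"
    return "ride"

def tree_c(day):
    if day["rain"] and day["wind_kph"] > 15:
        return "stay_home"
    return "ride"

def forest_vote(trees, day):
    """Return the label predicted by the largest number of trees."""
    votes = Counter(tree(day) for tree in trees)
    # With three trees and two labels, a tie cannot occur.
    return votes.most_common(1)[0][0]

trees = [tree_a, tree_b, tree_c]
day = {"rain": False, "temp_c": 3, "wind_kph": 10}

individual = [tree(day) for tree in trees]
decision = forest_vote(trees, day)
print(individual)  # ['stay_home', 'ride', 'ride']
print(decision)    # ride
```

Here one tree says to stay home because of the cold, but it is outvoted by the other two. Voting suits classification; for regression, the usual counterpart is to average the trees' numeric predictions.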

A Simple Tree Model

Let’s begin by looking at an example of a set of data that describes my bike races this year. We could have a variety of parameters ...
