Chapter 3

Into the Forest, Randomly

IN THIS CHAPTER

• Looking at random forests

• Growing a random forest for irises

• Developing a random forest for glass identification

In Chapter 2 of Book 4, I help you explore decision trees. Suppose a decision tree is an expert decision-maker: Give a tree a set of data, and it makes decisions about the data. Taking this idea a step further, suppose you have a panel of experts (a group of decision trees), and each one makes a decision about the same data. You could poll the panel to come up with the best decision.

This is the idea behind the random forest: a collection of decision trees that you poll, with the majority vote becoming the decision.
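To make the polling idea concrete, here is a minimal base R sketch of a majority vote. The five votes are made-up values that stand in for the decisions of five trees about a single iris; they are an assumption for illustration, not output from a fitted forest.

```r
# Hypothetical votes from a panel of five decision trees for one flower.
# (Made-up values for illustration; not output from a fitted model.)
votes <- c("setosa", "versicolor", "setosa", "setosa", "virginica")

# Tally the votes for each class.
tally <- table(votes)

# The class with the most votes becomes the panel's decision.
decision <- names(tally)[which.max(tally)]

print(tally)
print(decision)
```

One thing to keep in mind: which.max() returns the first maximum, so in this sketch a tie goes to whichever class comes first alphabetically. A real random forest package may break ties differently.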

Growing a Random Forest

How does all this happen? How do you create a forest out of a dataset? Well, randomly.

Here’s what I mean. In Chapter 2 of Book 4, I discuss the creation of a decision tree from a dataset. I use the Rattle package to partition a data frame into a training set, a validation set, and a test set. The partitioning takes place as a result of random sampling from the rows in the data frame. The default condition is that Rattle randomly assigns 70 percent of the rows to the ...
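The random row sampling described above can be sketched in base R. This is only an illustration of the idea, not Rattle's own code: the use of sample(), the built-in iris data frame, and the seed value are all my assumptions. Only the 70 percent training share comes from the text, so the rows that are left over stay in one group here.

```r
# Assumption: the built-in iris data frame stands in for the chapter's data.
set.seed(42)  # arbitrary seed, used only so the sample is repeatable

n <- nrow(iris)

# Randomly pick 70 percent of the row numbers for the training set.
train_rows <- sample(n, size = round(0.7 * n))

training <- iris[train_rows, ]

# The rows that were not picked are left over for the other partitions.
remaining <- iris[-train_rows, ]

nrow(training)
nrow(remaining)
```

Because sample() draws row numbers without replacement by default, no row lands in both groups.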
