IN THIS CHAPTER
Looking at random forests
Growing a random forest for irises
Developing a random forest for glass identification
In Chapter 7, I help you explore decision trees. Think of a decision tree as an expert decision-maker: Give the tree a set of data, and it makes decisions about that data. Taking this idea a step further, suppose you have a panel of experts — a group of decision trees — and each one makes a decision about the same data. You could then poll the panel to come up with the best decision.
This is the idea behind the random forest — a collection of decision trees that you can poll, and the majority vote is the decision.
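A minimal sketch of that polling step, in base R. Here each element of `votes` stands in for one tree's prediction about the same observation (the class names are just illustrative); the decision is the class with the most votes:

```r
# Hypothetical predictions from five trees for one observation
votes <- c("setosa", "versicolor", "setosa", "setosa", "virginica")

# Tally the votes and take the most frequent class as the decision
tally <- table(votes)
decision <- names(tally)[which.max(tally)]
decision  # "setosa" — three of the five trees voted for it
```

This is exactly what a random forest does for classification: each tree votes, and the majority wins.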
So how does all this happen? How do you create a forest out of a dataset? Well, randomly.
Here's what I mean. In Chapter 7, I discuss the creation of a decision tree from a dataset. I use the
rattle package to partition a data frame into a training set, a validation set, and a test set. The partitioning is based on random sampling of the rows in the data frame. By default,
rattle randomly assigns 70 percent of the rows to the training set, 15 percent to the validation set, and 15 percent to the test set.
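The same 70/15/15 idea can be sketched in base R with sample(), without rattle itself. This is an illustration of random partitioning, not rattle's internal code; the iris data frame (built into R) stands in for your dataset:

```r
set.seed(123)                       # only so this example is reproducible
n <- nrow(iris)                     # 150 rows

# Draw 70% of the row indices at random for training
train_idx <- sample(n, size = round(0.70 * n))

# From the remaining rows, draw 15% of the total for validation
remaining <- setdiff(seq_len(n), train_idx)
validate_idx <- sample(remaining, size = floor(0.15 * n))

# Whatever is left becomes the test set
test_idx <- setdiff(remaining, validate_idx)

training   <- iris[train_idx, ]
validation <- iris[validate_idx, ]
testing    <- iris[test_idx, ]
c(nrow(training), nrow(validation), nrow(testing))  # 105 22 23
```

Because the indices are sampled at random, every run without a fixed seed produces a different split — which is the "random" part of growing a random forest from partitioned data.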