Chapter 3
Into the Forest, Randomly
IN THIS CHAPTER
Looking at random forests
Growing a random forest for irises
Developing a random forest for glass identification
In Chapter 2 of Book 4, I help you explore decision trees. Think of a decision tree as an expert decision-maker: Give a tree a set of data, and it makes decisions about the data. Taking this idea a step further, suppose you have a panel of experts — a group of decision trees — and each one makes a decision about the same data. You could poll the panel to come up with the best decision.
This is the idea behind the random forest — a collection of decision trees that you can poll, and the majority vote ends up being the decision.
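To make the polling idea concrete, here's a minimal sketch in R that grows a forest with the randomForest package rather than the Rattle interface this chapter works with. The name iris.forest is just a placeholder, and the iris data frame (one of the datasets that comes with R) stands in for your own data. Each tree votes on the species of a flower, and the prediction is the winner of that vote.

library(randomForest)   # install.packages("randomForest") if you don't have it

set.seed(123)                              # reproducible random sampling
iris.forest <- randomForest(Species ~ .,   # predict Species from all other variables
                            data = iris,
                            ntree = 500)   # a panel of 500 trees

print(iris.forest)                  # summary, including the out-of-bag error estimate
predict(iris.forest, iris[1:5, ])   # majority vote of the trees for the first 5 flowers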
Growing a Random Forest
How does all this happen? How do you create a forest out of a dataset? Well, randomly.
Here’s what I mean. In Chapter 2 of Book 4, I discuss the creation of a decision tree from a dataset. I use the Rattle package to partition a data frame into a training set, a validation set, and a test set. The partitioning takes place as a result of random sampling from the rows in the data frame. The default condition is that Rattle randomly assigns 70 percent of the rows to the training set, 15 percent to the validation set, and 15 percent to the test set.
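If you'd like to see that split spelled out in code rather than clicked through in Rattle's interface, here's a rough base-R sketch of the same idea. The object names (train.rows, valid.set, and so on) are placeholders, and iris again stands in for whatever data frame you happen to be working with.

set.seed(42)                                    # reproducible sampling
n <- nrow(iris)                                 # total number of rows
train.rows <- sample(n, size = round(0.70 * n)) # 70 percent of the row numbers
remaining  <- setdiff(seq_len(n), train.rows)   # the rows that are left over
valid.rows <- sample(remaining, size = round(0.15 * n))
test.rows  <- setdiff(remaining, valid.rows)    # whatever is still unassigned

train.set <- iris[train.rows, ]
valid.set <- iris[valid.rows, ]
test.set  <- iris[test.rows, ]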