Chapter 3

Into the Forest, Randomly

IN THIS CHAPTER

Looking at random forests

Growing a random forest for irises

Developing a random forest for glass identification

In Chapter 2 of Book 4, I help you explore decision trees. Think of a decision tree as an expert decision-maker: Give a tree a set of data, and it makes decisions about the data. Taking this idea a step further, suppose you have a panel of experts (a group of decision trees), and each one makes a decision about the same data. You could poll the panel to come up with the best decision.

This is the idea behind the random forest: a collection of decision trees that you poll, with the majority vote becoming the forest's decision.
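To make "polling the panel" concrete, here is a minimal base-R sketch of a majority vote. It assumes you already have each tree's predicted class for every observation, stored one tree per column. The function name `majority_vote` and the predictions are hypothetical, made up for illustration; they are not output from a real forest or from any package.

```r
# Hypothetical helper: each row of `votes` is one observation, each column
# is one tree's predicted class. The forest's answer is the most common class.
majority_vote <- function(votes) {
  apply(votes, 1, function(row_votes) {
    counts <- table(row_votes)  # tally the votes; levels are sorted alphabetically
    # which.max() returns the first maximum, so a tie goes to the
    # alphabetically first class in this sketch
    names(counts)[which.max(counts)]
  })
}

# Made-up predictions from three trees for three irises
votes <- cbind(
  tree1 = c("setosa", "versicolor", "virginica"),
  tree2 = c("setosa", "virginica",  "virginica"),
  tree3 = c("setosa", "versicolor", "versicolor")
)

majority_vote(votes)
# returns "setosa" "versicolor" "virginica"
```

The alphabetical tie-break is a simplification chosen for this sketch. A real random forest implementation may settle ties differently.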

Growing a Random Forest

How does all this happen? How do you create a forest out of a dataset? Well, randomly.

Here’s what I mean. In Chapter 2 of Book 4, I discuss the creation of a decision tree from a dataset. I use the Rattle package to partition a data frame into a training set, a validation set, and a test set. The partitioning comes from randomly sampling the rows of the data frame. By default, Rattle randomly assigns 70 percent of the rows to the ...
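The following is a rough base-R sketch of this kind of random partition. It is not Rattle's own code. The helper name `partition_rows` is hypothetical. Only the 70 percent share comes from the text; splitting the remaining rows 15/15 between validation and test is an assumption made for illustration.

```r
# Hypothetical helper: randomly split a data frame's rows into training,
# validation, and test sets. The 70 percent training share follows the text;
# the 15 percent validation share (and so 15 percent test) is an assumption.
partition_rows <- function(df, train_frac = 0.70, valid_frac = 0.15, seed = 42) {
  stopifnot(train_frac >= 0, valid_frac >= 0, train_frac + valid_frac <= 1)
  set.seed(seed)                  # reproducible draw (note: resets the global RNG)
  n <- nrow(df)
  shuffled <- sample(n)           # random permutation of the row indices 1..n
  n_train <- round(train_frac * n)
  n_valid <- round(valid_frac * n)
  list(
    train      = df[shuffled[seq_len(n_train)], , drop = FALSE],
    validation = df[shuffled[n_train + seq_len(n_valid)], , drop = FALSE],
    test       = df[tail(shuffled, n - n_train - n_valid), , drop = FALSE]
  )
}

parts <- partition_rows(iris)
sapply(parts, nrow)   # row counts of the three sets
```

Because every row lands in exactly one set, the three row counts always add up to `nrow(iris)`. Changing `seed` gives a different random partition.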

R All-in-One For Dummies by Joseph Schmuller
