Random forests

The final classifier we will discuss in this chapter is the aptly named random forest, an example of a meta-technique called ensemble learning. The idea and logic behind random forests is as follows.

Given that (unpruned) decision trees can be nearly bias-less but high-variance classifiers, a method that reduces variance at the cost of a marginal increase in bias could greatly improve the predictive accuracy of the technique. One salient approach to reducing the variance of decision trees is to train many unpruned decision trees on different random subsets of the training data, sampling with replacement; this is called bootstrap aggregating, or bagging. At the classification phase, the test observation is run through every tree in the ensemble, and the final prediction is the class that receives the majority of the individual trees' votes.
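To make the bagging idea concrete, here is a minimal sketch in R using the rpart package (which ships with R) on the built-in iris data. The tree count, seed, and the `predict_bagged` helper are illustrative choices, not part of any particular library's API:

```r
library(rpart)

set.seed(42)
n_trees <- 25
n <- nrow(iris)

# Train each tree on a bootstrap sample (rows drawn with replacement);
# cp = 0 and minsplit = 2 let the trees grow essentially unpruned
forest <- lapply(seq_len(n_trees), function(i) {
  boot_idx <- sample(n, n, replace = TRUE)
  rpart(Species ~ ., data = iris[boot_idx, ],
        control = rpart.control(cp = 0, minsplit = 2))
})

# Classify new observations by majority vote across all trees
predict_bagged <- function(forest, newdata) {
  votes <- sapply(forest, function(tree)
    as.character(predict(tree, newdata, type = "class")))
  apply(as.matrix(votes), 1,
        function(row) names(which.max(table(row))))
}

preds <- predict_bagged(forest, iris)
mean(preds == iris$Species)  # training accuracy of the ensemble
```

Note that each tree sees a different bootstrap sample, so the individual trees disagree on hard cases; averaging their votes is what smooths out the variance of any single unpruned tree.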
