R: Data Analysis and Visualization
by Tony Fischetti, Brett Lantz, Jaynal Abedin, Hrishi V. Mittal, Bater Makhabel, Edina Berlinger, Ferenc Illés, Milán Badics, Ádám Banai, Gergely Daróczi, Barbara Dömötör, Gergely Gabler, Dániel Havran, Péter Juhász, István Margitai, Balázs Márkus, Péter Medvegyev, Julia Molnár, Balázs Árpád Szucs, Ágnes Tuza, Tamás Vadász, Kata Váradi, Ágnes Vidovics-Dancs
Random forests
The final classifier that we will discuss in this chapter is the aptly named Random Forest, an example of a meta-technique called ensemble learning. The idea and logic behind random forests are as follows:
Given that (unpruned) decision trees can be nearly unbiased but high-variance classifiers, a method that reduces variance at the cost of a marginal increase in bias could greatly improve the technique's predictive accuracy. One salient approach to reducing the variance of decision trees is to train many unpruned decision trees, each on a different random sample of the training data drawn with replacement; this is called bootstrap aggregating, or bagging. At the classification phase, the test observation is run through ...
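The bagging idea described above can be sketched in a few lines. This is a minimal illustration written in plain Python rather than R, not the chapter's own implementation. The function names (`grow_tree`, `bagged_trees`, `predict_bagged`), the Gini-based splitting rule, and the toy data are all assumptions made for the example. Because the excerpt is cut off before it describes the classification phase, the majority vote used here is simply the standard bagging rule for classifiers.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(rows, labels):
    """Grow an unpruned tree: keep splitting until each leaf is pure
    or no further split is possible."""
    if len(set(labels)) == 1:
        return labels[0]                      # pure leaf
    best = None
    for j in range(len(rows[0])):
        vals = sorted(set(r[j] for r in rows))
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2                 # midpoint threshold
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:                          # identical rows, mixed labels
        return Counter(labels).most_common(1)[0][0]
    _, j, t = best
    li = [i for i, r in enumerate(rows) if r[j] <= t]
    ri = [i for i, r in enumerate(rows) if r[j] > t]
    return (j, t,
            grow_tree([rows[i] for i in li], [labels[i] for i in li]),
            grow_tree([rows[i] for i in ri], [labels[i] for i in ri]))

def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are tuples."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def bagged_trees(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest

def predict_bagged(forest, x):
    """Classify by majority vote over all trees (assumed aggregation rule)."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated classes in two dimensions.
rows = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5),
        (6.0, 7.0), (6.5, 6.8), (7.0, 7.2), (6.2, 7.5)]
labels = ["a"] * 4 + ["b"] * 4

forest = bagged_trees(rows, labels)
print(predict_bagged(forest, (1.1, 2.1)))
print(predict_bagged(forest, (6.8, 7.1)))
```

Each tree sees a slightly different bootstrap sample, so individual trees may disagree. Aggregating their votes is what smooths out the variance of any single unpruned tree.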