O'Reilly logo

Spark for Data Science by Bikramaditya Singhal, Srinivas Duvvuri

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Ensembles

As the name suggests, ensemble methods use multiple learning algorithms to obtain a more accurate model in terms of prediction accuracy. Usually these techniques require more computing power and make the model more complex, which makes it difficult to interpret. Let us discuss the various types of ensemble techniques available on Spark.

Random forests

A random forest is an ensemble technique for the decision trees. Before we get to random forests, let us see how it has evolved. We know that decision trees usually have high variance issues and tend to overfit the model. To address this, a concept called bagging (also known as bootstrap aggregating) was introduced. For the decision trees, the idea was to take multiple training sets (bootstrapped ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required