Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Random forest regression

A random forest is an ensemble of decision trees: it trains many trees and combines their predictions. Like decision trees, random forests can handle categorical features, support multiclass classification, and don't require feature scaling.
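To make the idea of combining trees concrete, here is a minimal plain-Scala sketch (no Spark). For regression, a forest's prediction is the average of its trees' predictions. The object name `ForestAveragingSketch`, the `Tree` type alias, and the three hand-written stumps are hypothetical stand-ins for trained trees, not part of the book's code or of Spark's API.

```scala
// Toy illustration of ensemble averaging; not Spark's implementation.
object ForestAveragingSketch {
  // A "tree" here is simply a function from a feature vector to a predicted value.
  type Tree = Array[Double] => Double

  // Hypothetical hand-written stumps standing in for trained trees.
  // Each split compares a raw feature value against a threshold,
  // which is why tree-based models do not need feature scaling.
  val trees: Seq[Tree] = Seq(
    x => if (x(0) < 20.0) 100.0 else 200.0,
    x => if (x(1) < 0.5) 120.0 else 180.0,
    x => if (x(0) < 25.0) 110.0 else 220.0
  )

  // Regression forest prediction: the mean of the individual tree predictions.
  def predict(forest: Seq[Tree], features: Array[Double]): Double =
    forest.map(tree => tree(features)).sum / forest.size
}
```

For the feature vector `Array(22.0, 0.3)` the three stumps predict 200, 120, and 110, so `ForestAveragingSketch.predict` returns their mean, 430 / 3.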

Let's train a model on the bike sharing dataset: split the data into 80% training and 20% test sets, build the model with Spark's RandomForestRegressor, and use RegressionEvaluator to compute evaluation metrics on the test data.

@transient lazy val logger = Logger.getLogger(getClass.getName)

def randForestRegressionWithVectorFormat(vectorAssembler: VectorAssembler,
    vectorIndexer: VectorIndexer, dataFrame: DataFrame) = {
  val lr = new RandomForestRegressor()
    .setFeaturesCol("features")
    .setLabelCol("label")
  ...
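The workflow described in the text (80/20 split, RandomForestRegressor, RegressionEvaluator) could be sketched as follows. This is not the book's listing, which is cut off in this excerpt. The object name `RandomForestSketch`, the function name `trainAndEvaluate`, the seed, and the choice of RMSE as the metric are assumptions made for illustration. The sketch also assumes that the supplied `vectorAssembler` and `vectorIndexer` are configured so that the indexer writes its output to a column named "features", and that the DataFrame has a numeric "label" column.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.feature.{VectorAssembler, VectorIndexer}
import org.apache.spark.ml.regression.RandomForestRegressor
import org.apache.spark.sql.DataFrame

// Hypothetical wrapper object; the names here are illustrative only.
object RandomForestSketch {

  // Trains a random forest on 80% of the data and returns RMSE on the other 20%.
  def trainAndEvaluate(vectorAssembler: VectorAssembler,
                       vectorIndexer: VectorIndexer,
                       dataFrame: DataFrame): Double = {
    val rf = new RandomForestRegressor()
      .setFeaturesCol("features") // assumed output column of vectorIndexer
      .setLabelCol("label")

    // Chain feature assembly, categorical indexing, and the regressor.
    val pipeline = new Pipeline()
      .setStages(Array(vectorAssembler, vectorIndexer, rf))

    // 80/20 split; the seed is an arbitrary value chosen for reproducibility.
    val Array(training, test) = dataFrame.randomSplit(Array(0.8, 0.2), 12345L)

    val model = pipeline.fit(training)
    val predictions = model.transform(test)

    // "prediction" is the regressor's default output column name.
    val evaluator = new RegressionEvaluator()
      .setLabelCol("label")
      .setPredictionCol("prediction")
      .setMetricName("rmse") // assumed metric; "mse", "r2", and "mae" are also supported

    evaluator.evaluate(predictions)
  }
}
```

Putting the feature stages and the regressor in one Pipeline means the VectorIndexer is fitted on the training split only, so the test split does not influence which features are treated as categorical.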
