Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Tree depth

We would generally expect performance to increase with more complex trees (that is, trees of greater depth). However, a lower tree depth acts as a form of regularization. So, as with L2 or L1 regularization in linear models, there may be a tree depth that is optimal with respect to test set performance.
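To make the regularization argument concrete, here is a minimal sketch of a depth-limited regression tree in plain Scala, using a single numeric feature. It is not MLlib's implementation, which bins features and supports many more options. The names `Tree`, `Leaf`, `Node` and `DepthLimitedTree`, and the toy dataset, are hypothetical. Each extra level of depth lets the tree split its leaves again, so training error can only stay the same or fall as `maxDepth` grows. Capping the depth therefore limits how closely the model can fit noise in the training data.

```scala
// Illustrative depth-limited regression tree on one numeric feature (hypothetical,
// NOT Spark MLlib's implementation). maxDepth caps the length of any root-to-leaf path.
sealed trait Tree { def predict(x: Double): Double }

// A leaf predicts the mean label of the training points that reached it.
final case class Leaf(value: Double) extends Tree {
  def predict(x: Double): Double = value
}

// An internal node sends x to the left subtree if x <= threshold, otherwise right.
final case class Node(threshold: Double, left: Tree, right: Tree) extends Tree {
  def predict(x: Double): Double =
    if (x <= threshold) left.predict(x) else right.predict(x)
}

object DepthLimitedTree {
  private def mean(ys: Seq[Double]): Double = ys.sum / ys.size

  // Sum of squared deviations from the mean (unnormalised variance impurity).
  private def sse(ys: Seq[Double]): Double = {
    val m = mean(ys)
    ys.map(y => (y - m) * (y - m)).sum
  }

  // Greedily grow a tree; stop when the depth budget is used up or no split is possible.
  def fit(data: Seq[(Double, Double)], maxDepth: Int): Tree = {
    val xs = data.map(_._1).distinct.sorted
    if (maxDepth <= 0 || xs.size < 2) Leaf(mean(data.map(_._2)))
    else {
      // Candidate thresholds: every distinct value except the largest,
      // so both sides of a split are always non-empty.
      val best = xs.init.minBy { t =>
        val (l, r) = data.partition(_._1 <= t)
        sse(l.map(_._2)) + sse(r.map(_._2))
      }
      val (l, r) = data.partition(_._1 <= best)
      Node(best, fit(l, maxDepth - 1), fit(r, maxDepth - 1))
    }
  }

  // Training error of a fitted tree, as a sum of squared residuals.
  def trainingSse(data: Seq[(Double, Double)], tree: Tree): Double =
    data.map { case (x, y) => val e = tree.predict(x) - y; e * e }.sum

  def main(args: Array[String]): Unit = {
    // Small made-up (x, y) dataset, for illustration only.
    val data = Seq(1.0 -> 2.0, 2.0 -> 2.5, 3.0 -> 1.0, 4.0 -> 4.0,
                   5.0 -> 3.5, 6.0 -> 6.0, 7.0 -> 5.0, 8.0 -> 7.5)
    for (depth <- Seq(0, 1, 2, 3, 4))
      println(f"maxDepth=$depth%d  training SSE=${trainingSse(data, fit(data, depth))}%.3f")
  }
}
```

Running `DepthLimitedTree.main` prints training error for depths 0 to 4, and it should never increase. Test-set error, which is what the RMSLE experiment measures, has no such guarantee. That gap is why an intermediate depth can be optimal.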

Here, we will increase the depth of the trees to see what impact this has on the test set RMSLE, while keeping the number of bins at the default level of 32:

Scala

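```scala
// Load the (train, test) split, returned as a tuple by DecisionTreeUtil
val data = DecisionTreeUtil.getTrainTestData()
val train_data = data._1
val test_data = data._2
val iterations = 10
// Candidate values for the number of bins and the maximum tree depth
val bins_param = Array(2, 4, 8, 16, 32, 64, 100)
val depth_param = Array(1, 2, 3, 4, 5, 10, 20)
// Keep the number of bins fixed at the default of 32 while depth varies
val bin = 32
val categoricalFeaturesInfo ...
```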
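The depth sweep that the snippet above sets up can be sketched independently of Spark. The following plain-Scala sketch computes RMSLE, the square root of the mean of (log(p + 1) - log(a + 1))^2 over (prediction, actual) pairs, and evaluates it once per candidate depth. The object `DepthSweep` and the function parameter `trainAndPredict` are hypothetical. In a Spark setting, assuming `train_data` and `test_data` are `RDD[LabeledPoint]`, one way to supply `trainAndPredict` would be to call MLlib's `DecisionTree.trainRegressor(train_data, categoricalFeaturesInfo, "variance", depth, bin)` and then collect `(model.predict(p.features), p.label)` for each test point. That wiring is an assumption, not necessarily the book's code.

```scala
// Hypothetical sketch of a maxDepth sweep: for each depth, fit a model, score the
// test set, and record RMSLE. Model fitting is passed in as a function.
object DepthSweep {
  /** Root mean squared log error over (prediction, actual) pairs; values must be > -1. */
  def rmsle(predsAndActuals: Seq[(Double, Double)]): Double = {
    require(predsAndActuals.nonEmpty, "need at least one (prediction, actual) pair")
    val squaredLogErrors = predsAndActuals.map { case (pred, actual) =>
      val d = math.log1p(pred) - math.log1p(actual) // log(pred + 1) - log(actual + 1)
      d * d
    }
    math.sqrt(squaredLogErrors.sum / squaredLogErrors.size)
  }

  /** trainAndPredict(depth) returns (prediction, actual) pairs for the test set. */
  def sweep(depths: Seq[Int])(trainAndPredict: Int => Seq[(Double, Double)]): Seq[(Int, Double)] =
    depths.map(depth => depth -> rmsle(trainAndPredict(depth)))

  def main(args: Array[String]): Unit = {
    val depthParam = Seq(1, 2, 3, 4, 5, 10, 20) // same values as depth_param above
    // Hypothetical stand-in for "train with maxDepth = depth, then predict on the test set".
    val dummy: Int => Seq[(Double, Double)] = depth => Seq((10.0 + depth, 12.0), (5.0, 4.0))
    for ((depth, err) <- sweep(depthParam)(dummy))
      println(f"maxDepth=$depth%2d  test RMSLE=$err%.4f")
  }
}
```

Passing the training step in as a function keeps the metric and the loop testable without a `SparkContext`. Only `trainAndPredict` needs to change when it is backed by a real model.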
