
Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua


Maximum bins

Finally, we evaluate the impact of the number of bins used by the decision tree. As with the tree depth, a larger number of bins should allow the model to become more complex and might help performance with larger feature dimensions. Beyond a certain point, more bins are unlikely to help further and might, in fact, hinder performance on the test set due to overfitting.

Scala

object DecisionTreeMaxBins {
  def main(args: Array[String]) {
    val data = DecisionTreeUtil.getTrainTestData()
    val train_data = data._1
    val test_data = data._2
    val iterations = 10
    val bins_param = Array(2, 4, 8, 16, 32, 64, 100)
    val maxDepth = 5
    val categoricalFeaturesInfo = scala.Predef.Map[Int, Int]()
    val i = 0
    val ...
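
To build some intuition for why the number of bins affects model complexity, the following is a minimal, self-contained sketch of how binning limits the split thresholds a tree can consider for one continuous feature. It is a simplified illustration and not MLlib's actual binning code: the object name BinningSketch, the function candidateSplits, and the evenly spaced selection rule are all assumptions made for this example.

```scala
// Simplified illustration only: NOT MLlib's actual binning implementation.
// BinningSketch and candidateSplits are hypothetical names for this example.
object BinningSketch {

  /** Candidate split thresholds for a single continuous feature.
    * With maxBins bins there can be at most (maxBins - 1) thresholds.
    */
  def candidateSplits(values: Seq[Double], maxBins: Int): Seq[Double] = {
    require(maxBins >= 2, "need at least two bins to form a split")
    val sorted = values.distinct.sortWith(_ < _)
    // Midpoints between adjacent distinct values are all possible thresholds.
    val midpoints = sorted.zip(sorted.drop(1)).map { case (a, b) => (a + b) / 2.0 }
    val numSplits = math.min(maxBins - 1, midpoints.length)
    if (numSplits == midpoints.length) {
      // Enough bins to keep every possible threshold.
      midpoints
    } else {
      // Too few bins: keep only numSplits roughly evenly spaced thresholds.
      (1 to numSplits).map { k =>
        val idx = (k.toLong * midpoints.length / (numSplits + 1)).toInt
        midpoints(idx)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // A toy feature with 100 distinct values.
    val feature = (1 to 100).map(_.toDouble)
    // Same bin settings as bins_param in the listing above.
    for (bins <- Seq(2, 4, 8, 16, 32, 64, 100)) {
      val splits = candidateSplits(feature, bins)
      println(s"maxBins = $bins gives ${splits.length} candidate thresholds")
    }
  }
}
```

In this toy setting, two bins leave the tree a single threshold to choose from, while 100 bins expose all 99 midpoints between the distinct values. Raising the bin count past the number of distinct values changes nothing, which is consistent with the expectation that the benefit of more bins levels off.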
