
Mastering Machine Learning with Spark 2.x by Michal Malohlava, Max Pumperla, Alex Tellez


Gradient boosting machine

So far, the best AUC we have been able to achieve is 0.698, from an RF with 15 decision trees. Now, let's repeat the same process with a gradient boosting machine (GBM): first train a single GBM with hardcoded hyperparameters, then run a grid search over those hyperparameters to see whether this algorithm can reach a higher AUC.
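
The grid-search half of that plan follows a simple pattern: enumerate every combination of candidate hyperparameter values, score each one, and keep the best. The following is a minimal plain-Scala sketch of that loop, not the book's Spark code. The case class `GbmParams`, its field names, and the `evaluate` function are hypothetical placeholders. In practice, `evaluate` would train a GBM with the given settings and return its AUC on a validation set.

```scala
// Hypothetical hyperparameter set for one GBM run; field names are illustrative only.
final case class GbmParams(numIterations: Int, maxDepth: Int, learningRate: Double)

object GridSearch {
  // Cartesian product of all candidate values: every combination becomes one candidate run.
  def grid(iterations: Seq[Int], depths: Seq[Int], rates: Seq[Double]): Seq[GbmParams] =
    for {
      n <- iterations
      d <- depths
      r <- rates
    } yield GbmParams(n, d, r)

  // `evaluate` is a stand-in (assumption) for "train a GBM with these params and
  // return its validation AUC". Returns the highest-scoring params with their score.
  // Throws if `candidates` is empty.
  def best(candidates: Seq[GbmParams], evaluate: GbmParams => Double): (GbmParams, Double) =
    candidates.map(p => (p, evaluate(p))).maxBy(_._2)
}
```

Every grid point means one full training run, so the cost grows multiplicatively with each hyperparameter added. This is why grids are usually kept small.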

Recall that a GBM differs slightly from an RF: it works iteratively, trying to reduce an overall loss function that we declare beforehand. As of Spark 1.6.0, MLlib offers three loss functions to choose from:

  • Log-loss: Use this loss function for classification tasks (note that in Spark, GBM supports only binary classification. If you wish to use a GBM for multi-class classification, ...
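
To make "reducing a declared loss function" concrete, this sketch writes out the three losses MLlib offers (log-loss, squared error, and absolute error) together with their gradients with respect to the model's raw score F(x). These are our own plain-Scala definitions in the standard textbook form, not MLlib's `Loss` classes. MLlib's internal scaling and label encoding may differ. In each boosting iteration, the next tree is fit to the pseudo-residuals, which are the negative gradients of the chosen loss at the current predictions.

```scala
// Pointwise loss and its gradient with respect to the raw model score f = F(x).
// Textbook forms (assumption); MLlib's own implementations may scale these differently.
sealed trait Loss {
  def loss(y: Double, f: Double): Double
  def gradient(y: Double, f: Double): Double

  // Pseudo-residuals: the negative gradients that the next tree is trained to predict.
  def pseudoResiduals(labels: Seq[Double], scores: Seq[Double]): Seq[Double] =
    labels.zip(scores).map { case (y, f) => -gradient(y, f) }
}

// Binary classification with labels y in {0, 1}; probability p = sigmoid(f).
object LogLoss extends Loss {
  // Numerically stable log(1 + exp(f)), so large |f| does not overflow.
  private def softplus(f: Double): Double =
    if (f > 0) f + math.log1p(math.exp(-f)) else math.log1p(math.exp(f))
  private def sigmoid(f: Double): Double = 1.0 / (1.0 + math.exp(-f))

  // Equivalent to -(y*log(p) + (1 - y)*log(1 - p)).
  def loss(y: Double, f: Double): Double = softplus(f) - y * f
  def gradient(y: Double, f: Double): Double = sigmoid(f) - y
}

// Regression: penalizes large errors quadratically.
object SquaredError extends Loss {
  def loss(y: Double, f: Double): Double = (y - f) * (y - f)
  def gradient(y: Double, f: Double): Double = -2.0 * (y - f)
}

// Regression: more robust to outliers; only the sign of the error drives the gradient.
object AbsoluteError extends Loss {
  def loss(y: Double, f: Double): Double = math.abs(y - f)
  def gradient(y: Double, f: Double): Double = if (y - f < 0) 1.0 else -1.0
}
```

For example, `SquaredError.pseudoResiduals(Seq(3.0, 0.0), Seq(1.0, 1.0))` yields `Seq(4.0, -2.0)`. Each residual pushes the next tree toward correcting the current over- or under-prediction.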
