So far, the best AUC we have been able to muster is 0.698, from a Random Forest of 15 decision trees. Now, let's go through the same process with a gradient boosted machine: first train a single GBM with hardcoded hyper-parameters, then run a grid search over those parameters to see whether this algorithm can deliver a higher AUC.
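To make the first step concrete, here is a minimal sketch of training a single GBM with hardcoded hyper-parameters using the RDD-based MLlib API available in Spark 1.6, then scoring it with `BinaryClassificationMetrics`. The `trainingData` and `testData` names are assumptions standing in for the same `RDD[LabeledPoint]` splits used for the Random Forest, and the specific hyper-parameter values are placeholders:

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy

// Hardcoded hyper-parameters; defaultParams("Classification") already
// selects log-loss as the loss function.
val boostingStrategy = BoostingStrategy.defaultParams("Classification")
boostingStrategy.numIterations = 10        // number of boosting rounds
boostingStrategy.treeStrategy.maxDepth = 3 // depth of each weak learner

// trainingData / testData: RDD[LabeledPoint] -- assumed to be the same
// splits used for the Random Forest earlier in this chapter.
val gbmModel = GradientBoostedTrees.train(trainingData, boostingStrategy)

// Note: for classification, predict() returns hard 0/1 labels, so this
// AUC is computed from thresholded predictions rather than raw scores.
val scoreAndLabels = testData.map(p => (gbmModel.predict(p.features), p.label))
val auc = new BinaryClassificationMetrics(scoreAndLabels).areaUnderROC()
println(s"GBM AUC: $auc")
```

Because the RDD-based MLlib API has no built-in grid search for gradient boosted trees, one straightforward way to do the grid search that follows is plain nested loops over values of `numIterations` and `maxDepth`, retraining and re-scoring the model for each combination.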
Recall that a GBM differs slightly from an RF in its iterative nature: each successive tree is built to reduce an overall loss function that we declare beforehand. As of Spark 1.6.0, MLlib offers three loss functions to choose from:
- Log-loss: Use this loss function for classification tasks (note that, in Spark, GBM only supports binary classification. If you wish to use a GBM for multi-class classification, ...