Extreme gradient boosting – classification

As mentioned previously, we'll be using the xgboost package in this section. Given the method's well-earned reputation, let's try it on the santander data.

As stated in the boosting overview, you can tune a number of parameters:

  • nrounds: This is the maximum number of iterations (number of trees in the final model).
  • colsample_bytree: This is the fraction of features to sample when building each tree. The default is 1 (100% of the features).
  • min_child_weight: This is the minimum sum of instance weights required in a child node; a candidate split that would produce a child below this threshold is discarded. The default is 1.
  • eta: This is the learning rate, which scales the contribution of each tree to the solution. The default is 0.3.
  • gamma: This is the minimum loss reduction required to make a further split in a tree. The default is 0.
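
To make the parameters above concrete, here is a minimal sketch of passing them to xgboost(). The santander data is loaded elsewhere in the chapter, so a small synthetic binary-classification matrix stands in for it here; the parameter values shown are simply the defaults described above.

```r
library(xgboost)

# Hypothetical stand-in for the santander data: a small synthetic
# binary-classification problem (the real data is prepared earlier)
set.seed(123)
x <- matrix(rnorm(500 * 10), ncol = 10)
y <- rbinom(500, 1, plogis(x[, 1] - x[, 2]))

# The tuning parameters described above, supplied via the params list
params <- list(
  objective        = "binary:logistic",
  eta              = 0.3,  # learning rate (default)
  gamma            = 0,    # minimum loss reduction to split (default)
  min_child_weight = 1,    # default
  colsample_bytree = 1     # sample 100% of the features (default)
)

fit <- xgboost(
  data    = x,
  label   = y,
  params  = params,
  nrounds = 50,            # maximum number of boosting iterations
  verbose = 0
)

preds <- predict(fit, x)   # predicted probabilities in [0, 1]
```

In practice you would search over these values with cross-validation (for example, xgb.cv() or caret) rather than accept the defaults, which is the approach taken with the santander data in this section.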
