Extreme gradient boosting - classification

As mentioned previously, in this section we will use the xgboost package, which we have already loaded. Given the method's well-earned reputation, let's try it on the diabetes data.
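Before tuning, the data needs to be in place. What follows is a minimal setup sketch, not the chapter's exact code; it assumes the Pima Indians diabetes data from the mlbench package as a stand-in for the prepared diabetes set, and the seed and split ratio are illustrative:

library(xgboost)   # already loaded earlier in the chapter
library(mlbench)   # assumption: a convenient source of a Pima diabetes set
data(PimaIndiansDiabetes)

# 'diabetes' (pos/neg) is the binary outcome we want to predict
set.seed(502)                          # illustrative seed
n <- nrow(PimaIndiansDiabetes)
idx <- sample(n, floor(0.7 * n))       # 70/30 train/test split
pima.train <- PimaIndiansDiabetes[idx, ]
pima.test  <- PimaIndiansDiabetes[-idx, ]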

As stated in the boosting overview, we will be tuning a number of parameters (a grid sketch follows this list):

  • nrounds: The maximum number of iterations (the number of trees in the final model).
  • colsample_bytree: The number of features, expressed as a ratio, to sample when building a tree. Default is 1 (100% of the features).
  • min_child_weight: The minimum sum of instance weights (the hessian) required in a child node before a further split is made. Default is 1.
  • eta: The learning rate, which scales the contribution of each tree to the solution. Default is 0.3.
  • gamma: The minimum loss reduction required to make another leaf partition in a tree. Default is 0.
  • subsample: The ratio of observations to sample when building a tree. Default is 1 (100% of the observations).
  • max_depth: The maximum depth of the individual trees.
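To make the grid concrete, here is a minimal sketch using caret's expand.grid() for the xgbTree method, which expects all seven parameters above to be present; the candidate values are illustrative, not the chapter's final tuning choices:

library(caret)

grid <- expand.grid(
  nrounds = c(75, 100),        # maximum number of boosting iterations
  colsample_bytree = 1,        # use 100% of the features per tree
  min_child_weight = 1,        # keep the default minimum child weight
  eta = c(0.01, 0.1, 0.3),     # learning rates to try; 0.3 is the default
  gamma = c(0.25, 0.5),        # minimum loss reduction to split a leaf
  subsample = 0.5,             # sample half the observations per tree
  max_depth = c(2, 3)          # shallow trees, as is typical for boosting
)

# 5-fold cross-validation; train() evaluates every row of the grid and
# keeps the combination with the best resampled performance
cntrl <- trainControl(method = "cv", number = 5)
set.seed(1)                    # illustrative seed
fit <- train(diabetes ~ ., data = pima.train, method = "xgbTree",
             trControl = cntrl, tuneGrid = grid)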
