As mentioned previously, we will be using the xgboost package in this section, which we have already loaded. Given the method's well-earned reputation, let's try it on the diabetes data.
As stated in the boosting overview, we will be tuning a number of parameters:
- nrounds: The maximum number of boosting iterations (the number of trees in the final model).
- colsample_bytree: The fraction of features sampled when building each tree. Default is 1 (100% of the features).
- min_child_weight: The minimum sum of instance weights required in a child node; larger values make the algorithm more conservative. Default is 1.
- eta: Learning rate, which shrinks the contribution of each tree to the solution; smaller values require more trees. Default is 0.3.
- gamma: Minimum loss reduction required to make a further partition on a leaf node; larger values make the algorithm more conservative. Default is 0.