As mentioned previously, we'll be using the xgboost package in this section. Given the method's well-earned reputation, let's try it on the Santander data.
As stated in the boosting overview, you can tune a number of parameters:
- nrounds: This is the maximum number of iterations (number of trees in the final model).
- colsample_bytree: This is the fraction of features to sample when building each tree. The default is 1 (100% of the features).
- min_child_weight: This is the minimum sum of instance weights required in a child node; a split producing a node lighter than this is not made. The default is 1.
- eta: This is the learning rate, which is the contribution of each tree to the solution. The default is 0.3.
- gamma: This is the minimum loss reduction required to make a further split on a leaf node of the tree. The default is 0.
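To make the parameters above concrete, here is a minimal sketch of passing them to xgb.train. Since the Santander data is not loaded at this point, the agaricus dataset bundled with the xgboost package stands in for it; the parameter values shown are simply the defaults discussed above, not tuned settings.

```r
library(xgboost)

# Stand-in data shipped with the xgboost package (replace with the
# Santander training matrix and labels in practice)
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

params <- list(
  objective        = "binary:logistic",
  eta              = 0.3,  # learning rate (default 0.3)
  gamma            = 0,    # minimum loss reduction to split (default 0)
  min_child_weight = 1,    # minimum sum of instance weights in a child (default 1)
  colsample_bytree = 1     # fraction of features sampled per tree (default 1)
)

# nrounds caps the number of boosting iterations (trees in the final model)
fit <- xgb.train(params = params, data = dtrain, nrounds = 75)
```

In practice these values are tuned over a grid rather than fixed, which is what the following sections work toward.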