December 2018
Beginner to intermediate
684 pages
21h 9m
English
Each boosting iteration aims to reduce the training loss, so for a large ensemble the training error can become very small, which increases the risk of overfitting and poor performance on unseen data. Because the optimal ensemble size, i.e., the one that minimizes the generalization error, depends on the application and the available data, cross-validation is the best approach to find it.
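To make the cross-validation idea concrete, here is a minimal, dependency-free sketch. It assumes squared-error loss, a single numeric feature, decision stumps as base learners, and a synthetic noisy sine dataset. The function names (`fit_stump`, `validation_curve`, `cv_best_ensemble_size`) are hypothetical and not part of any library. In each fold, the sketch records the validation error after every boosting round, averages these curves across folds, and picks the ensemble size with the lowest average error. In practice you would use a library implementation such as scikit-learn or LightGBM instead.

```python
import math
import random
import statistics


def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimizes squared error on the residuals."""
    pairs = sorted(zip(xs, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Minimizing the SSE after the split is equivalent to maximizing this score
        score = i * left_mean ** 2 + (n - i) * right_mean ** 2
        if best is None or score > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_mean, right_mean)
    if best is None:  # no valid split: predict the mean residual
        mean = total / n
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def validation_curve(x_tr, y_tr, x_va, y_va, n_rounds, learning_rate=0.1):
    """Train boosting for n_rounds; return the validation MSE after each round."""
    base = statistics.fmean(y_tr)
    pred_tr = [base] * len(y_tr)
    pred_va = [base] * len(y_va)
    curve = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual
        residuals = [y - p for y, p in zip(y_tr, pred_tr)]
        stump = fit_stump(x_tr, residuals)
        pred_tr = [p + learning_rate * stump(x) for p, x in zip(pred_tr, x_tr)]
        pred_va = [p + learning_rate * stump(x) for p, x in zip(pred_va, x_va)]
        curve.append(statistics.fmean((y - p) ** 2 for y, p in zip(y_va, pred_va)))
    return curve


def cv_best_ensemble_size(x, y, n_rounds=300, k=5, seed=0):
    """Use k-fold cross-validation to pick the number of boosting rounds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    mean_curve = [0.0] * n_rounds
    for fold in folds:
        held_out = set(fold)
        tr = [i for i in idx if i not in held_out]
        curve = validation_curve([x[i] for i in tr], [y[i] for i in tr],
                                 [x[i] for i in fold], [y[i] for i in fold], n_rounds)
        mean_curve = [m + c / k for m, c in zip(mean_curve, curve)]
    best = min(range(n_rounds), key=mean_curve.__getitem__)
    return best + 1, mean_curve  # ensemble size is 1-based


# Synthetic data (an assumption for illustration): a noisy sine wave
rng = random.Random(42)
xs = [rng.uniform(0, 6) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
best, curve = cv_best_ensemble_size(xs, ys)
print(f"best ensemble size: {best}, CV MSE: {curve[best - 1]:.3f}")
```

The selected size depends on the learning rate, the noise level, and the fold split. This dependence is exactly why the text recommends determining it empirically rather than fixing it in advance.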
Since the ensemble size must be specified before training, it is useful to monitor performance on the validation set during training and abort the process once the validation error has not decreased for a given number of iterations. This technique is called early stopping and is frequently used for models that require a large ...