Gradient boosting is another improved version of boosting. Like AdaBoost, it builds an additive ensemble of weak learners, but it does so by performing gradient descent on a differentiable loss function. The algorithm has proven to be one of the most effective of the ensemble family, though it is characterized by higher variance in its estimates and greater sensitivity to noise in the data (both problems can be attenuated by using sub-sampling), as well as significant computational cost, since its operations cannot be parallelized.
To demonstrate how GTB (gradient tree boosting) performs, we will again check whether we can improve our predictive performance on the covertype dataset, which we already examined when illustrating linear SVMs and ensemble algorithms:
In: import pickle
    covertype_dataset = pickle.load(open("covertype_dataset.pickle", ...
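The setup above can be sketched end to end with scikit-learn's `GradientBoostingClassifier`. This is a minimal, self-contained example: a synthetic multiclass dataset stands in for the covertype pickle (which is not reproduced here), and the hyperparameter values are illustrative assumptions, not the ones used in the text.

```python
# Hedged sketch of gradient tree boosting; the dataset and
# hyperparameters are placeholders, not taken from the source text.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the covertype data: 5 classes, 20 features
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, n_classes=5,
                           random_state=101)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

# subsample < 1.0 turns on stochastic gradient boosting, one of the
# sub-sampling remedies mentioned above for variance and noise
gbc = GradientBoostingClassifier(n_estimators=100,
                                 learning_rate=0.1,
                                 subsample=0.5,
                                 max_depth=3,
                                 random_state=101)
gbc.fit(X_train, y_train)
print("Test accuracy: %.3f" % accuracy_score(y_test,
                                             gbc.predict(X_test)))
```

Note that, unlike bagging-style ensembles, the boosting stages are fitted sequentially (each tree corrects the residual errors of the previous ones), which is why training cannot be parallelized across estimators.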