XGBoost optimized computation in several respects to enable multithreading. Most importantly, it keeps the data in memory in compressed column blocks, where each column is sorted by the corresponding feature values. XGBoost computes this input layout once before training and reuses it throughout to amortize the additional up-front cost. With this layout, the search for split statistics over columns reduces to a linear scan, over either the pre-sorted values or their quantiles, that can be run in parallel across columns and readily supports column subsampling, as sketched below.
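To make the mechanics concrete, the following minimal NumPy sketch mimics the idea: feature columns are argsorted once up front, and each candidate split is then evaluated in a single linear pass over the pre-sorted order using the second-order gain formula. This is an illustrative simplification, not XGBoost's actual implementation; the function names are hypothetical, and it scans the exact pre-sorted values rather than quantile sketches.

```python
import numpy as np

def presort_columns(X):
    """Sort each feature column once; the per-column order is reused
    for every split search (simplified stand-in for XGBoost's
    compressed, pre-sorted column blocks)."""
    return np.argsort(X, axis=0)

def best_split_for_column(order, x, grad, hess, reg_lambda=1.0):
    """Linear scan over one pre-sorted column: accumulate gradient and
    Hessian sums on the left side and evaluate the split gain at each
    cut point (0.5 factor and complexity penalty omitted)."""
    g_total, h_total = grad.sum(), hess.sum()
    g_left = h_left = 0.0
    best_gain, best_threshold = 0.0, None
    for pos in range(len(order) - 1):
        i = order[pos]
        g_left += grad[i]
        h_left += hess[i]
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = (g_left**2 / (h_left + reg_lambda)
                + g_right**2 / (h_right + reg_lambda)
                - g_total**2 / (h_total + reg_lambda))
        if gain > best_gain:
            best_gain = gain
            # place the threshold midway between consecutive sorted values
            best_threshold = 0.5 * (x[i] + x[order[pos + 1]])
    return best_gain, best_threshold

# Columns are independent, so they can be scanned in parallel
# (e.g., one thread per column) and subsampled by simply skipping some.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
grad = rng.normal(size=1000)        # first-order gradients
hess = np.ones(1000)                # second-order statistics
sorted_idx = presort_columns(X)     # computed once, reused thereafter
for j in range(X.shape[1]):
    gain, thr = best_split_for_column(sorted_idx[:, j], X[:, j], grad, hess)
```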
The subsequently released LightGBM and CatBoost libraries built on these innovations, and LightGBM further accelerated training through optimized threading and reduced memory usage. Because of their open source nature, libraries ...