Feature importance for RandomForests

As discussed in the conclusion of Chapter 3, The Data Pipeline, selecting the right variables can improve the learning process by reducing noise, lowering the variance of estimates, and cutting the computational burden. Ensemble methods, RandomForest in particular, offer a different view of the role a variable plays when it works together with the other variables in your dataset.

Here, we show you how to extract feature importance from RandomForest and Extra-Tree models. Importance is calculated in the fashion originally described in the book Classification and Regression Trees by Breiman, Friedman et al. (1984), a true classic that laid solid foundations for classification trees. In the ...
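
As a minimal sketch of the extraction described above, the following fits a RandomForest and an Extra-Trees classifier with scikit-learn and reads the `feature_importances_` attribute of each fitted model. The synthetic dataset, the feature names, and the hyperparameters are assumptions chosen for illustration only; they are not taken from the book.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Synthetic data (an assumption for illustration): 8 features, of which
# the first 3 are informative because shuffle=False keeps them in front.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

importances = {}
for name, model in models.items():
    model.fit(X, y)
    # Impurity-based importances, averaged over the trees of the ensemble
    importances[name] = model.feature_importances_
    # Rank features from most to least important
    ranking = np.argsort(importances[name])[::-1]
    print(name)
    for idx in ranking:
        print(f"  {feature_names[idx]}: {importances[name][idx]:.3f}")
```

The importances of each model are non-negative and normalized to sum to 1, so they are best read as relative rankings within one model rather than as absolute effect sizes.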

