July 2017
The concept of feature importance that we previously introduced can also be applied to random forests, computing the average over all trees in the forest:

Importance(x_i) = (1 / N_Trees) * Σ_{t=1}^{N_Trees} Importance_t(x_i)
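As a short sketch of this averaging (the dataset here is a small assumed toy example, not the one used later in the text), scikit-learn exposes the forest-level score as `feature_importances_`, and it coincides with the mean of the per-tree importances taken over `estimators_`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small toy dataset to keep the run fast (assumed sizes, for illustration only)
X, Y = make_classification(n_samples=200, n_features=10,
                           n_informative=5, n_redundant=2,
                           n_classes=2, random_state=1000)

rf = RandomForestClassifier(n_estimators=20, random_state=1000)
rf.fit(X, Y)

# Average the importance vectors of the individual trees
per_tree = np.array([t.feature_importances_ for t in rf.estimators_])
avg = per_tree.mean(axis=0)

# The forest-level score is this average (each tree's vector sums to 1)
print(np.allclose(rf.feature_importances_, avg))
```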
We can easily test the importance evaluation with a dummy dataset containing 50 features, 20 of which are noninformative (redundant):

>>> from sklearn.datasets import make_classification
>>> nb_samples = 1000
>>> X, Y = make_classification(n_samples=nb_samples, n_features=50, n_informative=30, n_redundant=20, n_classes=2, n_clusters_per_class=5)
The importance of the first 50 features according to a random forest with 20 trees is plotted in the following figure:
As expected, there are a few very important features, ...
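As a sketch of how the ranking behind such a plot can be obtained (the 20-tree forest matches the text; the `random_state` values are assumed and the matplotlib plotting itself is omitted), `np.argsort` on `feature_importances_` lists the features from most to least important:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same dummy dataset as in the text: 50 features, 20 redundant
nb_samples = 1000
X, Y = make_classification(n_samples=nb_samples, n_features=50,
                           n_informative=30, n_redundant=20,
                           n_classes=2, n_clusters_per_class=5,
                           random_state=1000)

# Random forest with 20 trees, as in the text
rf = RandomForestClassifier(n_estimators=20, random_state=1000)
rf.fit(X, Y)

importances = rf.feature_importances_

# Feature indices sorted by decreasing importance
ranking = np.argsort(importances)[::-1]
print(ranking[:10])
```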