Variable importance
Statistical models, such as linear regression and logistic regression, indicate which variables are significant through measures such as p-values and t-statistics. In a decision tree, each split is made on a single variable. Regardless of how many variables are specified for surrogate splits, a given variable may appear as the split criterion more than once in the tree, while some variables may never appear in any split at all. At each split we select the variable that leads to the maximum reduction in impurity, so the contribution of each variable across the tree's splits also differs. The overall improvement summed across the splits of the tree (measured by the reduction in impurity for a classification tree or by the improvement ...
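The accumulation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. It is not the book's R code and not the `rpart` implementation. It assumes a two-class problem and uses Gini impurity. It weights each split by the share of training observations that reach its node (one common convention; libraries differ in scaling). It also ignores surrogate splits. The helper names (`gini`, `impurity_decrease`, `variable_importance`) and the toy tree are hypothetical.

```python
def gini(counts):
    """Gini impurity of a node, given its per-class observation counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def impurity_decrease(parent, left, right, n_total):
    """Reduction in Gini impurity from one split, weighted by the share
    of all n_total training observations that reach the parent node."""
    n_parent = sum(parent)
    child_impurity = (sum(left) / n_parent) * gini(left) + (sum(right) / n_parent) * gini(right)
    return (n_parent / n_total) * (gini(parent) - child_impurity)


def variable_importance(splits, variables, n_total):
    """Sum the impurity decrease of every split, grouped by split variable.

    Each split is (variable, parent_counts, left_counts, right_counts).
    Variables never used as a split variable keep a score of 0.
    Assumption: surrogate splits are ignored in this sketch.
    """
    scores = {var: 0.0 for var in variables}
    for var, parent, left, right in splits:
        scores[var] += impurity_decrease(parent, left, right, n_total)
    return scores


# Hypothetical two-class tree on 100 observations: x1 splits twice,
# x2 splits once, and x3 is never chosen as a split variable.
TOY_SPLITS = [
    ("x1", [50, 50], [40, 10], [10, 40]),  # root node
    ("x2", [40, 10], [38, 2], [2, 8]),     # left child of the root
    ("x1", [10, 40], [10, 0], [0, 40]),    # right child of the root
]

importance = variable_importance(TOY_SPLITS, ["x1", "x2", "x3"], n_total=100)
for var, score in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{var}: {score:.3f}")
```

Running it prints `x1: 0.340`, `x2: 0.090` and `x3: 0.000`. x1 collects credit from both of its splits, while x3, which never splits a node, scores zero. Implementations commonly rescale these totals (for example, so they sum to 1 or 100). Rescaling changes the magnitudes but not the ranking.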