July 2018
Beginner to intermediate
376 pages
9h 1m
English
Statistical models, say linear regression and logistic regression, indicate which variables are significant with measures such as p-value and t-statistics. In a decision tree, a split is caused by a single variable. If the specification of the number of variables for the surrogate splits, a certain variable may appear as the split criteria more than once in the tree and some variables may never appear in the tree splits at all. During each split, we select the variable that leads to the maximum reduction in impurity, and the contribution of a variable across the tree splits would also be different. The overall improvement across each split of the tree (by the reduction in impurity for the classification tree or by the improvement ...