9.6. Modeling Relationships: Recursive Partitioning
Mi-Ling turns her attention to the JMP partition platform, which implements a version of classification and regression tree analysis. The partition platform allows both the response and predictors to be either continuous or categorical. Continuous predictors are split into two partitions according to cutting values, while predictors that are nominal or ordinal are split into two groups of levels. Intuitively, the split is chosen so as to maximize the difference in response between the two branches, or nodes, resulting from the split.
If the response is continuous, the sum of squares due to the difference between means is a measure of the difference in the two groups. Both the variable to be split and the cutting value for the split are determined by maximizing a quantity, called the LogWorth, which is related to the p-value associated with the sum of squares due to the difference between means. In the case of a continuous response, the fitted values are the means within the two groups.
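As a rough illustration of the continuous-response case, the following Python sketch searches the candidate cutting values of a single continuous predictor and keeps the one with the largest sum of squares due to the difference between means. It is not JMP's implementation: the function name best_split, the use of midpoints between adjacent predictor values as candidate cuts, and the toy data are all assumptions made for the example, and the sketch ranks splits by the raw sum of squares rather than by the LogWorth that JMP derives from the associated p-value.

```python
def best_split(x, y):
    """Find the binary cut on x that maximizes the between-group sum of squares of y.

    Hypothetical helper for illustration only. Returns a tuple
    (cut, ss_between, mean_left, mean_right), or None if no cut is possible.
    """
    pairs = sorted(zip(x, y))          # order observations by predictor value
    n = len(pairs)
    total = sum(v for _, v in pairs)
    grand_mean = total / n

    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue                   # cannot cut between tied predictor values
        n_left, n_right = i, n - i
        mean_left = left_sum / n_left
        mean_right = (total - left_sum) / n_right
        # Sum of squares due to the difference between the two group means
        ss_between = (n_left * (mean_left - grand_mean) ** 2
                      + n_right * (mean_right - grand_mean) ** 2)
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2   # assumed: midpoint as cutting value
        if best is None or ss_between > best[1]:
            best = (cut, ss_between, mean_left, mean_right)
    return best


# Toy data (made up for this example): the response jumps once x exceeds 3.
x = [1, 2, 3, 4, 5, 6]
y = [10, 11, 9, 20, 21, 19]
cut, ss_between, mean_left, mean_right = best_split(x, y)
print(cut, ss_between, mean_left, mean_right)
```

The two means returned for the winning cut play the role of the fitted values in the two resulting nodes. Repeating the search over every predictor, and then within each resulting node, gives the recursive part of recursive partitioning.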
If the response is categorical, as in Mi-Ling's case, the splits are determined by maximizing a LogWorth statistic that is related to the p-value of the likelihood ratio chi-square statistic, which is referred to as G^2. In this case, the fitted values are the estimated proportions, or response rates, within the resulting two groups.
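The categorical case can be sketched in the same spirit. The Python code below computes the likelihood ratio chi-square, G^2, for one candidate split from the response counts in the two branches, and then converts it to a LogWorth-style score. Several things here are assumptions rather than facts from the text: the function names are hypothetical, the score is taken to be the negative base-10 logarithm of the unadjusted p-value, the p-value shortcut used is valid only for a two-level response (one degree of freedom), and the counts are invented. JMP's own LogWorth involves adjustments that this sketch does not attempt to reproduce.

```python
import math


def g2_for_split(left, right):
    """Likelihood ratio chi-square for a 2 x k table of response counts.

    left and right are lists of counts per response level in each branch.
    Hypothetical helper for illustration only.
    """
    k = len(left)
    col_totals = [left[j] + right[j] for j in range(k)]
    n = sum(col_totals)
    g2 = 0.0
    for row in (left, right):
        row_total = sum(row)
        for j in range(k):
            observed = row[j]
            expected = row_total * col_totals[j] / n
            if observed > 0:           # empty cells contribute nothing
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2


def logworth_binary(g2):
    """-log10 of the chi-square p-value, assuming a two-level response (1 df)."""
    p = math.erfc(math.sqrt(g2 / 2.0))  # upper tail of chi-square with 1 df
    if p == 0.0:
        return math.inf                 # p-value underflowed to zero
    return -math.log10(p)


def response_rates(counts):
    """Estimated proportions of each response level within a node."""
    total = sum(counts)
    return [c / total for c in counts]


# Invented counts of [level A, level B] in the two branches of a candidate split.
left, right = [30, 10], [10, 30]
g2 = g2_for_split(left, right)
print(g2, logworth_binary(g2))
print(response_rates(left), response_rates(right))
```

The response rates printed for each branch correspond to the fitted values described above. Among competing candidate splits, the one with the largest score would be chosen.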
Mi-Ling remembers hearing that the partition platform is useful both for exploring relationships and for modeling: ...