Visual Six Sigma: Making Data Analysis Lean

5.5. Modeling Relationships

For a traditional confirmatory analysis, Jane needs to use an analysis method that permits hypothesis testing. She could use logistic regression, with Defect (<=5%) as her nominal response. However, Jane feels that the partition analysis provided a good examination of this response and, besides, the continuous variable % Price Increase should be more informative. For this reason, she decides to model % Price Increase as a function of the Xs using Fit Model.

But which Xs should she include? And what about interactions, which the partition analysis clearly indicated were important? Fit Model fits a multiple linear regression. Jane realizes that in a regression model, a nominal variable with many levels requires many coefficients, which can make those coefficients difficult to estimate. So she needs to be selective about which nominal variables to include. In particular, she sees no reason to include Customer ID or Sales Rep, as these variables would not easily help her address root causes even if they were significant.
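To see why a nominal variable with many levels is costly, it can help to count the coefficients it adds. The sketch below is not part of Jane's JMP analysis; it is a minimal Python illustration that assumes simple reference-level indicator coding (JMP's own coding scheme may differ, although the number of coefficients per variable is the same). The function name `indicator_columns` and the data values are hypothetical.

```python
def indicator_columns(values):
    """Indicator-code a nominal variable, using the first sorted level as the reference."""
    levels = sorted(set(values))
    kept = levels[1:]  # the reference level gets no column of its own
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows

# Hypothetical data: a few regions versus a distinct ID for every customer.
region = ["East", "West", "North", "East", "West", "North"]
customer_id = ["C01", "C02", "C03", "C04", "C05", "C06"]

region_levels, region_rows = indicator_columns(region)
customer_levels, customer_rows = indicator_columns(customer_id)

print(len(region_levels))    # coefficients needed for Region
print(len(customer_levels))  # coefficients needed for Customer ID
```

In this toy data set, the customer variable alone needs almost as many coefficients as there are rows, which leaves little information for estimating anything else.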

Now Region, thinks Jane, is an interesting variable. There could well be interactions between Region and the other Xs, but she does not see how these would help in addressing root causes, since the sales representatives need to be able to sell in all regions. She decides to include Region in the model to verify whether there is a Region effect (her exploratory work has suggested that there is not), but not to include any interactions with Region.
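Jane's decision amounts to a rule for building the list of model terms: every X enters as a main effect, but Region is left out of the interactions. The following Python sketch illustrates that rule only; it does not reproduce the Fit Model dialog. The function `model_terms` and the placeholder names `X1`, `X2`, and `X3` are assumptions for illustration, as is the restriction to two-way interactions.

```python
from itertools import combinations

def model_terms(factors, main_effect_only):
    """Return main effects plus two-way interactions, skipping excluded factors."""
    main = list(factors)
    crossable = [f for f in factors if f not in main_effect_only]
    interactions = [f"{a}*{b}" for a, b in combinations(crossable, 2)]
    return main + interactions

# Placeholder factor names; only Region comes from the text.
factors = ["Region", "X1", "X2", "X3"]
terms = model_terms(factors, main_effect_only={"Region"})

for term in terms:
    print(term)
```

Region appears once, as a main effect, so the model can still test for a Region effect without spending coefficients on Region interactions.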


Visual Six Sigma: Making Data Analysis Lean by Ian Cox, Marie A. Gaudard, Philip J. Ramsey, Mia L. Stephens, Leo Wright
