Visual Six Sigma: Making Data Analysis Lean by Leo Wright, Mia L. Stephens, Philip J. Ramsey, Marie A. Gaudard, Ian Cox

9.3. Uncovering Relationships

As suggested by the Visual Six Sigma Roadmap (Exhibit 3.30), Mi-Ling begins her analysis of the Wisconsin Breast Cancer Diagnostic Data Set by visualizing the data one variable at a time, then two variables at a time, and finally more than two at a time. This shows her that there are strong relationships between the 30 predictors and the diagnosis of a mass as benign or malignant.

9.3.1. One Variable at a Time

Mi-Ling opens the data table CellClassification_1.jmp. As a first step, she obtains distribution reports for all of the variables other than ImageID, which is simply an identifier. She notes that each variable other than Diagnosis has a name beginning with Mean, Max, or SE, indicating which summary statistic has been calculated—the mean, max, or standard error of the mean of the measured quantity. She selects Analyze > Distribution and populates the launch dialog as shown in Exhibit 9.3.

Exhibit 9.3. Launch Dialog for Distribution Platform
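
For readers working outside JMP, the column selection described above can be sketched in a few lines of Python. This is only an illustration, not what the Distribution launch dialog does internally. The specific column names are hypothetical; only ImageID, Diagnosis, and the Mean, Max, and SE prefixes come from the text.

```python
# Hypothetical column names for illustration. Only "ImageID", "Diagnosis",
# and the Mean/Max/SE prefixes are taken from the text.
columns = [
    "ImageID", "Diagnosis",
    "Mean Radius", "Max Radius", "SE Radius",
    "Mean Texture", "Max Texture", "SE Texture",
]


def columns_to_analyze(names, identifier="ImageID"):
    """Return every column except the identifier, preserving order."""
    return [name for name in names if name != identifier]


def group_by_statistic(names, prefixes=("Mean", "Max", "SE")):
    """Map each summary-statistic prefix to the columns that start with it."""
    groups = {prefix: [] for prefix in prefixes}
    for name in names:
        for prefix in prefixes:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break  # a column belongs to at most one prefix group
    return groups


selected = columns_to_analyze(columns)
groups = group_by_statistic(selected)

print(selected)
print(groups)
```

Diagnosis matches none of the prefixes, so it stays in `selected` but appears in no group, which mirrors its role as the response rather than a predictor.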

Upon clicking OK, she sees 31 distribution reports, the first four of which are shown in Exhibit 9.4. The vertical layout for the graphs is the JMP default. Mi-Ling knows that she can change this either interactively or more permanently in File > Preferences, but she is happy with this layout for now.

Exhibit 9.4. First 4 of 31 Distribution Reports
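
The idea of looking at one variable at a time can also be sketched in plain Python: counts for a nominal column, simple summary statistics for a continuous one. The rows below are made up purely for illustration and are not values from CellClassification_1.jmp. The column names and the "B"/"M" coding of Diagnosis are assumptions, and the summaries are a rough stand-in for a JMP distribution report, not a reproduction of it.

```python
from collections import Counter
import statistics

# Made-up rows for illustration only; not values from the actual data set.
# Column names and the "B"/"M" coding of Diagnosis are assumptions.
rows = [
    {"ImageID": 1, "Diagnosis": "B", "Mean Radius": 12.0, "SE Radius": 0.3},
    {"ImageID": 2, "Diagnosis": "M", "Mean Radius": 20.0, "SE Radius": 0.9},
    {"ImageID": 3, "Diagnosis": "B", "Mean Radius": 11.0, "SE Radius": 0.2},
    {"ImageID": 4, "Diagnosis": "B", "Mean Radius": 13.0, "SE Radius": 0.4},
]


def summarize(rows, column):
    """Summarize one column: statistics if numeric, level counts otherwise."""
    values = [row[column] for row in rows]
    if all(isinstance(value, (int, float)) for value in values):
        return {
            "n": len(values),
            "min": min(values),
            "median": statistics.median(values),
            "mean": statistics.mean(values),
            "max": max(values),
        }
    # Nominal column: count how often each level occurs (the bar-graph view).
    return dict(Counter(values))


# One report per column, skipping the identifier as in the text.
reports = {
    column: summarize(rows, column)
    for column in rows[0]
    if column != "ImageID"
}

for column, report in reports.items():
    print(column, report)
```

Treating any all-numeric column as continuous is a simplification; JMP relies on each column's modeling type instead.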

The bar graph corresponding to Diagnosis indicates ...