9.8. Comparison of Classification Models

At this point, Mi-Ling is excited. She will select a single model by comparing the performance of all four models on her validation set. For now, she is content to use a single performance criterion, namely, the overall misclassification rate. Of course, she realizes that in certain settings a criterion that weights the two types of misclassification errors differently might be more appropriate. For example, since overlooking a malignant tumor is more serious than incorrectly classifying a benign tumor as malignant, one might penalize the first kind of error more severely than the second. For the moment, Mi-Ling puts that issue aside. (We note that one can fit models with general loss functions in the JMP nonlinear platform.)
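To make the two criteria concrete, here is a minimal sketch in Python rather than JMP. It is an illustration only: the labels "M" and "B", the toy data, the function names, and the cost values of 5 and 1 are all assumptions chosen for the example and do not come from Mi-Ling's analysis.

```python
def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted class differs from the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    errors = sum(a != p for a, p in zip(actual, predicted))
    return errors / len(actual)


def weighted_misclassification_cost(actual, predicted, positive="M",
                                    cost_missed=5.0, cost_false_alarm=1.0):
    """Average cost per observation when the two error types are weighted differently.

    cost_missed applies when an actual `positive` case is predicted as something else;
    cost_false_alarm applies when a non-positive case is predicted as `positive`.
    Both cost values are hypothetical.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == positive and p != positive:
            total += cost_missed        # e.g., a malignant tumor was overlooked
        elif a != positive and p == positive:
            total += cost_false_alarm   # e.g., a benign tumor was flagged as malignant
    return total / len(actual)


# Made-up labels for illustration: "M" = malignant, "B" = benign.
actual    = ["M", "M", "B", "B", "B", "M", "B", "B"]
predicted = ["M", "B", "B", "M", "B", "M", "B", "B"]

rate = misclassification_rate(actual, predicted)
cost = weighted_misclassification_cost(actual, predicted)
print(f"Overall misclassification rate: {rate:.2f}")
print(f"Weighted cost per observation:  {cost:.2f}")
```

With equal costs, the weighted criterion is simply a multiple of the overall misclassification rate, so two models are ranked the same way by both. The rankings can differ only when the costs are unequal.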

Mi-Ling switches to her validation set by locating the row state variable Validation Set in the columns panel of the data table, clicking on the red star to its left, and selecting Copy to Row States. All but the 109 observations in her validation set are now excluded and hidden. Next she selects Analyze > Fit Y by X, and enters, as Y, Response, the variables Most Likely Diagnosis (the logistic classification), Partition Prediction, Pred Diagnosis, and Pred Diagnosis 2 (these last two being the neural net classifications). As X, Factor, she enters Diagnosis (the actual diagnosis). She clicks OK.
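For readers who want to see the same comparison outside JMP, the following Python sketch cross-tabulates each model's predictions against the actual diagnosis, using validation rows only. The column names mirror those in Mi-Ling's data table, but the rows themselves are made up, and representing the Validation Set row state as a True/False field is an assumption made for the example.

```python
from collections import Counter

MODEL_COLUMNS = [
    "Most Likely Diagnosis",   # logistic classification
    "Partition Prediction",    # partition classification
    "Pred Diagnosis",          # first neural net
    "Pred Diagnosis 2",        # second neural net
]

# Made-up rows for illustration only: (validation flag, actual, four predictions).
raw = [
    (True,  "M", "M", "M", "M", "M"),
    (True,  "B", "B", "M", "B", "B"),
    (True,  "B", "B", "B", "B", "M"),
    (True,  "M", "B", "B", "M", "M"),
    (False, "M", "B", "B", "B", "B"),  # not in the validation set; must be ignored
]
rows = [
    {"Validation Set": flag, "Diagnosis": actual, **dict(zip(MODEL_COLUMNS, preds))}
    for flag, actual, *preds in raw
]


def contingency_table(rows, actual_col, predicted_col):
    """Count observations for each (actual, predicted) pair."""
    return Counter((r[actual_col], r[predicted_col]) for r in rows)


def misclassification_rate(table):
    """Share of counts that fall off the diagonal of the contingency table."""
    total = sum(table.values())
    wrong = sum(n for (a, p), n in table.items() if a != p)
    return wrong / total


# Keep only the validation rows, as excluding and hiding the others does in JMP.
validation_rows = [r for r in rows if r["Validation Set"]]

tables = {c: contingency_table(validation_rows, "Diagnosis", c) for c in MODEL_COLUMNS}
rates = {c: misclassification_rate(t) for c, t in tables.items()}
best = min(rates, key=rates.get)

for column in MODEL_COLUMNS:
    print(f"{column}: {rates[column]:.2f}")
print("Lowest validation misclassification rate:", best)
```

Each Counter plays the role of one contingency table in the Fit Y by X report: the diagonal cells hold the correct classifications, and the off-diagonal cells hold the two types of misclassification.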

When the report appears, she decides to remove the Mosaic Plot and Tests reports. To do this, while holding down the control key, she clicks ...
