Visual Six Sigma: Making Data Analysis Lean

by Ian Cox, Marie A. Gaudard, Philip J. Ramsey, Mia L. Stephens, Leo Wright
December 2009
504 pages
Wiley

9.4. Constructing the Training, Validation, and Test Sets

At this point, Mi-Ling has learned enough to be confident that she can build a strong classification model, and she is ready to move on to the Model Relationships step of the Visual Six Sigma Data Analysis Process. However, she anticipates that the marketing study will produce a large and unruly data set, probably with many outliers, some missing values, irregular distributions, and some categorical data. It will be neither as small nor as clean as her practice data set, so she wants to approach modeling from a data-mining perspective.

From her previous experience, Mi-Ling knows that some data-mining techniques, such as recursive partitioning and neural nets, fit highly parameterized nonlinear models that can fit the anomalies and noise in a data set as well as the signal. These techniques do not support variable selection based on hypothesis tests, which, in classical modeling, help the analyst choose models that neither overfit nor underfit the data.
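The risk of fitting noise as well as signal can be made concrete with a small Python sketch. Everything in it is hypothetical: the simulated data (the true class is x > 0.5, with 20% of labels flipped at random), the seed, and the 1-nearest-neighbour rule used as a stand-in for a highly flexible model. That rule is not one of the techniques the section names; it is chosen only because it memorizes its training data in a few lines. Because the simulated x values are distinct, the memorizer reproduces every training label exactly, so its training accuracy is 1.00; on fresh data from the same process it should fall well below a simple threshold rule.

```python
import bisect
import random


def make_data(n, noise, rng):
    """Hypothetical data: true class is x > 0.5, with a fraction `noise` of labels flipped."""
    xs = [rng.random() for _ in range(n)]
    # bool != bool acts as XOR: flip the true label when the noise draw fires.
    ys = [int((x > 0.5) != (rng.random() < noise)) for x in xs]
    return xs, ys


class MemorizingClassifier:
    """1-nearest-neighbour rule: predicts the label of the closest training point,
    so it reproduces every training label exactly, noise included."""

    def __init__(self, xs, ys):
        pairs = sorted(zip(xs, ys))
        self.xs = [x for x, _ in pairs]
        self.ys = [y for _, y in pairs]

    def predict(self, x):
        i = bisect.bisect_left(self.xs, x)
        # The nearest training point is one of the neighbours of the insertion point.
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        left, right = self.xs[i - 1], self.xs[i]
        return self.ys[i - 1] if x - left <= right - x else self.ys[i]


def threshold_rule(x):
    """A deliberately simple model with a single fixed cut point."""
    return int(x > 0.5)


def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(1)
train_x, train_y = make_data(1000, 0.2, rng)
new_x, new_y = make_data(1000, 0.2, rng)

memorizer = MemorizingClassifier(train_x, train_y)
mem_train = accuracy(memorizer.predict, train_x, train_y)
mem_new = accuracy(memorizer.predict, new_x, new_y)
simple_train = accuracy(threshold_rule, train_x, train_y)
simple_new = accuracy(threshold_rule, new_x, new_y)
print(f"memorizer: training {mem_train:.2f}, new data {mem_new:.2f}")
print(f"threshold: training {simple_train:.2f}, new data {simple_new:.2f}")
```

Judged only on the data it was fit to, the memorizer looks perfect; only data it has not seen exposes the overfit, which is the motivation for holding data back.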

To balance the competing risks of overfitting and underfitting in data mining, one often divides the available data into two, and sometimes three, distinct sets. Because the tendency to overfit can bias models that are fit and validated on the same data, only a portion of the data, called the training set, is used to construct several potential models. One then assesses the performance of these ...
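A minimal Python sketch of the split-then-select workflow follows. The section breaks off before describing the roles of the validation and test sets, so this follows common practice rather than the authors' procedure: candidates are compared on the validation set, and the chosen model is scored once on the test set. The 60/20/20 proportions, the simulated data (20% label noise), and the use of k-nearest-neighbour classifiers with k in {1, 5, 25, 101} as the "several potential models" are all illustrative assumptions; k-NN is swapped in for brevity and is not a technique the section discusses.

```python
import random


def partition(n_rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly assign row indices to training, validation, and test sets.

    The 60/20/20 default is an illustrative assumption, not a prescribed ratio.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n_rows)
    n_valid = round(fractions[1] * n_rows)
    return (idx[:n_train],
            idx[n_train:n_train + n_valid],
            idx[n_train + n_valid:])


def knn_predict(train, x, k):
    """Majority vote among the k training rows nearest to x (odd k avoids ties)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(2 * votes > k)


def accuracy(train, k, rows):
    """Fraction of rows whose label the k-NN model built on `train` predicts correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in rows)
    return hits / len(rows)


# Hypothetical data: true class is x > 0.5, with 20% of labels flipped.
rng = random.Random(7)
rows = []
for _ in range(1000):
    x = rng.random()
    rows.append((x, int((x > 0.5) != (rng.random() < 0.2))))

train_idx, valid_idx, test_idx = partition(len(rows), seed=42)
train = [rows[i] for i in train_idx]
valid = [rows[i] for i in valid_idx]
test = [rows[i] for i in test_idx]

# Candidate models of increasing smoothness, all built from the training set only.
candidates = [1, 5, 25, 101]
# Compare the candidates on the validation set and keep the best one ...
valid_acc = {k: accuracy(train, k, valid) for k in candidates}
best_k = max(valid_acc, key=valid_acc.get)
# ... then score that single model once on the untouched test set.
test_acc = accuracy(train, best_k, test)
print(f"validation accuracy by k: {valid_acc}")
print(f"chosen k = {best_k}, test accuracy = {test_acc:.2f}")
```

Because the test set plays no part in choosing k, its accuracy is a fairer estimate of performance on new data than the winning candidate's validation accuracy, which was itself used to make the choice.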

ISBN: 9780470506912