7 Validation and Application

7.1 Introduction to Methods for Validation

Data mining can be used to explain the relationships between variables. It can also be used to produce models that predict an outcome for each person or case. The closer the predictions are to the observed outcomes, the better the model. The model is assessed in three stages: business evaluation, statistical validation, and application to the full population, including the corresponding target variables.
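As a minimal sketch of what "closer" might mean in practice, the example below scores two sets of predictions against observed outcomes using the mean absolute error. The choice of measure, the function name and all the figures are assumptions made for illustration; they are not taken from the book, and other measures may suit a given business problem better.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute gap between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one case is required")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical observed outcomes and predictions from two candidate models.
observed = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 18.0, 33.0, 39.0]
model_b = [15.0, 25.0, 20.0, 50.0]

mae_a = mean_absolute_error(observed, model_a)
mae_b = mean_absolute_error(observed, model_b)

# A smaller error means the predictions lie closer to the observed outcomes.
print(f"Model A error: {mae_a}, Model B error: {mae_b}")
```

On these invented figures, model A has the smaller error and would be judged the better of the two by this measure alone.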

The importance of business evaluation has been stressed throughout the book; it means checking continually that the models make sense in the business scenario. As discussed in Section 2.6, evaluation and validation of the models are carried out during the analysis stage. Two or three of the most promising models are selected from the contending models and applied to the test samples. The statistical validation results can then be compared to help choose one or more final models.
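The comparison of contending models on a test sample can be sketched as follows, assuming a binary target and using the proportion of correct predictions as the comparison measure. The model names, the measure and the data are hypothetical and serve only to illustrate the idea; a real comparison would normally use several measures alongside the business evaluation.

```python
def hit_rate(observed, predicted):
    """Proportion of cases where the prediction equals the observed outcome."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for o, p in zip(observed, predicted) if o == p)
    return matches / len(observed)


# Hypothetical test sample: observed binary target for eight cases.
test_observed = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predictions from three contending models on the same cases.
candidate_predictions = {
    "decision_tree": [1, 0, 1, 0, 0, 0, 1, 0],
    "logistic_regression": [1, 0, 0, 0, 0, 1, 1, 0],
    "neural_network": [1, 1, 1, 1, 0, 0, 0, 0],
}

# Score every candidate on the test sample.
scores = {
    name: hit_rate(test_observed, preds)
    for name, preds in candidate_predictions.items()
}

# Rank the candidates from best to worst test-sample score.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best_name = ranking[0][0]

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Keeping the full ranking, rather than only the top entry, makes it easy to carry more than one final model forward.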

The chosen models are then applied to the full population, at which point the results of the model must be assessed once more. This is the final, and very important, stage of validation.

Validation methods include tests on the quality ...
