Chapter 15 Model Evaluation Techniques

As you may recall from Chapter 1, the cross-industry standard process (CRISP) for data mining consists of the following six phases to be applied in an iterative cycle:

  1. Business understanding phase
  2. Data understanding phase
  3. Data preparation phase
  4. Modeling phase
  5. Evaluation phase
  6. Deployment phase.

The crucial evaluation phase sits between the modeling and deployment phases, and its techniques are the subject of this chapter. By the time we arrive at the evaluation phase, the modeling phase has already generated one or more candidate models. These models must be evaluated for quality and effectiveness before they are deployed for use in the field. Deploying data mining models usually represents a capital expenditure and investment on the part of the company, so if the models in question are invalid, the company's time and money are wasted. In this chapter, we examine model evaluation techniques for each of the six main tasks of data mining: description, estimation, prediction, classification, clustering, and association.

15.1 Model Evaluation Techniques for the Description Task

In Chapter 3, we learned how to apply exploratory data analysis (EDA) to learn about the salient characteristics of a data set. EDA represents a popular and powerful technique for applying the descriptive task of data mining. However, because descriptive techniques make no classifications, predictions, or estimates, an ...
