Chapter 3
The Data Mining Process
Chapter 1 describes the virtuous cycle of data mining as a business process that divides data mining into four stages:
1. Identifying the problem
2. Transforming data into information
3. Taking action
4. Measuring the outcome
This chapter shifts the emphasis to data mining as a technical process, moving from identifying business problems to translating them into data mining problems. The second stage, transforming data into information, is expanded into several topics, including hypothesis testing, model building, and pattern discovery. The ideas and best practices introduced here are elaborated further in the rest of the book. The purpose of this chapter is to bring the different styles of data mining together in one place.
The best way to avoid breaking the virtuous cycle of data mining is to understand the ways it is likely to fail and to take preventive measures. Over the years, the authors have encountered many ways for data mining projects to go wrong, and this chapter begins with a discussion of some of these pitfalls. The rest of the chapter is about the data mining process itself. Later chapters cover the aspects of methodology that are specific to each of the two styles of data mining, directed and undirected. This chapter focuses on what these approaches have in common.
The three main styles of data mining are introduced, beginning with the simplest approach — testing hypotheses typically ...