Chapter 1 introduced us to data mining and to the cross-industry standard process for data mining (CRISP-DM), the standard process for data mining model development. In phase 1 of the data mining process, business understanding or research understanding, businesses and researchers first enunciate project objectives, then translate these objectives into a data mining problem definition, and finally prepare a preliminary strategy for achieving these objectives.
In this chapter, we examine the next two phases of the CRISP-DM standard process: data understanding and data preparation. We will show how to evaluate the quality of the data, clean the raw data, deal with missing data, and perform transformations on certain variables. All of Chapter 3 is devoted to this very important aspect of the data understanding phase. The heart of any data mining project is the modeling phase, which we begin examining in Chapter 7.
Much of the raw data contained in databases is unpreprocessed, incomplete, and noisy. For example, the databases may contain
To be useful for data mining purposes, the databases need to undergo preprocessing, in the form of data cleaning and data transformation. Data mining often deals with data that ...