Chapter 4

Data Understanding and Preparation

Preamble

Once the data mining process is chosen, the next step is to access, extract, integrate, and prepare the appropriate data set for data mining. Input data must be provided in the amount, structure, and format suited to the modeling algorithm. In this chapter, we describe the general structure in which data must be expressed for modeling, how to explore the data before modeling, and the major data cleaning operations that must be performed. From a database standpoint, a body of data can be regarded ...
