3 Data preprocessing and model evaluation
This chapter covers the following items:
– Data preprocessing, data cleaning, data integration, data reduction and data transformation
– Attribute subset selection: normalization
– Classification of data
– Model evaluation and selection
Today, owing to their large size and heterogeneous sources, real-world datasets are prone to noisy, inconsistent and missing data. High-quality mining requires high-quality data. Several data preprocessing techniques exist to enhance the quality of data and, in turn, the quality of the mining results. For example, data cleaning removes noise and corrects inconsistencies in the data. Data integration unites data that come from varying sources, making ...