3 Data preprocessing and model evaluation

This chapter covers the following items:

– Data preprocessing, data cleaning, data integration, data reduction and data transformation

– Attribute subset selection: normalization

– Classification of data

– Model evaluation and selection

Today, owing to their large size and heterogeneous sources, real-world datasets are prone to noisy, inconsistent and missing data. High-quality mining requires high-quality data. Several data preprocessing techniques exist to enhance the quality of the data and, consequently, of the mining results. For example, data cleaning is applied to remove noise and correct inconsistencies in the data. Data integration unites data that come from varying sources, making ...
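As a small illustration of the cleaning and normalization steps named above, the following Python sketch may help. It is only a minimal example under stated assumptions: the toy records, the field names `city` and `temp`, and the helper functions `clean` and `min_max` are all hypothetical and do not come from the book. Mean imputation and min-max scaling are used here as one simple choice among many.

```python
from statistics import mean

# Hypothetical toy records (not from the book): None marks a missing value,
# and the "city" field is spelled inconsistently.
records = [
    {"city": "Paris", "temp": 21.0},
    {"city": "paris ", "temp": None},
    {"city": "PARIS", "temp": 23.0},
    {"city": "Rome", "temp": 28.0},
]


def clean(rows):
    """Return cleaned copies: consistent labels, missing temps mean-imputed."""
    known = [r["temp"] for r in rows if r["temp"] is not None]
    fill = mean(known)  # assumption: mean imputation is acceptable here
    return [
        {
            # Strip stray whitespace and unify capitalization.
            "city": r["city"].strip().title(),
            # Replace a missing value with the mean of the known values.
            "temp": fill if r["temp"] is None else r["temp"],
        }
        for r in rows
    ]


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


cleaned = clean(records)
scaled = min_max([r["temp"] for r in cleaned])
```

After running the sketch, `cleaned` holds records with consistent city labels and no missing temperatures, and `scaled` holds the temperatures rescaled so that the smallest maps to 0 and the largest to 1.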

