4 Data Preprocessing
Md. Sharif Hossen
Department of Information and Communication Technology, Comilla University, Cumilla, Bangladesh
Abstract
A large amount of data is collected every day from different sources. Most of it is unprocessed and therefore difficult to analyze. Some of it becomes useless, because such datasets tend to be inconsistent, incomplete (missing values), and noisy. The quality of these datasets must be ensured before they are used. Data preprocessing is the step of the knowledge discovery process that ensures the consistency and quality of the data. Data preparation is a compulsory part of data preprocessing. It converts raw, unusable data into a format that can be analyzed in the next step of data mining. Data preparation draws on several techniques, e.g., data cleaning, integration, reduction, transformation, normalization, de-noising, and dimensionality reduction. In this chapter, we discuss how to measure the quality of data, handle missing data, clean noisy data, and transform certain variables.
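As a rough illustration of three of the tasks named in the abstract, the sketch below measures one simple quality indicator (the fraction of missing entries), fills missing values, and normalizes a variable. It assumes that a missing value is represented by `None`, that mean imputation is an acceptable fill strategy, and that min-max scaling to [0, 1] is the desired normalization. These are common choices rather than the chapter's prescribed method. The function names (`missing_ratio`, `impute_mean`, `min_max_normalize`) are hypothetical and were chosen only for this example.

```python
from statistics import mean


def missing_ratio(values):
    """Fraction of entries that are missing (None): a simple data-quality measure."""
    return sum(v is None for v in values) / len(values)


def impute_mean(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Linearly rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw column with two missing readings.
raw = [12.0, None, 15.0, 9.0, None, 18.0]

print(missing_ratio(raw))        # two of six entries are missing
clean = impute_mean(raw)         # the observed values 12, 15, 9, 18 have mean 13.5
print(clean)
print(min_max_normalize(clean))  # 9.0 maps to 0.0 and 18.0 maps to 1.0
```

Mean imputation keeps the sketch short, but it shrinks the variance of the column. A median or model-based fill is often preferred when the data are skewed.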
Keywords: Data processing, data mining, knowledge discovery, data transformation
4.1 Introduction
Real-world raw data can be unorganized, inconsistent, incomplete, incorrect, unprocessed, and dirty. Data processing plays an important role in enhancing data quality, because it improves the accuracy and efficiency of the data in the succeeding steps. Good decisions depend on good data, so anomalies should be detected and filtered out as early as possible. ...
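The text does not say how anomalies are found. One common, simple option is the interquartile-range (IQR) rule, also known as Tukey's fences. A value is flagged if it lies more than `k` times the IQR below the first quartile or above the third quartile. The sketch below uses this rule as a swapped-in technique and makes no claim about the method the chapter itself uses. The function name `iqr_filter`, the default `k = 1.5`, and the sample readings are all hypothetical.

```python
from statistics import quantiles


def iqr_filter(values, k=1.5):
    """Split values into (kept, flagged) using Tukey's IQR fences.

    A value is flagged as an anomaly if it falls outside
    [Q1 - k*IQR, Q3 + k*IQR], where IQR = Q3 - Q1.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if not (low <= v <= high)]
    return kept, flagged


# Hypothetical sensor readings containing one obvious outlier (95).
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
kept, flagged = iqr_filter(readings)
print("kept:", kept)
print("flagged:", flagged)
```

The IQR rule relies on quartiles instead of the mean and standard deviation, so a single extreme value cannot inflate the thresholds it is judged against. This makes the rule a reasonable first filter to apply early in a pipeline.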