Data cleaning

Data cleaning is one part of data quality. The aim of data quality (DQ) is to ensure the following properties:

  • Accuracy (data is recorded correctly)
  • Completeness (all relevant data is recorded)
  • Uniqueness (no duplicate records)
  • Timeliness (the data is up to date)
  • Consistency (the data is coherent)

Data cleaning attempts to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data. Data cleaning is usually an iterative two-step process consisting of discrepancy detection and data transformation.
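As a rough illustration of two of the tasks named above, filling in missing values and identifying outliers, the following minimal Python sketch uses only the standard library. The `readings` list, the choice of the median as the fill value, and the 1.5 × IQR fence rule are all assumptions made for this example; they are not prescribed by the text, and other strategies may suit a given dataset better.

```python
from statistics import median, quantiles

# Hypothetical sensor readings; None marks a missing value.
readings = [10.0, 12.0, None, 11.0, 95.0, 13.0, None, 12.0]


def fill_missing(values):
    """Replace each None with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def flag_outliers(values, k=1.5):
    """Return the indices of values lying outside the k * IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


filled = fill_missing(readings)
outliers = flag_outliers(filled)

print(filled)    # the two None entries are replaced by 12.0
print(outliers)  # [4], the index of the reading 95.0
```

The median is used here rather than the mean because a single extreme reading would otherwise pull the fill value toward itself. Flagged values are only reported, not removed, since deciding what to do with an outlier usually needs domain knowledge.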

In most situations, the data cleaning process consists of two steps:

  • The first step is to audit the source dataset to find discrepancies.
  • The second step is to choose the transformation ...
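The audit-then-transform cycle can be sketched in Python as follows. Everything here is hypothetical and chosen only for illustration: the `records` list, the `CANONICAL` mapping of country labels, the two discrepancy kinds, and the `MAX_PASSES` guard. The point of the sketch is the loop itself: one transformation can expose a discrepancy that was invisible before, which is why the process is repeated until an audit comes back clean.

```python
# Hypothetical records with inconsistent country labels.
records = [
    {"id": 1, "country": "US"},
    {"id": 1, "country": "usa"},
    {"id": 2, "country": "U.S."},
    {"id": 3, "country": "DE"},
]

# Assumed mapping from lowercased variants to a canonical label.
CANONICAL = {"us": "US", "usa": "US", "u.s.": "US", "de": "DE"}
MAX_PASSES = 5  # guard against discrepancies that no transformation fixes


def audit(rows):
    """Step 1: return (index, problem) pairs for every discrepancy found."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        if row["country"] not in CANONICAL.values():
            problems.append((i, "nonstandard country"))
        key = (row["id"], row["country"])
        if key in seen:
            problems.append((i, "duplicate"))
        seen.add(key)
    return problems


def transform(rows, problems):
    """Step 2: drop duplicates and normalize country labels."""
    drop = {i for i, problem in problems if problem == "duplicate"}
    fixed = []
    for i, row in enumerate(rows):
        if i in drop:
            continue
        country = CANONICAL.get(row["country"].lower(), row["country"])
        fixed.append({**row, "country": country})
    return fixed


cleaned = records
for _ in range(MAX_PASSES):
    problems = audit(cleaned)
    if not problems:
        break
    cleaned = transform(cleaned, problems)

print(cleaned)
```

In this data the second record only becomes a recognizable duplicate of the first after its label is normalized, so the first pass fixes the labels and the second pass removes the duplicate.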
