O'Reilly logo

Learning Data Mining with R by Bater Makhabel

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Data cleaning

Data cleaning is one part of data quality. The aim of Data Quality (DQ) is to have the following:

  • Accuracy (data is recorded correctly)
  • Completeness (all relevant data is recorded)
  • Uniqueness (no duplicated data record)
  • Timeliness (the data is not old)
  • Consistency (the data is coherent)

Data cleaning attempts to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data. Data cleaning is usually an iterative two-step process consisting of discrepancy detection and data transformation.

The process of data mining contains two steps in most situations. They are as follows:

  • The first step is to perform audition on the source dataset to find the discrepancy.
  • The second step is to choose the transformation ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required