Mastering Predictive Analytics with R - Second Edition by Rui Miguel Forte, James D. Miller


Categorizing data quality

Issues with data quality are commonly categorized into one of the following areas:

  • Accuracy
  • Completeness
  • Update status
  • Relevance
  • Consistency (across sources)
  • Reliability
  • Appropriateness
  • Accessibility

The quality of your data can be affected by the way it is entered, stored, and managed. Addressing data quality, most often referred to as data quality assurance (DQA), requires routine, regular review and evaluation of the data, along with ongoing processes termed profiling and scrubbing. This is vital even when the data is stored in multiple disparate systems, which makes these processes difficult.
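To make profiling and scrubbing concrete, here is a minimal sketch in Python using only the standard library. The sample records, the field names, and the helper functions (`is_missing`, `profile`, `scrub_state`) are hypothetical and chosen purely for illustration; a real DQA process would apply rules specific to your own data. Profiling here means summarizing a column (missing values, distinct values, most frequent values), and scrubbing means normalizing the values so that inconsistent spellings collapse into one.

```python
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"id": 1, "state": "NY ", "age": "34"},
    {"id": 2, "state": "ny", "age": ""},
    {"id": 3, "state": "CA", "age": "41"},
    {"id": 4, "state": None, "age": "abc"},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def profile(rows, column):
    """Profile one column: row count, missing count, distinct values, top values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if not is_missing(v)]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "top": Counter(present).most_common(3),
    }


def scrub_state(value):
    """Scrub a state code: trim whitespace and upper-case it; keep missing as None."""
    if is_missing(value):
        return None
    return str(value).strip().upper()


# Profile the raw column, scrub it, then profile again to see the effect.
before = profile(records, "state")
cleaned = [{**row, "state": scrub_state(row["state"])} for row in records]
after = profile(cleaned, "state")

print(before)
print(after)
```

Profiling before and after scrubbing shows whether the cleanup reduced the number of distinct spellings without changing the count of missing values, which is the kind of routine check that DQA repeats on a regular schedule.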

Here, tidying the data will be much more project centric in that ...
