Categorizing data quality
Data quality issues are commonly categorized into one of the following areas:
- Accuracy
- Completeness
- Update status
- Relevance
- Consistency (across sources)
- Reliability
- Appropriateness
- Accessibility
The quality of your data can be affected by the way it is entered, stored, and managed. Addressing data quality, most often referred to as data quality assurance (DQA), requires routine, regular review and evaluation of the data, along with ongoing processes termed profiling and scrubbing. This is vital even when the data is stored in multiple disparate systems, which makes these processes more difficult.
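As a rough illustration of what profiling and scrubbing can look like in practice, here is a minimal sketch in Python using only the standard library. The sample records, the field names (`state`, `age`), the `max_age` threshold, and the helper functions `is_missing`, `profile`, and `scrub` are all hypothetical, chosen only for this example. Profiling here means summarizing a field (missing values, distinct values); scrubbing means normalizing or blanking out values that fail simple checks.

```python
from collections import Counter

# Hypothetical sample records; None and "" stand in for missing values.
records = [
    {"id": 1, "state": "PA", "age": 34},
    {"id": 2, "state": "pa ", "age": None},
    {"id": 3, "state": "", "age": 210},
    {"id": 4, "state": "NJ", "age": 45},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def profile(rows, field):
    """Profile one field: count missing values and tally distinct raw values."""
    values = [row.get(field) for row in rows]
    missing = sum(1 for v in values if is_missing(v))
    counts = Counter(v for v in values if not is_missing(v))
    return {"missing": missing, "distinct": len(counts), "counts": dict(counts)}


def scrub(rows, max_age=120):
    """Return cleaned copies of the rows.

    Assumed rules for this sketch: state codes are trimmed and upper-cased,
    and ages outside 0..max_age are treated as unknown (set to None).
    """
    cleaned = []
    for row in rows:
        new = dict(row)
        state = new.get("state")
        new["state"] = None if is_missing(state) else state.strip().upper()
        age = new.get("age")
        if age is not None and not (0 <= age <= max_age):
            new["age"] = None
        cleaned.append(new)
    return cleaned


before = profile(records, "state")
after = profile(scrub(records), "state")
print("state before scrubbing:", before)
print("state after scrubbing: ", after)
print("age after scrubbing:   ", profile(scrub(records), "age"))
```

Running the profile both before and after scrubbing is a simple way to confirm that the cleaning rules did what was intended, for example that inconsistent spellings of the same state code collapsed into one value. In a real project the rules would come from the data's owners rather than being hard-coded as they are here.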
Here, tidying the data will be much more project centric in that ...