Chapter 2. Data Cleaning

Clean data is essential to good data analysis, and poor data quality is a primary cause of problems in business intelligence analysis. Data cleaning is the process of transforming raw data into usable data. Cleaning data, checking quality, and standardizing data types account for the majority of an analytic project's schedule.

Anthony Goldbloom, the CEO of Kaggle, said: “Eighty percent of data science is cleaning data and the other twenty percent is complaining about cleaning data” (personal communication, February 14, 2016).

This chapter covers four key topics using some of the newer packages available within the R environment:

  • Summarizing your data for inspection
  • Finding and fixing flawed data
  • Converting inputs to data ...
