Introduction to R for Business Intelligence by Jay Gendron

Chapter 2. Data Cleaning

Clean data is an essential element of good data analysis, and poor data quality is a primary cause of problems in business intelligence analysis. Data cleaning is the process of transforming raw data into usable data. Cleaning data, checking quality, and standardizing data types account for the majority of an analytic project schedule.

Anthony Goldbloom, the CEO of Kaggle, said: "Eighty percent of data science is cleaning data and the other twenty percent is complaining about cleaning data" (personal communication, February 14, 2016).

This chapter covers four key topics using some of the newer packages available within the R environment:

  • Summarizing your data for inspection
  • Finding and fixing flawed data
  • Converting inputs to data ...
