A Developer's Approach to Data Cleaning

This chapter looks at how a developer might understand and approach data cleaning using several common statistical methods.

We've broken the chapter into the following topics:

  • Understanding basic data cleaning
  • Using R to detect and diagnose common data issues, such as missing values, special values, outliers, inconsistencies, and localization
  • Using R to address advanced statistical situations, such as transformation, deductive correction, and deterministic imputation
