Practical Data Analysis - Second Edition by Dr. Sampath Kumar, Hector Cuesta

Data scrubbing

Scrubbing data, also called data cleansing, is the process of correcting or removing data in a dataset that is incorrect, inaccurate, incomplete, improperly formatted, or duplicated.

The result of a data analysis depends not only on the algorithms but also on the quality of the data. That is why the next step after obtaining the data is data scrubbing. To avoid dirty data, our dataset should possess the following characteristics:

  • Correctness
  • Completeness
  • Accuracy
  • Consistency
  • Uniformity
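
As a rough illustration, two of these characteristics, completeness and uniformity, can be checked record by record in a few lines of Python. The field names, the sample records, and the choice of an ISO date format are assumptions made for this sketch; they do not come from the book's dataset.

```python
from datetime import datetime

# Hypothetical schema: every record is expected to carry these fields.
REQUIRED_FIELDS = ("name", "signup_date", "age")


def is_complete(record):
    """Return True when every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def has_uniform_date(record, fmt="%Y-%m-%d"):
    """Return True when signup_date parses in the single agreed format."""
    try:
        datetime.strptime(record.get("signup_date") or "", fmt)
        return True
    except ValueError:
        return False


# Made-up sample records for illustration only.
records = [
    {"name": "Ana", "signup_date": "2016-03-01", "age": 34},
    {"name": "Luis", "signup_date": "01/03/2016", "age": 29},  # date in another format
    {"name": "", "signup_date": "2016-03-05", "age": 41},      # empty name
]

incomplete = [r for r in records if not is_complete(r)]
non_uniform = [r for r in records if not has_uniform_date(r)]
```

Checks like these only flag suspicious records. Whether to correct or drop them remains a decision that depends on the dataset.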

Dirty data can be detected by applying simple statistical validation, by parsing text fields, and by checking for duplicate values. Missing or sparse data can lead to highly misleading results.
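
The following is a minimal sketch of such checks using only the Python standard library. It counts missing values, finds duplicated rows, and flags outliers with the modified z-score (based on the median absolute deviation). That outlier rule is one common choice picked for this example, not necessarily the method the book goes on to describe. The sensor rows and the 3.5 threshold are assumptions.

```python
from collections import Counter
from statistics import median

# Hypothetical (sensor_id, reading) rows; None marks a missing reading.
rows = [
    ("s1", 21.5), ("s2", 22.0), ("s3", 21.8), ("s4", None),
    ("s5", 22.1), ("s3", 21.8), ("s6", 95.0), ("s7", 21.9),
]

# Missing data: rows whose reading is absent.
missing_count = sum(1 for _, value in rows if value is None)

# Duplicates: identical rows that appear more than once.
duplicate_rows = [row for row, n in Counter(rows).items() if n > 1]


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, nothing can be flagged
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


values = [value for _, value in rows if value is not None]
outliers = mad_outliers(values)
```

The median-based score is used here instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself in a small sample.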

Statistical methods

In this method, ...
