Chapter 6: Detecting and Correcting Data Errors

Introduction

Data sets contain the available facts and information that provide evidence of whether a belief or hypothesis is true. During collection, inconsistencies and anomalies can creep into the raw data, and these must be resolved before the data can be used to answer questions. Unexpected missing values, incorrect flow through skip patterns, incomplete data, and the combining of multiple data sets with different attributes all require careful investigation and correction during data preparation. Data cleansing is an iterative, interactive process that programmers ideally perform both during and after collection. While this can be a time-consuming and costly task, the Data Detective’s Toolkit provides ...
