© Stylianos Kampakis 2020
S. Kampakis, The Decision Maker's Handbook to Data Science, https://doi.org/10.1007/978-1-4842-5494-3_4

4. How to Keep Data Tidy

Stylianos Kampakis
London, UK

Dirty data is data that suffers from problems such as erroneous or conflicting entries, missing values, and outdated records. Tidy data is the opposite: data in a consistent format, free of inconsistencies and similar issues.
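To make these categories concrete, here is a minimal sketch in Python that flags the three kinds of problems named above in a tiny, made-up table. Everything in it is an assumption for illustration only: the records, the field names, the `CUTOFF` date used to define "outdated", and the helper `find_issues` are all hypothetical, and a "conflict" is simply taken to mean two rows that share an `id` but disagree on `country`.

```python
from datetime import date

# Hypothetical customer records; all names and values are invented.
records = [
    {"id": 1, "email": "ann@example.com", "country": "UK", "updated": date(2019, 11, 2)},
    {"id": 2, "email": None, "country": "UK", "updated": date(2019, 10, 15)},
    {"id": 3, "email": "bob@example.com", "country": "FR", "updated": date(2012, 3, 9)},
    {"id": 3, "email": "bob@example.com", "country": "DE", "updated": date(2019, 9, 1)},
]

# Assumed threshold: anything last updated before this date counts as outdated.
CUTOFF = date(2018, 1, 1)


def find_issues(rows, cutoff):
    """Return a dict mapping each issue name to the set of affected row indexes."""
    issues = {"missing": set(), "conflicting": set(), "outdated": set()}

    for i, row in enumerate(rows):
        # Missing value: any field left empty.
        if any(value is None for value in row.values()):
            issues["missing"].add(i)
        # Outdated record: last update older than the cutoff.
        if row["updated"] < cutoff:
            issues["outdated"].add(i)

    # Conflicting entries: rows with the same id that disagree on country.
    rows_by_id = {}
    for i, row in enumerate(rows):
        rows_by_id.setdefault(row["id"], []).append(i)
    for indexes in rows_by_id.values():
        if len({rows[i]["country"] for i in indexes}) > 1:
            issues["conflicting"].update(indexes)

    return issues


issues = find_issues(records, CUTOFF)
dirty_rows = set().union(*issues.values())
clean_count = len(records) - len(dirty_rows)

for name, indexes in issues.items():
    print(f"{name}: rows {sorted(indexes)}")
print(f"clean rows: {clean_count} of {len(records)}")
```

In this toy table only one of the four rows passes all three checks. Real data would need rules chosen for the business context, so the thresholds and definitions here should be read as placeholders.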

Dirty data causes several problems. First, it makes consolidating different data sources difficult or sometimes outright impossible. Second, many of the data points might not be usable, which reduces the effective size of your data. You might be holding 5 GB of data, but only ...
