Chapter 11. Data Quality, or GIGO
GIGO
- 1. Garbage In, Garbage Out
Usually said in response to lusers who complain that a program didn’t “do the right thing” when given imperfect input or otherwise mistreated in some way. Also commonly used to describe failures in human decision making due to faulty, incomplete, or imprecise data.
- 2. Garbage In, Gospel Out
This more recent expansion is a sardonic comment on the tendency human beings have to put excessive trust in “computerized” data.
Perhaps this chapter should have come first. But often our job is not to clean up the data in production or in the import datasets. You may be only a contractor with no authority, or there may simply be no time. Even when a data cleanup effort is put in place, it is often a one-time shot, and then “entropic drift” sets in: the same mistakes keep getting made, and data quality (DQ) slowly degrades again. Such efforts tend to happen around system migrations, and without procedures and processes in place to keep the data clean, everything then reverts to “normal” (that is, no edit checks during or after data entry or import).
This chapter looks at various things we can do to deal with data quality, both guarding against the lack of it and making it better.
Sneaking Data Quality In
My favorite method of surfacing DQ issues to field staff who have the access and ability to clean them up in the source system is to put the issues on their pre-existing dashboard or queue. Many shops have some such ...