April 2015
Intermediate to advanced
504 pages
11h 41m
English
Relational databases work on the idea that items of data can be uniquely identified. However hard we try, there will always be bad data arriving from somewhere. This recipe shows you how to diagnose that and clean up the mess.
Let's start by looking at our example table, cust. It has a duplicate value in customerid:
postgres=# SELECT * FROM cust; customerid | firstname | lastname | age ------------+-----------+----------+----- 1 | Philip | Marlowe | 38 2 | Richard | Hannay | 42 3 | Holly | Martins | 25 4 | Harry | Palmer | 36 4 | Mark | Hall | 47 (5 rows)
Before you delete duplicate data, remember that sometimes, it isn't the data that is wrong; it is your understanding of it. In those cases, ...