DUPLICATES: FURTHER ISSUES
There’s much, much more that could be said regarding duplicates and what’s wrong with them, but I’ll limit myself here to just three further points. First of all, you might reasonably object that in practice base tables, at least, never do include duplicates, and the foregoing example thus intuitively fails. True enough (probably); but the trouble is, SQL can generate duplicates in query results. Indeed, different formulations of the same query can produce results with different degrees of duplication, even if the input tables themselves have no duplicates at all. For example, here are two possible formulations of the query “Get supplier numbers for suppliers who supply at least one part” on our usual suppliers-and-parts database (and note here that the input tables certainly don’t contain any duplicates):

At least one of these expressions—which?—will produce a result with duplicates, in general. (Exercise: Given our usual sample data values, what results do the two expressions produce?) So if you don’t want to think of the tables in Figure 4-1 as base tables specifically, fine: Just take them to be the output from previous queries, and the rest of the analysis goes through unchanged.
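To make the point concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The two formulations shown are an assumed pair, one using IN with a subquery and one using NATURAL JOIN, and are not necessarily the ones intended above. The table and column names (S, SP, SNO, PNO, QTY) and the three-supplier data set are likewise hypothetical stand-ins, not the usual sample values. Neither input table contains duplicates, yet one of the results does.

```python
import sqlite3

# Hypothetical miniature of a suppliers-and-parts database (illustrative
# rows only, not the book's sample data). Neither table contains duplicates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT NOT NULL);
    CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, QTY INTEGER NOT NULL,
                     PRIMARY KEY (SNO, PNO));
    INSERT INTO S  VALUES ('S1', 'Smith'), ('S2', 'Jones'), ('S3', 'Blake');
    INSERT INTO SP VALUES ('S1', 'P1', 300), ('S1', 'P2', 200), ('S2', 'P1', 300);
""")

# Formulation A (assumed): membership test against the shipments table.
in_query = "SELECT SNO FROM S WHERE SNO IN (SELECT SNO FROM SP)"
# Formulation B (assumed): join suppliers to shipments, then keep only SNO.
join_query = "SELECT SNO FROM S NATURAL JOIN SP"


def run(query):
    """Return the SNO values as a sorted list so duplicates stay visible."""
    return sorted(row[0] for row in conn.execute(query))


in_result = run(in_query)      # one row per qualifying supplier
join_result = run(join_query)  # one row per shipment, so S1 appears twice
# Adding DISTINCT to formulation B removes the duplicates again.
distinct_join_result = run(join_query.replace("SELECT", "SELECT DISTINCT", 1))

print(in_result)
print(join_result)
print(distinct_join_result)
```

In this sketch the IN formulation yields each qualifying supplier number once, while the join formulation yields one row per shipment; adding DISTINCT to the latter brings the two results back into agreement.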
Second, there’s another argument against duplicates, a psychological one at least, that I think is quite persuasive (thanks to Jonathan Gennick for this one): If, in accordance with ...