WHAT’S WRONG WITH DUPLICATES?
There are numerous practical arguments in support of the position that duplicate rows (“duplicates” for short) should be prohibited. Here I want to emphasize just one—but I think it’s a powerful one.[50] However, it does rely on certain notions I haven’t discussed yet in this book, so I need to make a couple of preliminary assumptions:
I assume you know that relational DBMSs include a component called the optimizer,[51] whose job is to try to figure out the best way to implement user queries and the like (where “best” basically means best performing).
I assume you also know that one of the things optimizers do is what’s sometimes called query rewrite. Query rewrite is the process of transforming some relational expression exp1 (representing some user query, say) into another such expression exp2, such that exp1 and exp2 are guaranteed to produce the same result when evaluated but exp2 performs better than exp1 (at least, we hope so). Note: Be aware, however, that the term query rewrite is also used in certain commercial products with a different (typically more limited) meaning.
Now I can present my argument. The fundamental point I want to make is that certain expression transformations, and hence certain optimizations, that would be valid if SQL were truly relational aren’t valid in the presence of duplicates. By way of example, consider the (nonrelational) database shown in Figure 4-1. Note right away that the tables in that database have no keys (which ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access