AVOIDING DUPLICATES IN SQL
The relational model prohibits duplicates; to use SQL relationally, therefore, steps must be taken to prevent them from occurring. Now, if every base table has at least one key (see Chapter 5), then duplicates will never occur in base tables as such. As already mentioned, however, certain SQL expressions can still yield result tables with duplicates. Here are some of the cases in which such tables can be produced:
SELECT ALL
UNION ALL
VALUES (i.e., table value constructor invocations)
Regarding VALUES, see Chapter 3. Regarding ALL, note first that this keyword (and its alternative, DISTINCT) can be specified:
In a SELECT clause, immediately following the SELECT keyword
In a union, intersection, or difference, immediately following the applicable keyword (UNION, INTERSECT, and EXCEPT, respectively)
Inside the parentheses in an invocation of a “set function” such as SUM, immediately preceding the argument expression
Note: DISTINCT is the default for UNION, INTERSECT, and EXCEPT; ALL is the default in the other cases.
Now, the “set function” case is special; you must specify ALL, at least implicitly, if you want the function to take duplicate values into account, which sometimes you do (see Chapter 7). But the other cases have to do with elimination of duplicate rows, which must always be done, at least in principle, if you want to use SQL relationally. Thus, the obvious recommendations in those cases are: Always specify DISTINCT; preferably do so explicitly; and ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access