Eliminating Duplicates from a Self-Join Result
Problem
Self-joins often produce rows that are
“near” duplicates—that is,
rows that contain the same values but in different orders. Because of
this, SELECT DISTINCT will not
eliminate the duplicates.
Solution
Select column values in a specific order within rows to make rows
with duplicate sets of values identical. Then you can use
SELECT DISTINCT to remove
duplicates. Alternatively, retrieve rows in such a way that
near-duplicates are not even selected.
Discussion
Self-joins can produce rows that are duplicates in the sense that they contain the same values, yet are not identical. Consider the following query, which uses a self-join to find all pairs of states that joined the Union in the same year:
mysql>SELECT YEAR(s2.statehood) AS year, s1.name, s2.name->FROM states AS s1, states AS s2->WHERE YEAR(s1.statehood) = YEAR(s2.statehood)->AND s1.name != s2.name->ORDER BY year, s1.name, s2.name;+------+----------------+----------------+ | year | name | name | +------+----------------+----------------+ | 1787 | Delaware | New Jersey | | 1787 | Delaware | Pennsylvania | | 1787 | New Jersey | Delaware | | 1787 | New Jersey | Pennsylvania | | 1787 | Pennsylvania | Delaware | | 1787 | Pennsylvania | New Jersey | ... | 1912 | Arizona | New Mexico | | 1912 | New Mexico | Arizona | | 1959 | Alaska | Hawaii | | 1959 | Hawaii | Alaska | +------+----------------+----------------+
The condition in the WHERE clause that requires state pair names ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access