Eliminating Duplicates from a Self-Join Result
Problem
Self-joins
often produce rows that are “near”
duplicates—that is, rows that contain the same values but in different
orders. Because of this, SELECT
DISTINCT
will not eliminate the
duplicates.
Solution
Select column values in a specific order within rows to make
rows with duplicate sets of values identical. Then you can use
SELECT
DISTINCT
to remove duplicates.
Alternatively, retrieve rows in such a way that near-duplicates are
not even selected.
Discussion
Self-joins can produce rows that are duplicates in the sense that they contain the same values, yet are not identical. Consider the following statement (originally seen in Comparing a Table to Itself), which uses a self-join to find all pairs of states that joined the Union in the same year:
mysql>SELECT YEAR(s1.statehood) AS year,
->s1.name AS name1, s1.statehood AS statehood1,
->s2.name AS name2, s2.statehood AS statehood2
->FROM states AS s1 INNER JOIN states AS s2
->ON YEAR(s1.statehood) = YEAR(s2.statehood) AND s1.name != s2.name
->ORDER BY year, s1.name, s2.name;
+------+----------------+------------+----------------+------------+ | year | name1 | statehood1 | name2 | statehood2 | +------+----------------+------------+----------------+------------+ | 1787 | Delaware | 1787-12-07 | New Jersey | 1787-12-18 | | 1787 | Delaware | 1787-12-07 | Pennsylvania | 1787-12-12 | | 1787 | New Jersey | 1787-12-18 | Delaware | 1787-12-07 | | 1787 | New Jersey | 1787-12-18 | Pennsylvania ...
Get MySQL Cookbook, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.