With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

No credit card required

Eliminating Duplicates from a Self-Join Result

Problem

Self-joins often produce rows that are “near” duplicates—that is, rows that contain the same values but in different orders. Because of this, `SELECT` `DISTINCT` will not eliminate the duplicates.

Solution

Select column values in a specific order within rows to make rows with duplicate sets of values identical. Then you can use `SELECT` `DISTINCT` to remove duplicates. Alternatively, retrieve rows in such a way that near-duplicates are not even selected.

Discussion

Self-joins can produce rows that are duplicates in the sense that they contain the same values, yet are not identical. Consider the following query, which uses a self-join to find all pairs of states that joined the Union in the same year:

```mysql> `SELECT YEAR(s2.statehood) AS year, s1.name, s2.name`
-> `FROM states AS s1, states AS s2`
-> `WHERE YEAR(s1.statehood) = YEAR(s2.statehood)`
-> `AND s1.name != s2.name`
-> `ORDER BY year, s1.name, s2.name;`
+------+----------------+----------------+
| year | name           | name           |
+------+----------------+----------------+
| 1787 | Delaware       | New Jersey     |
| 1787 | Delaware       | Pennsylvania   |
| 1787 | New Jersey     | Delaware       |
| 1787 | New Jersey     | Pennsylvania   |
| 1787 | Pennsylvania   | Delaware       |
| 1787 | Pennsylvania   | New Jersey     |
...
| 1912 | Arizona        | New Mexico     |
| 1912 | New Mexico     | Arizona        |
| 1959 | Alaska         | Hawaii         |
| 1959 | Hawaii         | Alaska         |
+------+----------------+----------------+```

The condition in the `WHERE` clause that requires state pair names ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

No credit card required