Eliminating Duplicates by Adding an Index
Problem
A table has duplicates and you’d like to get rid of them.
Solution
One way to do this is to create a unique index on the column or columns containing duplicates.
Discussion
If MySQL discovers duplicate key values when you try to create a
PRIMARY KEY or a
UNIQUE index, it aborts the
ALTER TABLE operation. To
ignore the duplicates and proceed anyway, use
ALTER IGNORE
TABLE rather than ALTER
TABLE. The
IGNORE keyword tells MySQL to retain the first
row containing a duplicated key value and discard the others. This
is, in fact, a useful way to eliminate duplicates in a column or set
of columns: just create a unique-valued index and let MySQL throw
away the duplicates. (If you need to identify which key values are
duplicated, though, that’s not a suitable technique.
See Recipe 14.4 for information on duplicate
identification.)
To see how IGNORE works to eliminate duplicates,
use mytbl, which now has no indexes if
you’ve issued the index-modification statements
shown earlier. First, insert some duplicate values into the table:
mysql>INSERT INTO mytbl (i,c) VALUES(1,'a'),(1,'a'),(1,NULL),(1,NULL),->(2,'a'),(2,'a'),(2,'b'),(2,'b');mysql>SELECT * FROM mytbl;+---+------+ | i | c | +---+------+ | 1 | a | | 1 | a | | 1 | NULL | | 1 | NULL | | 2 | a | | 2 | a | | 2 | b | | 2 | b | +---+------+
Now suppose you want to create a unique-valued index comprising the
i and c columns. A
PRIMARY KEY cannot be used
here, because c contains NULL values. ...