Identifying and Removing Mismatched or Unattached Rows
Problem
You have two datasets that are related, but possibly imperfectly so. You want to determine whether there are records in either dataset that are “unattached” (not matched by any record in the other dataset), and perhaps remove them if so. This might occur, for example, when you receive data from an external source and must check it to verify its integrity.
Solution
Use a
LEFT
JOIN
to identify unmatched values in each
table. If there are any and you want to get rid of them, use a
multiple-table DELETE
statement.
It’s also possible to identify or remove nonmatching rows by using
NOT
IN
subqueries.
Discussion
Inner joins are useful for identifying relationships, and outer joins are useful for identifying the lack of relationship. This property of outer joins is valuable when you have datasets that are supposed to be related but for which the relationship might be imperfect.
Mismatches between datasets can occur if you receive two datafiles from an external source that are supposed to be related but for which the integrity of the relationship actually is imperfect. It can also occur as an anticipated consequence of a deliberate action. Suppose that an online discussion board uses a parent table that lists discussion topics and a child table that rows the articles posted for each topic. If you purge the child table of old article rows, that may result in any given topic row in the parent table no longer having any children. If ...
Get MySQL Cookbook, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.