Chapter 16. Handling Duplicates

Introduction

Tables or result sets sometimes contain duplicate rows. In some cases this is acceptable. For example, if you conduct a web poll that records date and client IP number along with the votes, duplicate rows may be permitted because it’s possible for large numbers of votes to appear to originate from the same IP number for an Internet service that routes traffic from its customers through a single proxy host. In other cases, duplicates are unacceptable, and you’ll want to take steps to avoid them. Operations involved in handling duplicate rows include the following:

  • Preventing duplicates from being created in the first place. If each row in a table is intended to represent a single entity (such as a person, an item in a catalog, or a specific observation in an experiment), the occurrence of duplicates presents significant difficulties in using it that way. Duplicates make it impossible to refer to each row unambiguously, so it’s best to make sure duplicates never occur.

  • Counting the number of duplicates to determine whether they are present and to what extent.

  • Identifying duplicated values (or the rows containing them) so you can see where they occur.

  • Eliminating duplicates to ensure that each row is unique. This may involve removing rows from a table to leave only unique rows or selecting a result set in such a way that no duplicates appear in the output. For example, to display a list of the states in which you have customers, you probably ...

Get MySQL Cookbook, 3rd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.