De-Duping Data with Ranking Functions
One common problem with imported data is unexpected duplicate rows, especially when the data is consolidated from multiple sources. In versions before SQL Server 2005, de-duping the data often involved cursors and temp tables. Since the introduction of the ROW_NUMBER ranking function and common table expressions in SQL Server 2005, you can de-dupe data with a single statement.
To demonstrate this approach, Listing 43.26 shows how to create an authors_import table and populate it with some duplicate rows.
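The general shape of the technique can be sketched as follows. This is not the book's listing: the column list and every name value are assumptions made up for illustration, and only the table name and the au_id value 499-84-5672 come from the text. The sketch assigns a row number within each group of identical rows and then deletes every row numbered higher than 1.

```sql
-- Hypothetical stand-in for the import table; the columns are assumed.
CREATE TABLE dbo.authors_import
(
    au_id    varchar(11) NOT NULL,
    au_lname varchar(40) NOT NULL,
    au_fname varchar(20) NOT NULL
);

-- Illustrative rows only (multi-row VALUES needs SQL Server 2008 or later).
-- The names are invented; the first au_id appears three times.
INSERT INTO dbo.authors_import (au_id, au_lname, au_fname)
VALUES ('499-84-5672', 'Sample', 'Author'),
       ('499-84-5672', 'Sample', 'Author'),
       ('499-84-5672', 'Sample', 'Author'),
       ('111-11-1111', 'Other',  'Writer');

-- Number the rows within each group of identical values, then delete
-- all but the first row of each group through the CTE.
WITH numbered AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY au_id, au_lname, au_fname
                              ORDER BY au_id) AS rn
    FROM dbo.authors_import
)
DELETE FROM numbered
WHERE rn > 1;
```

Because the rows in each partition are identical, it does not matter which one survives, so the ORDER BY inside the OVER clause only satisfies the syntax. If the duplicates differed in some column, such as a load date, ordering by that column would control which row is kept.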
You can see in the data for Listing 43.27 that there are two duplicates for au_id 499-84-5672 ...