On a large table, removing duplicate rows can take a long time, and it may not be possible to lock the table against changes for that long. In that case, we need to do things slightly differently:
- Identify the rows to be deleted, and save them in a side table.
- Build an index on the main table to speed up access to rows (preferably using the CONCURRENTLY keyword so that writes are not blocked during the build, as explained in the Maintaining indexes recipe in Chapter 9, Regular Maintenance).
- Write a program that reads the rows from the side table in a loop, performing a series of smaller transactions. Each pass through the loop works as follows:
- Start a new transaction.
- From the side table, read a set of rows that match.
- Select those rows from the main table with a FOR UPDATE clause, relying on the index to make ...
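
The steps listed so far can be sketched in SQL roughly as follows. This is only an illustration under stated assumptions, not the recipe's own code: the table `main_table`, its column `customerid`, the side table `dup_keys`, the index name, and the batch size of 100 are all hypothetical. As a simplification, the side table here stores the duplicated key values rather than the rows themselves, so that the rows to delete are chosen only after they have been locked. The closing DELETE and COMMIT are an assumption about how each small transaction ends.

```sql
-- Hypothetical schema: main_table(customerid integer, ...), where
-- customerid is meant to be unique but currently is not.

-- Save the key values that occur more than once in a side table.
-- NULL keys are skipped because the IN (...) tests below never match NULL.
CREATE TABLE dup_keys AS
SELECT customerid
FROM main_table
WHERE customerid IS NOT NULL
GROUP BY customerid
HAVING count(*) > 1;

-- Build the index without blocking writes.
-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY main_table_customerid_idx
    ON main_table (customerid);

-- One pass of the loop; the calling program repeats this
-- until dup_keys is empty.
BEGIN;

-- Lock the main-table rows for a small batch of keys.
SELECT m.customerid
FROM main_table AS m
WHERE m.customerid IN (SELECT customerid FROM dup_keys
                       ORDER BY customerid LIMIT 100)
FOR UPDATE OF m;

-- Assumed ending: for each key in the batch, keep the row with the
-- lowest ctid and delete the others.
DELETE FROM main_table AS m
USING (
    SELECT ctid AS row_ctid,
           row_number() OVER (PARTITION BY customerid
                              ORDER BY ctid) AS rn
    FROM main_table
    WHERE customerid IN (SELECT customerid FROM dup_keys
                         ORDER BY customerid LIMIT 100)
) AS ranked
WHERE m.ctid = ranked.row_ctid
  AND ranked.rn > 1;

-- Remove the processed keys so the next pass picks up a new batch.
DELETE FROM dup_keys
WHERE customerid IN (SELECT customerid FROM dup_keys
                     ORDER BY customerid LIMIT 100);

COMMIT;
```

Keeping each batch small means row locks are held only briefly, so other sessions that write to the main table wait for one short transaction at most rather than for the whole cleanup.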