5.9. Remove Duplicate Lines
You have a logfile, database query output, or some other type of file or string with duplicate lines. You need to remove all but one of each duplicate line using a text editor or other similar tool.
There is a variety of software (including the Unix command-line utility uniq and the Windows PowerShell cmdlet Get-Unique) that can help you remove duplicate lines in a file or string. The following sections contain three regex-based approaches that can be especially helpful when you are trying to accomplish this task in a nonscriptable text editor with regular expression search-and-replace support.
When programming, avoid options 2 and 3: they are inefficient compared to other available approaches, such as using a hash object to keep track of unique lines. However, option 1 (which requires that you sort the lines in advance, unless you only want to remove adjacent duplicates) may be acceptable, since it's quick and easy.
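For comparison, the hash-based approach mentioned above can be sketched in Python as follows. This is a minimal illustration, not code from this recipe: the function name remove_duplicate_lines is hypothetical, and a built-in set stands in for the "hash object". Unlike the regex options, it keeps the first occurrence of each line in its original position and needs no sorting.

```python
def remove_duplicate_lines(text: str) -> str:
    """Keep the first occurrence of each line, preserving original order.

    Hypothetical helper: a set (a hash table) remembers lines already
    seen, so each membership check takes roughly constant time.
    """
    seen = set()
    kept = []
    # splitlines() treats \n, \r\n, and \r (among others) as line breaks
    # and does not produce an extra empty entry for a trailing newline.
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    # Rejoin with plain \n; any trailing newline is not restored.
    return "\n".join(kept)


log = "error: disk full\nok\nerror: disk full\nok\nwarning: retry"
print(remove_duplicate_lines(log))
# prints:
# error: disk full
# ok
# warning: retry
```

The explicit loop keeps the logic visible; in Python 3.7 and later, "\n".join(dict.fromkeys(text.splitlines())) gives the same result, because dictionaries preserve insertion order.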
Option 1: Sort lines and remove adjacent duplicates
If you're able to sort the lines in the file or string you're working with so that any duplicate lines appear next to each other, you should do so, unless the order of the lines must be preserved. Sorting lets you use a simpler and more efficient search-and-replace operation to remove the duplicates than would otherwise be possible.
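To illustrate why adjacency simplifies the job, here is a minimal Python sketch that sorts the lines and then collapses each run of identical neighbors with a single regex substitution. The pattern ^(.*)(?:\n\1)+$ and the function name sort_and_dedupe are assumptions made for this illustration, not taken from this section. The sketch also assumes that lines are normalized to \n breaks before the regex runs.

```python
import re

# Assumed pattern (illustrative only): a whole line captured in group 1,
# followed by one or more copies of "\n" plus that same line.
# re.MULTILINE makes ^ and $ match at every line start and end, so the
# match can neither begin nor end partway through a line.
ADJACENT_DUPES = re.compile(r"^(.*)(?:\n\1)+$", re.MULTILINE)


def sort_and_dedupe(text: str) -> str:
    # splitlines() accepts \r\n and \r as well as \n; rejoining with \n
    # means the pattern only has to handle one kind of line break.
    lines = sorted(text.splitlines())  # identical lines are now adjacent
    sorted_text = "\n".join(lines)
    # Replace each run of identical adjacent lines with a single copy.
    return ADJACENT_DUPES.sub(r"\1", sorted_text)


print(sort_and_dedupe("pear\napple\npear\nfig\napple"))
# prints:
# apple
# fig
# pear
```

Because the $ must match at the end of a line, a line such as "ab" followed by "abc" is not treated as a duplicate, even though the backreference matches the first two characters of "abc".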
After sorting the lines, use the following regex and replacement string to get rid of the duplicates: