Skip to Content
Regular Expressions Cookbook, 2nd Edition
book

Regular Expressions Cookbook, 2nd Edition

by Jan Goyvaerts, Steven Levithan
August 2012
Intermediate to advanced
609 pages
19h 16m
English
O'Reilly Media, Inc.
Content preview from Regular Expressions Cookbook, 2nd Edition

5.9. Remove Duplicate Lines

Problem

You have a log file, database query output, or some other type of file or string with duplicate lines. You need to remove all but one of each duplicate line using a text editor or other similar tool.

Solution

There is a variety of software (including the Unix command-line utility uniq and Windows PowerShell cmdlet Get-Unique) that can help you remove duplicate lines in a file or string. The following sections contain three regex-based approaches that can be especially helpful when trying to accomplish this task in a nonscriptable text editor with regular expression search-and-replace support.

When you’re programming, options two and three should be avoided since they are inefficient compared to other available approaches, such as using a hash object to keep track of unique lines. However, the first option (which requires that you sort the lines in advance, unless you only want to remove adjacent duplicates) may be an acceptable approach since it’s quick and easy.

Option 1: Sort lines and remove adjacent duplicates

If you’re able to sort lines in the file or string you’re working with so that any duplicate lines appear next to each other, you should do so, unless the order of the lines must be preserved. This option will allow using a simpler and more efficient search-and-replace operation to remove the duplicates than would otherwise be possible.

After sorting the lines, use the following regex and replacement string to get rid of the duplicates:

^(.*)(?:(?:\r?\n|\r)\1)+$ ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Regular Expressions Cookbook

Regular Expressions Cookbook

Jan Goyvaerts, Steven Levithan

Publisher Resources

ISBN: 9781449327453Supplemental ContentErrata Page