Removing Duplicate Lines
Problem
After selecting and/or sorting some data you notice that there are many duplicate lines in your results. You’d like to get rid of the duplicates, so that you can see just the unique values.
Solution
You have two choices available to you. If you've just been sorting your output, add the -u option to the sort command:

$ somesequence | sort -u

If you aren't running sort, just pipe the output into uniq, provided that the output is already sorted so that identical lines are adjacent:

$ somesequence > myfile
$ uniq myfile

Discussion
Since uniq requires its input to be sorted already, we're more likely to just add the -u option to sort, unless we also need to count the number of duplicates (uniq -c; see Sorting Numbers) or to see only the duplicated lines (uniq -d), which uniq can do but sort cannot.
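The difference is easy to see with a small sample. Here is a sketch using a hypothetical file, fruits.txt, standing in for whatever output your pipeline produces:

```shell
# Create a hypothetical input file with duplicate lines.
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > fruits.txt

# Unique values in one step:
sort -u fruits.txt
# apple
# banana
# cherry

# Count occurrences of each line (uniq -c needs sorted input):
sort fruits.txt | uniq -c

# Show only the lines that appear more than once:
sort fruits.txt | uniq -d
# apple
# banana
```

Note that uniq -c prefixes each line with its count, which is handy for a quick frequency tally of log entries or survey answers.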
Warning
Don't accidentally overwrite a valuable file; the uniq command is a bit odd in its parameters. Whereas most Unix/Linux commands accept multiple input files on the command line, uniq does not. In fact, the first (non-option) argument is taken to be the one and only input file, and any second argument, if supplied, is taken as the output file. So if you supply two filenames on the command line, the second one will get clobbered without warning.
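A short sketch of the hazard, using hypothetical files file1 and file2:

```shell
# Two hypothetical files: one with duplicates, one with data we care about.
printf 'a\na\nb\n' > file1
printf 'important data\n' > file2

# DANGER: this does NOT deduplicate both files. uniq reads file1 as its
# input and silently OVERWRITES file2 with the result.
uniq file1 file2

cat file2
# a
# b

# To deduplicate the contents of several files safely, feed them all to
# sort instead, which does accept multiple input files:
# sort -u file1 file2
```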
See Also
man sort
man uniq