Credit: Alex Martelli
You have a sequence that may include duplicates, and you want to remove the duplicates in the fastest possible way. Moreover, the output sequence must respect the item ordering of the input sequence.
The need to respect the item ordering of the input sequence means that picking unique items becomes a problem quite different from that explored previously in Recipe 18.1. This requirement often arises in conjunction with a function f that defines an equivalence relation among items: x is equivalent to y if and only if f(x) == f(y). In this case, the task of removing duplicates may often be better described as picking the first representative of each resulting equivalence class. Here is a function to perform this task:
```python
# support 2.3 as well as 2.4
try:
    set
except NameError:
    from sets import Set as set

# f defines an equivalence relation among items of sequence seq, and
# f(x) must be hashable for each item x of seq
def uniquer(seq, f=None):
    """ Keeps earliest occurring item of each f-defined equivalence class """
    if f is None:    # f's default is the identity function f(x) -> x
        def f(x): return x
    already_seen = set()
    result = []
    for item in seq:
        marker = f(item)
        if marker not in already_seen:
            already_seen.add(marker)
            result.append(item)
    return result
```
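On Python 3 the sets fallback is unnecessary, and the same first-representative technique can also be written as a generator, so that items are produced lazily and any iterable, even an unbounded one, can be processed. What follows is a minimal sketch under those assumptions; the name iter_unique and the key parameter (which plays the role of f) are illustrative choices, not part of the recipe:

```python
# Hypothetical Python 3 generator variant of uniquer; iter_unique and key
# are illustrative names, not taken from the recipe.
def iter_unique(seq, key=None):
    """Yield the earliest item of each key-defined equivalence class, in input order."""
    seen = set()  # markers of the equivalence classes already emitted
    for item in seq:
        # With no key, each item is its own marker (identity equivalence).
        marker = item if key is None else key(item)
        if marker not in seen:
            seen.add(marker)
            yield item


# Identity equivalence: plain duplicates are dropped, first occurrences kept.
print(list(iter_unique([3, 1, 3, 2, 1])))  # [3, 1, 2]

# Case-insensitive equivalence: "Apple" represents both "Apple" and "apple".
print(list(iter_unique(["Apple", "banana", "apple", "Banana", "cherry"], key=str.lower)))
# ['Apple', 'banana', 'cherry']
```

Memory use still grows with the number of distinct markers, exactly as already_seen does in uniquer. When f is the identity and the items are hashable, Python 3.7 and later also allow list(dict.fromkeys(seq)), because dicts preserve insertion order and re-inserting an existing key does not move it. A dict comprehension keyed on f(x), however, keeps each class's first position but its last item, so it is not a drop-in replacement for uniquer.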
The previous Recipe 18.1 is applicable only if you are not concerned about item ordering or, in other words, if the ...