Performing Frequent Membership Tests on a Sequence
Credit: Alex Martelli
Problem
You need to perform frequent tests for
membership in a sequence. The
O(N) behavior of repeated
in operators hurts performance, but you
can’t switch to using just a dictionary, as you also
need the sequence’s order.
Solution
Say you need to append items to a list only if they’re not already in the list. The simple, naive solution is excellent but may be slow:
def addUnique1(baseList, otherList):
for item in otherList:
if item not in baseList:
baseList.append(item)If otherList is large, it may be faster to build
an auxiliary dictionary:
def addUnique2(baseList, otherList):
auxDict = {}
for item in baseList:
auxDict[item] = None
for item in otherList:
if not auxDict.has_key(item):
baseList.append(item)
auxDict[item] = NoneFor a list on which you must often perform membership tests, it may
be best to wrap the list, together with its auxiliary dictionary,
into a class. You can then define a special _ _contains_ _ method to
speed the in operator. The dictionary must be
carefully maintained to stay in sync with the sequence.
Here’s a version that does the syncing just in time,
when a membership test is required and the dictionary is out of sync,
and works with Python 2.1 or later:
from _ _future_ _ import nested_scopes import UserList try: list._ _getitem_ _ except: Base = UserList.UserList else: Base = list class FunkyList(Base): def _ _init_ _(self, initlist=None): Base._ _init_ _(self, initlist) self._dict_ok ...