Looking for Items in a Sorted Sequence Using Binary Search
Credit: Noah Spurrier, Alex Martelli
Problem
You need to look for a lot of items in a sequence.
Solution
Binary search is a basic algorithm provided by
bisect in Python. How to perform a binary search
to check if a value is present in a list can be summarized in three
lines of code:
thelist.sort( ) item_insert_point = bisect.bisect(thelist, theitem) is_present = thelist[item_insert_point-1:item_insert_point] == [theitem]
Discussion
A friend of mine was searching a large file for missing lines. A
linear search was taking forever, so a binary search was needed. The
name of the bisect module misled him into believing that
Python did not have a binary search algorithm. I created an example
to show how bisect is used to perform a binary
search.
The third line in the recipe may not be immediately obvious. If we
know that thelist is not empty, this is much
simpler and more obvious:
is_present = thelist[item_insert_point-1] == theitem
However, this simpler approach raises an exception for an empty
thelist, since indexing must check for a valid
index (the only way to return an indicator of “no
such index” is by raising
IndexError). Slicing is more relaxed than
indexing, since it results in an empty slice for invalid
slice-boundary indexes. Therefore, in general:
somelist[x:x+1]
yields the same one-item list as:
[somelist[x]]
when x is a valid index in
somelist. However, it results in an empty list
([]) when the indexing would raise an
IndexError ...