Credit: Tim Peters, PythonLabs
Algorithm research is what drew me to Python—and I fell in love. It wasn’t love at first sight, but it was an attraction that grew into infatuation, which grew steadily into love. And that love shows no signs of fading. Why? I’ve worked in fields pushing the state of the art, and, in a paradoxical nutshell, Python code is easy to throw away!
When you’re trying to solve a problem that may not have been solved before, you may have some intuitions about how to proceed, but you rarely know in advance exactly what needs to be done. The only way to proceed is to try things, many things, everything you can think of, just to see what happens. Python eases this by minimizing the time and pain from conception to code: if your colleagues are using, for example, C or Java, it’s not unusual for you to try and discard six different approaches in Python while they’re still getting the bugs out of their first attempt.
In addition, you will have naturally grown classes and modules that capture key parts of the problem domain, simply because you find the need to keep reinventing them when starting over from scratch. A true C++ expert can give you a good run, but C++ is so complex that true experts are very rare. Moderate skill with Python is much easier to obtain, yet much more productive for research and prototyping than merely moderate C++ skill.
So if you’re in the research business—and every programmer who doesn’t know everything occasionally is—you’ve got a nearly perfect language in Python. How then do you develop the intuitions that can generate a myriad of plausible approaches to try? Experience is the final answer, as we all get better at what we do often, but studying the myriad approaches other people have tried develops a firm base from which to explore. Toward that end, here are the most inspiring algorithm books I’ve read—they’ll teach you possibilities you may never have discovered on your own:
- Jon Bentley’s Programming Pearls and More Programming Pearls (Addison-Wesley)
Every programmer should read these books from cover to cover for sheer joy. The chapters are extended versions of a popular column Bentley wrote for the Communications of the Association for Computing Machinery (CACM). Each chapter is generally self-contained, covering one or two lovely (and often surprising, in the “Aha! why didn’t I think of that?!” sense) techniques of real practical value.
- Robert Sedgewick’s Algorithms in C++ or Algorithms in C (Addison-Wesley)
These books cover the most important general algorithms, organized by problem domain, and provide brief but cogent explanations, along with working code. The books cover the same material; the difference is in which computer language is used for the code. I recommend the C++ book for Python programmers, because idiomatic Python is closer to C++ than to C, and Sedgewick’s use of C++ is generally simple and easily translated to equivalent Python. This is the first book to reach for if you need to tackle a new area quickly.
- Donald Knuth’s The Art of Computer Programming series (Addison-Wesley)
For experts (and those who aspire to expertise), this massive series in progress is the finest in-depth exposition of the state of the art. Nothing compares to its unique combination of breadth and depth, rigor, and historical perspective. Note that these books aren’t meant to be read, they have to be actively studied, and many valuable insights are scattered in answers to the extensive exercises. While there’s detailed analysis, there’s virtually no working code, except for programs written in assembly language for a hypothetical machine of archaic design (yes, this can be maddeningly obscure). It can be hard going at times, but few books reward time invested so richly.
After consorting with the
algorithm gods, a nasty
practical problem arises back on Earth. When you have two approaches
available, how do you measure which is faster? It turns out this is
hard to do in a platform-independent way (even in Python) when one
approach isn’t obviously much faster than the other.
One of the nastiest problems is that the resolution of timing
facilities varies widely across platforms, and even the meaning of
time varies. Your two primary choices for time measurement in Python
shouldn’t be used for algorithm timing on
Windows, because the
timer updates only 18.2 times per second. Therefore, timing
differences up to about 0.055 seconds are lost to quantization error
(over a span of time briefer than that,
may return exactly the same number at each end). On the other hand,
time.time typically has the best resolution on
time.time measures wall-clock time. So,
for example, it includes time consumed by the operating system when a
burst of network activity demands attention. For this reason (among
others), it’s important to close all nonessential
programs when running delicate timing tests and, if you can, shut
down your network daemons.
is a much better choice on Windows and often on Unix-like systems.
time.clock uses the Win32
facility, and the timer updates more than a million times per second.
This virtually eliminates quantization error but also measures
wall-clock time, so it is still important to close other programs
time.clock has good and bad aspects
on most Unix-like systems. The good side is that it generally
measures user time, an account of how much time the CPU spent in the
process that calls
time.clock, excluding time
consumed by other processes. The bad side is that this timer
typically updates no more than 100 times per second, so a
quantization error can still give misleading results. The best
approach to this is to do many repetitions of the basic thing
you’re timing, so that the time delta you compute is
large compared to the timer’s updating frequency.
You can then divide the time delta by the number of repetitions to
get the average time.
Overall, there’s no compelling best answer here! One useful approach is to start your timing code with a block such as:
if 1: from time import clock as now else: from time import time as now
now in your timing code and run your
timing tests twice, switching the underlying timing function between
runs by changing 1 to 0 (or vice versa).
def add(i, j): i + j def timer(n): start = now( ) for i in range(n): add(1, 2) finish = now( ) # Return average elapsed time per call return (finish - start) / n
Mostly, this program measures the time to call
which should be obvious. What’s less obvious is that
it’s also timing how long it takes to build a list
n integers, including the time Python takes to
allocate memory for each of
n integer objects,
fiddle with each integer object’s reference count,
and free the memory again for each. All of this is more expensive
add’s body does. In
other words, the thing you’re trying to time is lost
in the timing approach’s overhead.
It helps to build the list of timing loop indexes outside the range
of the bracketing
now calls, which
you’ll often see done. It helps even more to build
the list in a different way, reusing the same object
n times. This helps because the reference-count
manipulations hit the same piece of memory each time instead of
leaping all over memory because the
variable is bound and unbound as the
def add(i, j, indices): for k in indices: i + j def timer(n): indices = [None] * n # may be more convenient as a module global start = now( ) add(1, 2, indices) finish = now( ) return (finish - start) / n
i+j on the same line as the
for clause is another subtle trick. Because
they’re on the same line, we avoid measuring time
consumed by the Python
SET_LINENO opcode that the
Python compiler would generate (if run without the
-O switch) if the two pieces of code were on
There’s one more twist I recommend here. No matter
how quiet you try to make your machine, modern operating systems and
modern CPUs are so complex that it’s almost
impossible to get the same result from one run to the next. If you
find that hard to believe, it’s especially valuable
to run the
timer body inside another loop to
accumulate the results from several runs of
def timer(n_per_call, n_calls): indices = [None] * n_per_call results =  for i in range(n_calls): start = now( ) add(1, 2, indices) finish = now( ) results.append((finish - start) / n_per_call) results.sort( ) return results print "microseconds per add:" for t in timer(100000, 10): print "%.3f" % (t * 1e6), print
Here’s output from a typical run on an 866-MHz
Windows 98SE box using
microseconds per add: 0.520 0.549 0.932 0.987 1.037 1.073 1.126 1.133 1.138 1.313
Note that the range between the fastest and slowest computed average times spans a factor of 2.5! If I had run the test only once, I might have gotten any of these values and put too much faith in them.
If you try this, your results should be less frightening. Getting repeatable timings is more difficult under Windows 98SE than under any other operating system I’ve tried, so the wild results above should be viewed as an extreme. More likely (if you’re not running Windows 98), you’ll see a bimodal distribution with most values clustered around the fast end and a few at the slow end. The slowest result is often computed on the first try, because your machine’s caches take extra time to adjust to the new task.
As befits a chapter on algorithms, the recipes here have nothing in
common. Rather, it’s a grab-bag of sundry
interesting techniques, ranging from two-dimensional geometry to
parsing date strings. Let your natural interests guide you. I have a
special fondness for Recipe 17.16:
it’s a near-trivial wrapper around the standard
bisect.insort function. Why is that so cool? On
three occasions I’ve recommended using the same
trick to coworkers in need of a priority queue. Each time, when I
bisect maintains the queue as a
sorted list, they were worried that this would be too inefficient to
bear. The attraction of getting a priority queue with no work on
their part overcame their reluctance, though, and, when I asked a
month later, they were still using it—performance was not a
real problem. So if the previous discussion of timing difficulties
discouraged you, here’s cause for optimism: as noted
innumerable times by innumerable authors, the speed of most of your
code doesn’t matter at all. Find the 10% that
consumes most of the time before worrying about any of it.