BUY THIS BOOK

Safari Books Online

What is this?

Looking to Reprint this content?


Python Cookbook
Python Cookbook Edited by  Alex Martelli David Ascher
July 2002
Pages: 606

Cover | Table of Contents | Colophon


Table of Contents

Chapter 1: Python Shortcuts
Credit: David Ascher, ActiveState, co-author of Learning Python (O'Reilly)
Programming languages are like natural languages. Each has a set of qualities that polyglots generally agree on as characteristics of the language. Russian and French are often admired for their lyricism, while English is more often cited for its precision and dynamism: unlike the Académie-defined French language, the English language routinely grows words to suit its speakers' needs, such as "carjacking," "earwitness," "snail mail," "email," "googlewhacking," and "blogging." In the world of computer languages, Perl is well known for its many degrees of freedom: TMTOWTDI (There's More Than One Way To Do It) is one of the mantras of the Perl programmer. Conciseness is also seen as a strong virtue in the Perl and APL communities. In contrast, as you'll see in many of the discussions of recipes throughout this volume, Python programmers often express their belief in the value of clarity and elegance. As a well-known Perl hacker once said, Python's prettier, but Perl is more fun. I agree with him that Python does have a strong (as in well-defined) aesthetic, while Perl has more of a sense of humor. I still have more fun coding in Python, though.
The reason I bring up these seemingly irrelevant bits at the beginning of this book is that the recipes you see in this first chapter are directly related to Python's aesthetic and social dynamics. In most of the recipes in this chapter, the author presents a single elegant language feature, but one that he feels is underappreciated. Much like I, a proud resident of Vancouver, will go out of my way to show tourists the really neat things about the city, from the parks to the beaches to the mountains, a Python user will seek out friends and colleagues and say, "You gotta see this!" Programming in Python, in my mind, is a shared social pleasure, not all that competitive. There's great pleasure in learning a new feature and appreciating its design, elegance, and judicious use, and there's a twin pleasure in teaching another or another thousand about that feature.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Introduction
Credit: David Ascher, ActiveState, co-author of Learning Python (O'Reilly)
Programming languages are like natural languages. Each has a set of qualities that polyglots generally agree on as characteristics of the language. Russian and French are often admired for their lyricism, while English is more often cited for its precision and dynamism: unlike the Académie-defined French language, the English language routinely grows words to suit its speakers' needs, such as "carjacking," "earwitness," "snail mail," "email," "googlewhacking," and "blogging." In the world of computer languages, Perl is well known for its many degrees of freedom: TMTOWTDI (There's More Than One Way To Do It) is one of the mantras of the Perl programmer. Conciseness is also seen as a strong virtue in the Perl and APL communities. In contrast, as you'll see in many of the discussions of recipes throughout this volume, Python programmers often express their belief in the value of clarity and elegance. As a well-known Perl hacker once said, Python's prettier, but Perl is more fun. I agree with him that Python does have a strong (as in well-defined) aesthetic, while Perl has more of a sense of humor. I still have more fun coding in Python, though.
The reason I bring up these seemingly irrelevant bits at the beginning of this book is that the recipes you see in this first chapter are directly related to Python's aesthetic and social dynamics. In most of the recipes in this chapter, the author presents a single elegant language feature, but one that he feels is underappreciated. Much like I, a proud resident of Vancouver, will go out of my way to show tourists the really neat things about the city, from the parks to the beaches to the mountains, a Python user will seek out friends and colleagues and say, "You gotta see this!" Programming in Python, in my mind, is a shared social pleasure, not all that competitive. There's great pleasure in learning a new feature and appreciating its design, elegance, and judicious use, and there's a twin pleasure in teaching another or another thousand about that feature.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Swapping Values WithoutUsing a Temporary Variable
Credit: Hamish Lawson
You want to swap the values of some variables, but you don't want to use a temporary variable.
Python's automatic tuple packing and unpacking make this a snap:
a, b, c = b, c, a
Most programming languages make you use temporary intermediate variables to swap variable values:
temp = a
a = b
b = c
c = temp
But Python lets you use tuple packing and unpacking to do a direct assignment:
a, b, c = b, c, a
In an assignment, Python requires an expression on the righthand side of the =. What we wrote there—b, c, a—is indeed an expression. Specifically, it is a tuple, which is an immutable sequence of three values. Tuples are often surrounded with parentheses, as in (b, c, a), but the parentheses are not necessary, except where the commas would otherwise have some other meaning (e.g., in a function call). The commas are what create a tuple, by packing the values that are the tuple's items.
On the lefthand side of the = in an assignment statement, you normally use a single target. The target can be a simple identifier (also known as a variable), an indexing (such as alist[i] or adict['freep']), an attribute reference (such as anobject.someattribute), and so on. However, Python also lets you use several targets (variables, indexings, etc.), separated by commas, on an assignment's lefthand side. Such a multiple assignment is also called an unpacking assignment. When there are two or more comma-separated targets on the lefthand side of an assignment, the value of the righthand side must be a sequence of as many items as there are comma-separated targets on the lefthand side. Each item of the sequence is assigned to the corresponding target, in order, from left to right.
In this recipe, we have three comma-separated targets on the lefthand side, so we need a three-item sequence on the righthand side, the three-item tuple that the packing built. The first target (variable a) gets the value of the first item (which used to be the value of variable b), the second target (b) gets the value of the second item (which used to be the value of
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Constructing a Dictionary Without Excessive Quoting
Credit: Brent Burley
You'd like to construct a dictionary without having to quote the keys.
Once you get into the swing of Python, you may find yourself constructing a lot of dictionaries. However, the standard way, also known as a dictionary display, is just a smidgeon more cluttered than you might like, due to the need to quote the keys. For example:
data = { 'red' : 1, 'green' : 2, 'blue' : 3 }
When the keys are identifiers, there's a cleaner way:
def makedict(**kwargs):
    return kwargs
data = makedict(red=1, green=2, blue=3)
You might also choose to forego some simplicity to gain more power. For example:
def dodict(*args, **kwds):
    d = {}
    for k, v in args: d[k] = v
    d.update(kwds)
    return d
tada = dodict(yellow=2, green=4, *data.items())
The syntax for constructing a dictionary can be slightly tedious, due to the amount of quoting required. This recipe presents a technique that avoids having to quote the keys, when they are identifiers that you already know at the time you write the code.
I've often found myself missing Perl's => operator, which is well suited to building hashes (Perl-speak for dictionaries) from a literal list:
%data = (red => 1, green => 2, blue => 3);
The => operator in Perl is equivalent to Perl's own ,, except that it implicitly quotes the word to its left.
Perl's syntax is very similar to Python's function-calling syntax for passing keyword arguments. And the fact that Python collects the keyword arguments into a dictionary turned on a light bulb in my head.
When you declare a function in Python, you may optionally conclude the list of formal arguments with *args or **kwds (if you want to use both, the one with ** must be last). If you have *args, your function can be called with any number of extra actual arguments of the positional, or plain, kind. Python collects all the extra positional arguments into a tuple and binds that tuple to the identifier args. Similarly, if you have **kwds, your function can be called with any number of extra actual arguments of the named, or keyword, kind. Python collects all the extra named arguments into a dictionary (with the names as the keys and the values as the values) and binds that dictionary to the identifier
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Getting a Value from a Dictionary
Credit: Andy McKay
You need to obtain a value from a dictionary, without having to handle an exception if the key you seek is not in the dictionary.
That's what the get method of dictionaries is for. Say you have a dictionary:
d = {'key':'value'}
You can write a test to pull out the value of 'key' from d in an exception-safe way:
if d.has_key('key'):      # or, in Python 2.2 or later: if 'key' in d:
  print d['key']
else:
  print 'not found'
However, there is a much simpler syntax:
print d.get('key', 'not found')
Want to get a value from a dictionary but first make sure that the value exists in the dictionary? Use the simple and useful get method.
If you try to get a value with a syntax such as d[x], and the value of x is not a key in dictionary d, your attempt raises a KeyError exception. This is often okay. If you expected the value of x to be a key in d, an exception is just the right way to inform you that you're wrong (i.e., that you need to debug your program).
However, you often need to be more tentative about it: as far as you know, the value of x may or may not be a key in d. In this case, don't start messing with the has_key method or with try/except statements. Instead, use the get method. If you call d.get(x), no exception is thrown: you get d[x] if x is a key in d, and if it's not, you get None (which you can check for or propagate). If None is not what you want to get when x is not a key of d, call d.get(x, somethingelse) instead. In this case, if x is not a key, you will get the value of somethingelse.
get is a simple, useful mechanism that is well explained in the Python documentation, but a surprising number of people don't know about it. This idiom is also quite common in Zope, for example, when pulling variables out of the REQUEST dictionary.
The Library Reference section on mapping types.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Adding an Entry to a Dictionary
Credit: Alex Martelli
Working with a dictionary D, you need to use the entry D[k] if it's already present, or add a new D[k] if k isn't yet a key in D.
This is what the setdefault method of dictionary objects is for. Say we're building a word-to-page numbers index. A key piece of code might be:
theIndex = {}
def addword(word, pagenumber):
    if theIndex.has_key(word):
        theIndex[word].append(pagenumber)
    else:
        theIndex[word] = [pagenumber]
Good Pythonic instincts suggest substituting this "look before you leap" pattern with an "easier to get permission" pattern (see Recipe 5.4 for a detailed discussion of these phrases):
def addword(word, pagenumber):
    try: theIndex[word].append(pagenumber)
    except KeyError: theIndex[word] = [pagenumber]
This is just a minor simplification, but it satisfies the pattern of "use the entry if it is already present; otherwise, add a new entry." Here's how using setdefault simplifies this further:
def addword(word, pagenumber):
    theIndex.setdefault(word, []).append(pagenumber)
The setdefault method of a dictionary is a handy shortcut for this task that is especially useful when the new entry you want to add is mutable. Basically, dict.setdefault(k, v) is much like dict.get(k, v), except that if k is not a key in the dictionary, the setdefault method assigns dict[k]=v as a side effect, in addition to returning v. (get would just return v, without affecting dict in any way.) Therefore, setdefault is appropriate any time you have get-like needs but also want to produce this specific side effect on the dictionary.
setdefault is particularly useful in a dictionary with values that are lists, as detailed in Recipe 1.6. The single most typical usage form for setdefault is:
somedict.setdefault(somekey, []).append(somevalue)
Note that setdefault is normally not very useful if the values are immutable. If you just want to count words, for example, something like the following is no use:
theIndex.setdefault(word, 1)
In this case, you want:
theIndex[word] = 1 + theIndex.get(word, 0)
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Associating Multiple Values with Each Key in a Dictionary
Credit: Michael Chermside
You need a dictionary that maps each key to multiple values.
By nature, a dictionary is a one-to-one mapping, but it's not hard to make it one-to-many—in other words, to make one key map to multiple values. There are two possible approaches, depending on how you want to treat duplications in the set of values for a key. The following approach allows such duplications:
d1 = {}
d1.setdefault(key, []).append(value)
while this approach automatically eliminates duplications:
d2 = {}
d2.setdefault(key, {})[value] = 1
A normal dictionary performs a simple mapping of a key to a value. This recipe shows two easy, efficient ways to achieve a mapping of each key to multiple values. The semantics of the two approaches differ slightly but importantly in how they deal with duplication. Each approach relies on the setdefault method of a dictionary to initialize the entry for a key in the dictionary, if needed, and in any case to return said entry.
Of course, you need to be able to do more than just add values for a key. With the first approach, which allows duplications, here's how to retrieve the list of values for a key:
list_of_values = d1[key]
Here's how to remove one value for a key, if you don't mind leaving empty lists as items of d1 when the last value for a key is removed:
d1[key].remove(value)
Despite the empty lists, it's still easy to test for the existence of a key with at least one value:
def has_key_with_some_values(d, key):
    return d.has_key(key) and d[key]
This returns either 0 or a list, which may be empty. In most cases, it is easier to use a function that always returns a list (maybe an empty one), such as:
def get_values_if_any(d, key):
    return d.get(key, [])
You can use either of these functions in a statement. For example:
if get_values_if_any(d1, somekey):
if has_key_with_some_values(d1, somekey):
However, get_values_if_any is generally handier. For example, you can use it to check if 'freep' is among the values for somekey:
if 'freep' in get_values_if_any(d1, somekey):
This extra handiness comes from
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Dispatching Using a Dictionary
Credit: Dick Wall
You need to execute appropriate pieces of code in correspondence with the value of some control variable—the kind of problem that in some other languages you might approach with a case, switch, or select statement.
Object-oriented programming, thanks to its elegant concept of dispatching, does away with many (but not all) such needs. But dictionaries, and the fact that in Python functions are first-class values (in particular, they can be values in a dictionary), conspire to make the problem quite easy to solve:
animals = []
number_of_felines = 0

def deal_with_a_cat(  ):
    global number_of_felines
    print "meow"
    animals.append('feline')
    number_of_felines += 1

def deal_with_a_dog(  ):
    print "bark"
    animals.append('canine')

def deal_with_a_bear(  ):
    print "watch out for the *HUG*!"
    animals.append('ursine')

tokenDict = {
    "cat": deal_with_a_cat,
    "dog": deal_with_a_dog,
    "bear": deal_with_a_bear,
    }

# Simulate, say, some words read from a file
words = ["cat", "bear", "cat", "dog"]

for word in words:
    # Look up the function to call for each word, then call it
    functionToCall = tokenDict[word]
    functionToCall(  )
    # You could also do it in one step, tokenDict[word](  )
The basic idea behind this recipe is to construct a dictionary with string (or other) keys and with bound methods, functions, or other callables as values. During execution, at each step, use the string keys to select which method or function to execute. This can be used, for example, for simple parsing of tokens from a file through a kind of generalized case statement.
It's embarrassingly simple, but I use this technique often. Instead of functions, you can also use bound methods (such as self.method1) or other callables. If you use unbound methods (such as class.method), you need to pass an appropriate object as the first actual argument when you do call them. More generally, you can also store tuples, including both callables and arguments, as the dictionary's values, with diverse possibilities.
I primarily use this in places where in other languages I might want a
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Collecting a Bunch of Named Items
Credit: Alex Martelli
You want to collect a bunch of items together, naming each item of the bunch, and you find dictionary syntax a bit heavyweight for the purpose.
Any (classic) class inherently wraps a dictionary, and we take advantage of this:
class Bunch:
    def _ _init_ _(self, **kwds):
        self._ _dict_ _.update(kwds)
Now, to group a few variables, create a Bunch instance:
point = Bunch(datum=y, squared=y*y, coord=x)
You can access and rebind the named attributes just created, add others, remove some, and so on. For example:
if point.squared > threshold:
    point.isok = 1
Often, we just want to collect a bunch of stuff together, naming each item of the bunch; a dictionary's okay for that, but a small do-nothing class is even handier and is prettier to use.
A dictionary is fine for collecting a few items in which each item has a name (the item's key in the dictionary can be thought of as the item's name, in this context). However, when all names are identifiers, to be used just like variables, the dictionary-access syntax is not maximally clear:
if point['squared'] > threshold
It takes minimal effort to build a little class, as in this recipe, to ease the initialization task and provide elegant attribute-access syntax:
if bunch.squared > threshold
An equally attractive alternative implementation to the one used in the solution is:
class EvenSimplerBunch:
    def _ _init_ _(self, **kwds): self._ _dict_ _ = kwds
The alternative presented in the Bunch class has the advantage of not rebinding self._ _dict_ _ (it uses the dictionary's update method to modify it instead), so it will keep working even if, in some hypothetical far-future dialect of Python, this specific dictionary became nonrebindable (as long, of course, as it remains mutable). But this EvenSimplerBunch is indeed even simpler, and marginally speedier, as it just rebinds the dictionary.
It is not difficult to add special methods to allow attributes to be accessed as bunch['squared'] and so on. In Python 2.1 or earlier, for example, the simplest way is:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Finding the Intersection of Two Dictionaries
Credit: Andy McKay, Chris Perkins, Sami Hangaslammi
Given two dictionaries, you need to find the set of keys that are in both dictionaries.
Dictionaries are a good concrete representation for sets in Python, so operations such as intersections are common. Say you have two dictionaries (but pretend that they each contain thousands of items):
some_dict = { 'zope':'zzz', 'python':'rocks' }
another_dict = { 'python':'rocks', 'perl':'$' }
Here's a bad way to find their intersection that is very slow:
intersect = []
for item in some_dict.keys(  ):
    if item in another_dict.keys(  ):
        intersect.append(item)
print "Intersects:", intersect
And here's a good way that is simple and fast:
intersect = []
for item in some_dict.keys(  ):
    if another_dict.has_key(item):
        intersect.append(item)
print "Intersects:", intersect
In Python 2.2, the following is elegant and even faster:
print "Intersects:", [k for k in some_dict if k in another_dict]
And here's an alternate approach that wins hands down in speed, for Python 1.5.2 and later:
print "Intersects:", filter(another_dict.has_key, some_dict.keys())
The keys method produces a list of all the keys of a dictionary. It can be pretty tempting to fall into the trap of just using in, with this list as the righthand side, to test for membership. However, in the first example, you're looping through all of some_dict, then each time looping through all of another_dict. If some_dict has N1 items, and another_dict has N2 items, your intersection operation will have a compute time proportional to the product of N1x N2. (O(N1x N2) is the common computer-science notation to indicate this.)
By using the has_key method, you are not looping on another_dict any more, but rather checking the key in the dictionary's hash table. The processing time for has_key is basically independent of dictionary size, so the second approach is O(N1). The difference is quite substantial for large dictionaries! If the two dictionaries are very different in size, it becomes important to use the smaller one in the role of
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Assigning and Testing with One Statement
Credit: Alex Martelli
You are transliterating C or Perl code to Python, and, to keep close to the original's structure, you need an expression's result to be both assigned and tested (as in if((x=foo( )) or while((x=foo( )) in such other languages).
In Python, you can't code:
if x=foo(  ):
Assignment is a statement, so it cannot fit into an expression, which is necessary for conditions of if and while statements. Normally this isn't a problem, as you can just structure your code around it. For example, this is quite Pythonic:
while 1:
    line = file.readline(  )
    if not line: break
    process(line)
In modern Python, this is far better, but it's even farther from C-like idioms:
for line in file.xreadlines(  ):
    process(line)
In Python 2.2, you can be even simpler and more elegant:
for line in file:
    process(line)
But sometimes you're transliterating C, Perl, or some other language, and you'd like your transliteration to be structurally close to the original.
One simple utility class makes this easy:
class DataHolder:
    def _ _init_ _(self, value=None):
        self.value = value
    def set(self, value):
        self.value = value
        return value
    def get(self):
        return self.value
# optional and strongly discouraged, but handy at times:
import _ _builtin_ _
_ _builtin_ _.DataHolder = DataHolder
_ _builtin_ _.data = DataHolder(  )
With the help of the DataHolder class and its data instance, you can keep your C-like code structure intact in transliteration:
while data.set(file.readline(  )):
    process(data.get(  ))
In Python, assignment is not an expression. Thus, you cannot assign the result that you are testing in, for example, an if, elif, or while statement. This is usually okay: you just structure your code to avoid the need to assign while testing (in fact, your code will often become clearer as a result). However, sometimes you may be writing Python code that is the transliteration of code originally written in C, Perl, or another language that supports assignment-as-expression. For example, such transliteration often occurs in the first Python version of an algorithm for which a reference implementation is supplied, an algorithm taken from a book, and so on. In such cases, having the structure of your initial transliteration be close to that of the code you're transcribing is often preferable. Fortunately, Python offers enough power to make it pretty trivial to satisfy this requirement.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Using List Comprehensions Instead of map and filter
Credit: Luther Blissett
You want to perform an operation on all the elements of a list, but you'd like to avoid using map and filter because they can be hard to read and understand, particularly when they need lambda.
Say you want to create a new list by adding 23 to each item of some other list. In Python 1.5.2, the solution is:
thenewlist = map(lambda x: x + 23, theoldlist)
This is hardly the clearest code. Fortunately, since Python 2.0, we can use a list comprehension instead:
thenewlist = [x + 23 for x in theoldlist]
This is much clearer and more elegant.
Similarly, say you want the new list to comprise all items in the other list that are larger than 5. In Python 1.5.2, the solution is:
thenewlist = filter(lambda x: x > 5, theoldlist)
But in modern Python, we can use the following list comprehension:
thenewlist = [x for x in theoldlist if x > 5]
Now say you want to combine both list operations. In Python 1.5.2, the solution is quite complex:
thenewlist = map(lambda x: x+23, filter(lambda x: x>5, theoldlist))
A list comprehension affords far greater clarity, as we can both perform selection with the if clause and use some expression, such as adding 23, on the selected items:
thenewlist = [x + 23 for x in theoldlist if x > 5]
Elegance and clarity, within a generally pragmatic attitude, are Python's core values. List comprehensions, added in Python 2.0, delightfully display how pragmatism can enhance both clarity and elegance. The built-in map and filter functions still have their uses, since they're arguably of equal elegance and clarity as list comprehensions when the lambda construct is not necessary. In fact, when their first argument is another built-in function (i.e., when lambda is not involved and there is no need to write a function just for the purpose of using it within a map or filter), they can be even faster than list comprehensions.
All in all, Python programs optimally written for 2.0 or later use far fewer map and filter calls than similar programs written for 1.5.2. Most of the map and filter calls (and quite a few explicit loops) are replaced with list comprehensions (which Python borrowed, after some prettying of the syntax, from Haskell, described at
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Unzipping Simple List-Like Objects
Credit: gyro funch
You have a sequence and need to pull it apart into a number of pieces.
There's no built-in unzip counterpart to zip, but it's not hard to code our own:
def unzip(p, n):
    """ Split a sequence p into a list of n tuples, repeatedly taking the
    next unused element of p and adding it to the next tuple.  Each of the
    resulting tuples is of the same length; if p%n != 0, the shorter tuples
    are padded with None (closer to the behavior of map than to that of zip).
        Example:
        >>> unzip(['a','b','c','d','e'], 3)
        [('a', 'd'), ('b', 'e'), ('c', None)]
    """
    # First, find the length for the longest sublist
    mlen, lft = divmod(len(p), n)
    if lft != 0: mlen += 1

    # Then, initialize a list of lists with suitable lengths
    lst = [[None]*mlen for i in range(n)]

    # Loop over all items of the input sequence (index-wise), and
    # Copy a reference to each into the appropriate place
    for i in range(len(p)):
        j, k = divmod(i, n)    # Find sublist-index and index-within-sublist
        lst[k][j] = p[i]       # Copy a reference appropriately

    # Finally, turn each sublist into a tuple, since the unzip function
    # is specified to return a list of tuples, not a list of lists
    return map(tuple, lst)
The function in this recipe takes a list and pulls it apart into a user-defined number of pieces. It acts like a sort of reverse zip function (although it deals with only the very simplest cases). This recipe was useful to me recently when I had to take a Python list and break it down into a number of different pieces, putting each consecutive item of the list into a separate sublist.
Preallocating the result as a list of lists of None is generally more efficient than building up each sublist by repeated calls to append. Also, in this case, it already ensures the padding with None that we would need anyway (unless length(p) just happens to be a multiple of n).
The algorithm that unzip uses is quite simple: a reference to each item of the input sequence is placed into the appropriate item of the appropriate sublist. The built-in function
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Flattening a Nested Sequence
Credit: Luther Blissett
You have a sequence, such as a list, some of whose items may in turn be lists, and so on. You need to flatten it out into a sequence of its scalar items (the leaves, if you think of the nested sequence as a tree).
Of course, we need to be able to tell which of the elements we're handling are to be deemed scalar. For generality, say we're passed as an argument a predicate that defines what is scalar—a function that we can call on any element and that returns 1 if the element is scalar or 0 otherwise. Given this, one approach is:
def flatten(sequence, scalarp, result=None):
    if result is None: result = []
    for item in sequence:
        if scalarp(item): result.append(item)
        else: flatten(item, scalarp, result)
    return result
In Python 2.2, a simple generator is an interesting alternative, and, if all the caller needs to do is loop over the flattened sequence, may save the memory needed for the result list:
from _ _future_ _ import generators
def flatten22(sequence, scalarp):
    for item in sequence:
        if scalarp(item):
            yield item
        else:
            for subitem in flatten22(item, scalarp):
                yield subitem
The only problem with this recipe is that determining what is a scalar is not as obvious as it might seem, which is why I delegated that decision to a callable predicate argument that the caller is supposed to pass to flatten. Of course, we must be able to loop over the items of any non-scalar with a for statement, or flatten will raise an exception (since it does, via a recursive call, attempt a for statement over any non-scalar item). In Python 2.2, that's easy to check:
def canLoopOver(maybeIterable):
    try: iter(maybeIterable)
    except: return 0
    else: return 1
The built-in function iter, new in Python 2.2, returns an iterator, if possible. for x in s implicitly calls the iter function, so the canLoopOver function can easily check if for is applicable by calling iter explicitly and seeing if that raises an exception.
In Python 2.1 and earlier, there is no
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Looping in Parallel over Index and Sequence Items
Credit: Alex Martelli
You need to loop on a sequence, but at each step you also need to know what index into the sequence you have reached.
Together, the built-in functions xrange and zip make this easy. You need only this one instance of xrange, as it is fully reusable:
indices = xrange(sys.maxint)
Here's how you use the indices instance:
for item, index in zip(sequence, indices):
    something(item, index)
This gives the same semantics as:
for index in range(len(sequence)):
    something(sequence[index], index)
but the change of emphasis allows greater clarity in many usage contexts.
Another alternative is to use class wrappers:
class Indexed:
    def _ _init_ _(self, seq):
        self.seq = seq
    def _ _getitem_ _(self, i):
        return self.seq[i], i
For example:
for item, index in Indexed(sequence):
    something(item, index)
In Python 2.2, with from _ _future_ _ import generators, you can also use:
def Indexed(sequence):
    iterator = iter(sequence)
    for index in indices:
        yield iterator.next(  ), index
    # Note that we exit by propagating StopIteration when .next raises it!
However, the simplest roughly equivalent way remains the good old:
def Indexed(sequence):
    return zip(sequence, indices)
We often want to loop on a sequence but also need the current index in the loop body. The canonical Pydiom for this is:
for i in range(len(sequence)):
using sequence[i] as the item reference in the loop's body. However, in many contexts, it is clearer to emphasize the loop on the sequence items rather than on the indexes. zip provides an easy alternative, looping on indexes and items in parallel, since it truncates at the shortest of its arguments. Thus, it's okay for some arguments to be unbounded sequences, as long as not all the arguments are unbounded. An unbounded sequence of indexes is trivial to write (xrange is handy for this), and a reusable instance of that sequence can be passed to zip, in parallel to the sequence being indexed.
The same zip usage also affords a client code-transparent alternative to the use of a wrapper class
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Looping Through Multiple Lists
Credit: Andy McKay
You need to loop through every item of multiple lists.
There are basically three approaches. Say you have:
a = ['a1', 'a2', 'a3']
b = ['b1', 'b2']
Using the built-in function map, with a first argument of None, you can iterate on both lists in parallel:
print "Map:"
for x, y in map(None, a, b):
    print x, y
The loop runs three times. On the last iteration, y will be None.
Using the built-in function zip also lets you iterate in parallel:
print "Zip:"
for x, y in zip(a, b):
    print x, y
The loop runs two times; the third iteration simply is not done.
A list comprehension affords a very different iteration:
print "List comprehension:"
for x, y in [(x,y) for x in a for y in b]:
    print x, y
The loop runs six times, over each item of b for each item of a.
Using map with None as the first argument is a subtle variation of the standard map call, which typically takes a function as the first argument. As the documentation indicates, if the first argument is None, the identity function is used as the function through which the arguments are mapped. If there are multiple list arguments, map returns a list consisting of tuples that contain the corresponding items from all lists (in other words, it's a kind of transpose operation). The list arguments may be any kind of sequence, and the result is always a list.
Note that the first technique returns None for sequences in which there are no more elements. Therefore, the output of the first loop is:
Map:
a1 b1
a2 b2
a3 None
zip lets you iterate over the lists in a similar way, but only up to the number of elements of the smallest list. Therefore, the output of the second technique is:
Zip:
a1 b1
a2 b2
Python 2.0 introduced list comprehensions, with a syntax that some found a bit strange:
[(x,y) for x in a for y in b]
This iterates over list b for every element in a. These elements are put into a tuple (x, y). We then iterate through the resulting list of tuples in the outermost for loop. The output of the third technique, therefore, is quite different:
List comprehension:
a1 b1
a1 b2
a2 b1
a2 b2
a3 b1
a3 b2
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Spanning a Range Defined by Floats
Credit: Dinu C. Gherman, Paul M. Winkler
You need an arithmetic progression, just like the built-in function range, but with float values (range works only on integers).
Although this functionality is not available as a built-in, it's not hard to code it with a loop:
def frange(start, end=None, inc=1.0):
    "A range-like function that does accept float increments..."

    if end == None:
        end = start + 0.0     # Ensure a float value for 'end'
        start = 0.0
    assert inc                # sanity check

    L = []
    while 1:
        next = start + len(L) * inc
        if inc > 0 and next >= end:
            break
        elif inc < 0 and next <= end:
            break
        L.append(next)

    return L
Sadly missing in the Python standard library, the function in this recipe lets you use ranges, just as with the built-in function range, but with float arguments.
Many theoretical restrictions apply, but this function is more useful in practice than in theory. People who work with floating-point numbers all the time have many war stories about billion-dollar projects that failed because someone did not take into consideration the strange things that modern hardware does when comparing floating-point numbers. But for pedestrian cases, simple approaches like this recipe generally work.
You can get a substantial speed boost by preallocating the list instead of calling append repeatedly. This also allows you to get rid of the conditionals in the inner loop. For one element, this version is barely faster, but with more than 10 elements it's consistently about 5 times faster—the kind of performance ratio that is worth caring about. I get identical output for every test case I can think of:
def frange2(start, end=None, inc=1.0):
    "A faster range-like function that does accept float increments..."
    if end == None:
        end = start + 0.0
        start = 0.0
    else: start += 0.0 # force it to be a float

    count = int((end - start) / inc)
    if start + count * inc != end:
        # Need to adjust the count. AFAICT, it always comes up one short.
        count += 1

    L = [start] * count
    for i in xrange(1, count):
        L[i] = start + i * inc

    return L
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Transposing Two-Dimensional Arrays
Credit: Steve Holden
You need to transpose a list of lists, turning rows into columns and vice versa.
You must start with a list whose items are lists all of the same length:
arr = [[1,2,3], [4,5,6], [7,8,9], [10,11,12]]
A list comprehension offers a simple, handy way to transpose it:
print [[r[col] for r in arr] for col in range(len(arr[0]))]
[[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
This recipe shows a concise way (although not necessarily the fastest way) to turn rows into columns. List comprehensions are known for being concise.
Sometimes data just comes at you the wrong way. For instance, if you use Microsoft's ADO database interface, due to array element ordering differences between Python and Microsoft's preferred implementation language (Visual Basic), the GetRows method actually appears to return database columns in Python, despite its name. This recipe's solution to this common problem was chosen to demonstrate nested list comprehensions.
Notice that the inner comprehension varies what is selected from (the row), while the outer comprehension varies the selector (the column). This process achieves the required transposition.
If you're transposing large arrays of numbers, consider Numeric Python and other third-party packages. Numeric Python defines transposition and other axis-swinging routines that will make your head spin.
The Reference Manual section on list displays (the other name for list comprehensions); Numeric Python (http://www.pfdubois.com/numpy/).
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Creating Lists of Lists Without Sharing References
Credit: David Ascher
You want to create a multidimensional list, but the apparently simplest solution is fraught with surprises.
Use list comprehensions (also known as list displays) to avoid implicit reference sharing:
multilist = [[0 for col in range(5)] for row in range(10)]
When a newcomer to Python is shown the power of the multiplication operation on lists, he often gets quite excited about it, since it is such an elegant notation. For example:
>>> [0] * 5
[0, 0, 0, 0, 0]
The problem is that one-dimensional problems often grow a second dimension, so there is a natural progression to:
>>> multi = [[0] * 5] * 3
>>> print multi
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
This appears to have worked, but the same newcomer is then often puzzled by bugs, which typically can be boiled down to the following test:
>>> multi[0][0] = 'Changed!'
>>> print multi
[['Changed!', 0, 0, 0, 0], ['Changed!', 0, 0, 0, 0], ['Changed!', 0, 0, 0, 0]]
This problem definitely confuses most programmers at least once, if not a few times (see the FAQ entry at http://www.python.org/doc/FAQ.html#4.50). To understand it, it helps to decompose the creation of the multidimensional list into two steps:
>>> row = [0] * 5          # a list with five references to 0
>>> multi = [row] * 3      # a list with three references to the row object
The problem still exists in this version (Python is not that magical). The comments are key to understanding the source of the confusion. The process of multiplying a sequence by a number creates a new sequence with the specified number of new references to the original contents. In the case of the creation of row, it doesn't matter whether references are being duplicated or not, since the referent (the object being referred to) is immutable. In other words, there is no difference between an object and a reference to an object if that object is immutable. In the second line, however, what is created is a new list containing three references to the contents of the [row] list, which is a single reference to a list. Thus,
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 2: Searching and Sorting
Credit: Tim Peters, PythonLabs
Computer manufacturers of the 1960s estimated that more than 25 percent of the running time on their computers was spent on sorting, when all their customers were taken into account. In fact, there were many installations in which the task of sorting was responsible for more than half of the computing time. From these statistics we may conclude that either (i) there are many important applications of sorting, or (ii) many people sort when they shouldn't, or (iii) inefficient sorting algorithms have been in common use.
—Donald Knuth, "The Art of Computer Programming", Volume 3, Sorting and Searching, page 3
Professor Knuth's masterful work on the topics of sorting and searching spans nearly 800 pages of sophisticated technical text. In Python practice, we reduce it to two imperatives (we read Knuth so you don't have to):
  • When you need to sort, find a way to use the built-in sort method of Python lists.
  • When you need to search, find a way to use built-in dictionaries.
Many recipes in this chapter illustrate these principles. The most common theme is using the decorate-sort-undecorate (DSU) pattern, a general approach to transforming a sorting problem by creating an auxiliary that we can then sort with the default, speedy sort method. This is the single most useful technique to take from this chapter. It relies on an unusual feature of Python's built-in comparisons: sequences are compared lexicographically. Lexicographical order is a generalization to tuples and lists of the everyday rules used to compare strings (i.e., alphabetical order). The built-in cmp(s1, s2), when s1 and s2 are sequences, is equivalent to this Python code:
def lexcmp(s1, s2):
     # Find leftmost nonequal pair
     i = 0
     while i < len(s1) and i < len(s2):
         outcome = cmp(s1[i], s2[i])
         if outcome:
             return outcome
         i += 1
     # All equal, until at least one sequence was exhausted
     return cmp(len(s1), len(s2))
This code looks for the first nonequal corresponding elements. If such a nonequal pair is found, that pair determines the outcome. Otherwise, if one sequence is a proper prefix of the other, the prefix is considered to be the smaller sequence. Finally, if these cases don't apply, the sequences are identical and are considered equal. Here are some examples:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Introduction
Credit: Tim Peters, PythonLabs
Computer manufacturers of the 1960s estimated that more than 25 percent of the running time on their computers was spent on sorting, when all their customers were taken into account. In fact, there were many installations in which the task of sorting was responsible for more than half of the computing time. From these statistics we may conclude that either (i) there are many important applications of sorting, or (ii) many people sort when they shouldn't, or (iii) inefficient sorting algorithms have been in common use.
—Donald Knuth, "The Art of Computer Programming", Volume 3, Sorting and Searching, page 3
Professor Knuth's masterful work on the topics of sorting and searching spans nearly 800 pages of sophisticated technical text. In Python practice, we reduce it to two imperatives (we read Knuth so you don't have to):
  • When you need to sort, find a way to use the built-in sort method of Python lists.
  • When you need to search, find a way to use built-in dictionaries.
Many recipes in this chapter illustrate these principles. The most common theme is using the decorate-sort-undecorate (DSU) pattern, a general approach to transforming a sorting problem by creating an auxiliary that we can then sort with the default, speedy sort method. This is the single most useful technique to take from this chapter. It relies on an unusual feature of Python's built-in comparisons: sequences are compared lexicographically. Lexicographical order is a generalization to tuples and lists of the everyday rules used to compare strings (i.e., alphabetical order). The built-in cmp(s1, s2), when s1 and s2 are sequences, is equivalent to this Python code:
def lexcmp(s1, s2):
     # Find leftmost nonequal pair
     i = 0
     while i < len(s1) and i < len(s2):
         outcome = cmp(s1[i], s2[i])
         if outcome:
             return outcome
         i += 1
     # All equal, until at least one sequence was exhausted
     return cmp(len(s1), len(s2))
This code looks for the first nonequal corresponding elements. If such a nonequal pair is found, that pair determines the outcome. Otherwise, if one sequence is a proper prefix of the other, the prefix is considered to be the smaller sequence. Finally, if these cases don't apply, the sequences are identical and are considered equal. Here are some examples:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Sorting a Dictionary
Credit: Alex Martelli, Raymond Hettinger
You want to sort a dictionary. Because sorting is a concept that only makes sense for sequences, this presumably means that you want a sequence of the values of the dictionary in the order obtained by sorting the keys.
The simplest approach is to sort the items (i.e., key/value pairs), then pick just the values:
def sortedDictValues1(adict):
    items = adict.items(  )
    items.sort(  )
    return [value for key, value in items]
However, an alternative implementation that sorts just the keys, then uses them to index into the dictionary to build the result, happens to run more than twice as fast (for a dictionary of a few thousand entries) on my system:
def sortedDictValues2(adict):
    keys = adict.keys(  )
    keys.sort(  )
    return [adict[key] for key in keys]
A further small speed-up (15% on my system) is to perform the last step by mapping a bound method. map is often marginally faster than a list comprehension when no lambda is involved:
def sortedDictValues3(adict):
    keys = adict.keys(  )
    keys.sort(  )
    return map(adict.get, keys)
A really tiny extra speed-up (about 3% on my system) is available in Python 2.2 by using adict._ _getitem_ _ rather than adict.get in this latest, bound-method version.
The concept of sorting applies only to a collection that has order—in other words, a sequence. A mapping, such as a dictionary, has no order, so it cannot be sorted. And yet, "How do I sort a dictionary?" is a frequent question on the Python lists. More often than not, the question is about sorting some sequence of keys and/or values from the dictionary.
A dictionary's keys can be extracted as a list, which can then be sorted. The functions in this recipe return the values in order of sorted keys, which corresponds to the most frequent actual need when it comes to sorting a dictionary. Another frequent need is sorting by the values in the dictionary, for which you should see Recipe 17.7.
The implementation choices are interesting. Because we are sorting key/value pairs by the key field and returning only the list of value fields, it seems conceptually simplest to use the first solution, which gets a list of the key/value pairs, sorts them, and then uses a list comprehension to pick the values. However, this is not the fastest solution. Instead, with Python 2.2, on dictionaries of a few thousand items, extracting just the keys, sorting them, and then accessing the dictionary for each key in the resulting list comprehension—the second solution—appears to be over twice as fast.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Processing Selected Pairs of Structured Data Efficiently
Credit: Alex Martelli, David Ascher
You need to efficiently process pairs of data from two large and related data sets.
Use an auxiliary dictionary to do preprocessing of the data, thereby reducing the need for iteration over mostly irrelevant data. For instance, if xs and ys are the two data sets, with matching keys as the first item in each entry, so that x[0] == y[0] defines an "interesting" pair:
auxdict = {}
for y in ys: auxdict.setdefault(y[0], []).append(y)
result = [ process(x, y) for x in xs for y in auxdict[x[0]] ]
To make the problem more concrete, let's look at an example. Say you need to analyze data about visitors to a web site who have purchased something online. This means you need to perform some computation based on data from two log files—one from the web server and one from the credit-card processing framework. Each log file is huge, but only a small number of the web server log entries correspond to credit-card log entries. Let's assume that cclog is a sequence of records, one for each credit-card transaction, and that weblog is a sequence of records describing each web site hit. Let's further assume that each record uses the attribute ipaddress to refer to the IP address involved in each event. In this case, a reasonable first approach would be to do something like:
results = [ process(webhit, ccinfo) for webhit in weblog for ccinfo in cclog \
            if ccinfo.ipaddress==webhit.ipaddress ]
The problem with this approach is that the nested list comprehension will iterate over each entry in the web server log, and, for each entry in the web server log, it will also iterate over each entry in the credit-card log. This means that the algorithm has O(M x N) performance characteristics—in other words, the time it takes to compute will be proportional to the product of the size of both logs. As the web site becomes more popular and as data accumulates, the performance of the algorithm will rapidly deteriorate.
The key to optimizing this algorithm is to recognize that the computation (process) needs to happen for only a small subset of all of the possible combinations of the two variables (in this case,
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Sorting While Guaranteeing Sort Stability
Credit: Alex Martelli, David Goodger
You need to sort a Python list in a guaranteed-stable way (i.e., with no alteration in the relative ordering of items that compare equal).
In addition to speed, decorate-sort-undecorate (DSU) offers flexibility that sort with a comparison function argument just can't match. For example, you can ensure the sort's stability, as follows:
def stable_sorted_copy(alist, _indices=xrange(sys.maxint)):
    # Decorate: prepare a suitable auxiliary list
    decorated = zip(alist, _indices)

    # Sort: do a plain built-in sort on the auxiliary list
    decorated.sort(  )

    # Undecorate: extract the list from the sorted auxiliary list
    return [ item for item, index in decorated ]

def stable_sort_inplace(alist):
    # To sort in place: assign sorted result to all-list slice of original list
    alist[:] = stable_sorted_copy(alist)
The notion of a stable sort is typically not relevant if the sequences you are sorting contain objects that are uniform in type and are simple, such as a list of integers. In such cases, objects that compare equal are equal in all measurable ways. However, in cases where equality for the sake of ordering doesn't correspond to "deep" equality—such as tuples containing floating-point and integer numbers, or, more commonly, rich objects with arbitrary internal structure—it may matter that the elements that start off earlier in the list than their "equal" counterparts remain earlier after the sort.
Python lists' sort method is not guaranteed to be stable: items that compare equal may or may not be in unchanged order (they often are, but you cannot be sure). Ensuring stability is easy, as one of the many applications of the common DSU idiom. For another specific example of DSU usage, see Recipe 2.7.
First, you build an auxiliary list (the decorate step), where each item is a tuple made up of all sort keys, in descending order of significance, of the corresponding item of the input sequence. You must include in each tuple of the auxiliary list all of the information in the corresponding item, and/or an index to it, so that you can fetch the item back, or reconstruct it, in the third step.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Sorting by One Field, Then by Another
Credit: José Sebrosa
You need to sort a list by more than one field of each item.
Passing a comparison function to a list's sort method is slow for lists of substantial size, but it can still be quite handy when you need to sort lists that are reasonably small. In particular, it offers a rather natural idiom to sort by more than one field:
import string

star_list = ['Elizabeth Taylor', 'Bette Davis', 'Hugh Grant', 'C. Grant']

star_list.sort(lambda x,y: (
   cmp(string.split(x)[-1], string.split(y)[-1]) or  # Sort by last name...
   cmp(x, y)))                                       # ...then by first name

print "Sorted list of stars:"
for name in star_list:
    print name
This recipe uses the properties of the cmp built-in function and the or operator to produce a compact idiom for sorting a list over more than one field of each item.
cmp(X, Y) returns false (0) when X and Y compare equal, so only in these cases does or let the next call to cmp happen. To reverse the sorting order, simply swap X and Y as arguments to cmp.
The fundamental idea of this recipe can also be used if another sorting criterion is associated with the elements of the list. We simply build an auxiliary list of tuples to pack the sorting criterion together with the main elements, then sort and unpack the result. This is more akin to the DSU idiom:
def sorting_criterion_1(data):
    return string.split(data)[-1]   # This is again the last name

def sorting_criterion_2(data):
    return len(data)                # This is some fancy sorting criterion

# Pack an auxiliary list:
aux_list = map(lambda x: (x,
                          sorting_criterion_1(x),
                          sorting_criterion_2(x)),
               star_list)

# Sort:
aux_list.sort(lambda x,y: (
   cmp(x[1], y[1])  or       # Sort by criteria 1 (last name)...
   cmp(y[2], x[2])  or       # ...then by criteria 2 (in reverse order)...
   cmp(x, y)))               # ...then by the value in the main list

# Unpack the resulting list:
star_list = map(lambda x: x[0], aux_list)

print "Another sorted list of stars:"
for name in star_list:
    print name
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Looking for Items in a Sorted Sequence Using Binary Search
Content preview·