a, b, c = b, c, a
temp = a
a = b
b = c
c = temp
a, b, c = b, c, a
=. What we wrote there—b, c,
a—is indeed an expression. Specifically, it is a
tuple, which is an immutable sequence of three
values. Tuples are often surrounded with parentheses, as in
(b, c, a), but the parentheses
are not necessary, except where the commas would otherwise have some other
meaning (e.g., in a function call). The commas are what create a
tuple, by
packing
the values that are the tuple's items.
= in an assignment
statement, you normally use a single
target.
The target can be a simple identifier (also known as a variable), an
indexing (such as alist[i] or
adict['freep']), an attribute reference (such as
anobject.someattribute), and so on. However,
Python also lets you use several targets (variables, indexings,
etc.), separated by commas, on an assignment's
lefthand side. Such a multiple assignment is also called an
unpacking
assignment. When there are two or more comma-separated targets on the
lefthand side of an assignment, the value of the righthand side must
be a sequence of as many items as there are comma-separated targets
on the lefthand side. Each item of the sequence is assigned to the
corresponding target, in order, from left to right.
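As a minimal sketch of the unpacking rules just described (written to run under Python 3; the Holder class and the sample values are hypothetical, chosen only for illustration), the targets on the lefthand side can mix identifiers, indexings, and attribute references, and a length mismatch is reported as an exception:

```python
class Holder:
    """Hypothetical empty class, used only to have an attribute target."""
    pass

a, b, c = 1, 2, 3
a, b, c = b, c, a          # the righthand tuple is built first, then unpacked

alist = [0, 0]
adict = {}
anobject = Holder()
# Targets may be indexings and attribute references, not just identifiers.
alist[1], adict['freep'], anobject.someattribute = 'x', 'y', 'z'

try:
    p, q = 1, 2, 3         # three items but only two targets
except ValueError:
    mismatch = True
else:
    mismatch = False
```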
a) gets the value of the first item (which used to
be the value of variable b), the second target
(b) gets the value of the second item (which used
to be the value of
data = { 'red' : 1, 'green' : 2, 'blue' : 3 }
def makedict(**kwargs):
    return kwargs
data = makedict(red=1, green=2, blue=3)
def dodict(*args, **kwds):
    d = {}
    for k, v in args: d[k] = v
    d.update(kwds)
    return d
tada = dodict(yellow=2, green=4, *data.items())
=> operator, which is
well suited to building hashes (Perl-speak for dictionaries) from a
literal list:
%data = (red => 1, green => 2, blue => 3);
=> operator in Perl is equivalent to
Perl's own ,, except that it
implicitly quotes the word to its left.
*args or
**kwds (if you want to
use both, the one with ** must be last). If you
have *args, your
function can be called with any number of extra actual arguments of
the positional, or plain, kind. Python collects all the extra
positional arguments into a tuple and binds that tuple to the
identifier args. Similarly, if you have
**kwds, your function
can be called with any number of extra actual arguments of the named,
or keyword, kind. Python collects all the extra named arguments into
a dictionary (with the names as the keys and the values as the
values) and binds that dictionary to the identifier
get
method of dictionaries is for. Say you have a dictionary:
d = {'key':'value'}
'key' from d in an
exception-safe way:
if d.has_key('key'): # or, in Python 2.2 or later: if 'key' in d:
    print d['key']
else:
    print 'not found'
print d.get('key', 'not found')
get method.
d[x], and the value of x is not
a key in dictionary d, your attempt raises a
KeyError exception. This is often okay. If you
expected the value of x to be a key in
d, an exception is just the right way to inform
you that you're wrong (i.e., that you need to debug
your program).
x may or may not be a key in
d. In this case, don't start
messing with the has_key method or with
try/except statements. Instead,
use the get method. If you call
d.get(x), no exception is thrown: you get
d[x] if x is a key in
d, and if it's not, you get
None (which you can check for or propagate). If
None is not what you want to get when
x is not a key of d, call
d.get(x, somethingelse)
instead. In this case, if x is not a key, you will
get the value of somethingelse.
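The three behaviors described above can be checked side by side. This is only a small sketch, written to run under Python 3, and the key 'other' is a hypothetical name that is deliberately absent from the dictionary:

```python
d = {'key': 'value'}

present = d.get('key', 'not found')     # key exists: the stored value
missing = d.get('other')                # key absent, no default: None
fallback = d.get('other', 'not found')  # key absent: the supplied default

try:
    d['other']                          # plain indexing raises instead
except KeyError:
    indexing_raised = True
else:
    indexing_raised = False
```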
get is a simple, useful mechanism that is well
explained in the Python documentation, but a surprising number of
people don't know about it. This idiom is also quite
common in Zope, for example, when pulling variables out of the
REQUEST dictionary.
D, you need to use the entry
D[k] if it's already present, or
add a new D[k] if k
isn't yet a key in D.
setdefault
method of dictionary objects is for. Say we're
building a word-to-page numbers index. A key piece of code might be:
theIndex = {}
def addword(word, pagenumber):
    if theIndex.has_key(word):
        theIndex[word].append(pagenumber)
    else:
        theIndex[word] = [pagenumber]
def addword(word, pagenumber):
    try: theIndex[word].append(pagenumber)
    except KeyError: theIndex[word] = [pagenumber]
setdefault simplifies this further:
def addword(word, pagenumber):
    theIndex.setdefault(word, []).append(pagenumber)
setdefault method of a dictionary is a handy
shortcut for this task that is especially useful when the new entry
you want to add is mutable. Basically, dict.setdefault(k,
v) is much like dict.get(k, v), except
that if k is not a key in the dictionary, the
setdefault method assigns
dict[k]=v as a side effect, in addition to
returning v. (get would just
return v, without affecting
dict in any way.) Therefore,
setdefault is appropriate any time you have
get-like needs but also want to produce this
specific side effect on the dictionary.
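To make the difference in side effects concrete, here is a short sketch (Python 3; the dictionary name index and the page numbers are illustrative assumptions):

```python
index = {}

got = index.get('word', [])          # returns the default, stores nothing
after_get = dict(index)              # snapshot: still empty

entry = index.setdefault('word', []) # returns the default AND stores it
entry.append(7)                      # mutates the list held by the dictionary
index.setdefault('word', []).append(9)  # key now exists: default is ignored
```

Because setdefault returns the very object stored in the dictionary, appending to its result updates the entry in place, which is why the idiom works so well with mutable values.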
setdefault is particularly useful in a dictionary
with values that are lists, as detailed in Recipe 1.6. The single most typical usage form for
setdefault is:
somedict.setdefault(somekey, []).append(somevalue)
setdefault is normally not very useful
if the values are immutable. If you just want to count words, for
example, something like the following is no use:
theIndex.setdefault(word, 1)
theIndex[word] = 1 + theIndex.get(word, 0)
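A quick sketch of the word-counting case (Python 3; count_words and the sample words are hypothetical names used only here) shows why get is the right tool when the values are immutable numbers, while setdefault alone leaves every count stuck at its initial value:

```python
def count_words(words):
    """Count occurrences with the get-based idiom."""
    counts = {}
    for word in words:
        counts[word] = 1 + counts.get(word, 0)
    return counts

counts = count_words("a b a c a b".split())

# setdefault only initializes; it never increments an existing entry.
stuck = {}
for word in ['a', 'a', 'a']:
    stuck.setdefault(word, 1)
```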
d1 = {}
d1.setdefault(key, []).append(value)
d2 = {}
d2.setdefault(key, {})[value] = 1
setdefault method of a
dictionary to initialize the entry for a key in the dictionary, if
needed, and in any case to return said entry.
list_of_values = d1[key]
d1 when the last value for a key is removed:
d1[key].remove(value)
def has_key_with_some_values(d, key):
    return d.has_key(key) and d[key]
0 or a list, which may be
empty. In most cases, it is easier to use a
function
that always returns a list (maybe an empty one), such as:
def get_values_if_any(d, key):
    return d.get(key, [])
if get_values_if_any(d1, somekey):
if has_key_with_some_values(d1, somekey):
get_values_if_any is generally handier.
For example, you can use it to check if 'freep' is
among the values for somekey:
if 'freep' in get_values_if_any(d1, somekey):
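Putting these pieces together, a list-valued dictionary might be built and queried as sketched below (Python 3). The sample data and the remove_value helper are assumptions made for illustration; the helper shows one way to drop a key once its last value is removed, which a bare remove call does not do:

```python
d1 = {}
for key, value in [('colors', 'red'), ('colors', 'blue'), ('shapes', 'square')]:
    d1.setdefault(key, []).append(value)

def get_values_if_any(d, key):
    return d.get(key, [])

def remove_value(d, key, value):
    """Hypothetical helper: remove value, then the key if its list is empty."""
    d[key].remove(value)
    if not d[key]:
        del d[key]

has_red = 'red' in get_values_if_any(d1, 'colors')
has_freep = 'freep' in get_values_if_any(d1, 'no such key')
remove_value(d1, 'shapes', 'square')
```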
case,
switch, or select statement.
animals = []
number_of_felines = 0
def deal_with_a_cat( ):
    global number_of_felines
    print "meow"
    animals.append('feline')
    number_of_felines += 1
def deal_with_a_dog( ):
    print "bark"
    animals.append('canine')
def deal_with_a_bear( ):
    print "watch out for the *HUG*!"
    animals.append('ursine')
tokenDict = {
    "cat": deal_with_a_cat,
    "dog": deal_with_a_dog,
    "bear": deal_with_a_bear,
    }
# Simulate, say, some words read from a file
words = ["cat", "bear", "cat", "dog"]
for word in words:
    # Look up the function to call for each word, then call it
    functionToCall = tokenDict[word]
    functionToCall( )
    # You could also do it in one step, tokenDict[word]( )
case statement.
self.method1) or other callables. If you use
unbound methods (such as class.method), you need
to pass an appropriate object as the first actual argument when you
do call them. More generally, you can also store tuples, including
both callables and arguments, as the dictionary's
values, with diverse possibilities.
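One of those possibilities, storing a callable together with its arguments, could look like the following sketch (Python 3). The function names, the tokens, and the log list are all hypothetical and exist only to make the dispatch observable:

```python
log = []

def greet(name):
    log.append('hello ' + name)

def repeat(text, times):
    log.append(text * times)

# Each value is a (callable, argument-tuple) pair.
dispatch = {
    'hi':   (greet,  ('world',)),
    'echo': (repeat, ('ab', 3)),
}

for token in ['hi', 'echo', 'hi']:
    function, args = dispatch[token]   # unpack the stored pair
    function(*args)                    # call with the stored arguments
```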
class Bunch:
    def __init__(self, **kwds):
        self.__dict__.update(kwds)
Bunch
instance:
point = Bunch(datum=y, squared=y*y, coord=x)
if point.squared > threshold:
    point.isok = 1
if point['squared'] > threshold
if bunch.squared > threshold
class EvenSimplerBunch:
    def __init__(self, **kwds): self.__dict__ = kwds
Bunch class has
the advantage of not rebinding self.__dict__ (it
uses the dictionary's update
method to modify it instead), so it will keep working even if, in
some hypothetical far-future dialect of Python, this specific
dictionary became nonrebindable (as long, of course, as it remains
mutable). But this EvenSimplerBunch is indeed even
simpler, and marginally speedier, as it just rebinds the dictionary.
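A self-contained run of the Bunch idea, as a sketch under Python 3 (the values of x, y, and threshold are arbitrary assumptions), shows attributes being read, tested, and added after construction:

```python
class Bunch:
    def __init__(self, **kwds):
        self.__dict__.update(kwds)

x, y, threshold = 3, 4, 10             # arbitrary sample values
point = Bunch(datum=y, squared=y * y, coord=x)
if point.squared > threshold:
    point.isok = 1                     # new attributes can be added freely
```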
bunch['squared'] and so on. In Python
2.1 or earlier, for example, the simplest way is:
some_dict = { 'zope':'zzz', 'python':'rocks' }
another_dict = { 'python':'rocks', 'perl':'$' }
intersect = []
for item in some_dict.keys( ):
    if item in another_dict.keys( ):
        intersect.append(item)
print "Intersects:", intersect
intersect = []
for item in some_dict.keys( ):
    if another_dict.has_key(item):
        intersect.append(item)
print "Intersects:", intersect
print "Intersects:", [k for k in some_dict if k in another_dict]
print "Intersects:", filter(another_dict.has_key, some_dict.keys())
keys
method produces a list of all the keys of a dictionary. It can be
pretty tempting to fall into the trap of just using
in, with this list as the righthand side, to test
for membership. However, in the first example,
you're looping through all of
some_dict, then each time looping through all of
another_dict. If some_dict has
N1 items, and another_dict
has N2 items, your intersection operation will
have a compute time proportional to the product of
N1 x N2.
(O(N1 x N2) is the common
computer-science notation to indicate this.)
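The quadratic cost comes from testing membership in a list of keys. A small sketch (Python 3, reusing the two sample dictionaries from this recipe) contrasts that with a membership test on the dictionary itself; both give the same answer, but only the first scans a list for every key:

```python
some_dict = {'zope': 'zzz', 'python': 'rocks'}
another_dict = {'python': 'rocks', 'perl': '$'}

# Membership test against a list of keys: a linear scan per lookup.
key_list = list(another_dict)
slow = [k for k in some_dict if k in key_list]

# Membership test against the dictionary: a hash-table lookup per key.
fast = [k for k in some_dict if k in another_dict]
```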
has_key
method, you are not looping on another_dict any
more, but rather checking the key in the
dictionary's hash table. The processing time for
has_key is basically independent of dictionary
size, so the second approach is
O(N1). The difference is
quite substantial for large dictionaries! If the two dictionaries are
very different in size, it becomes important to use the smaller one
in the role of
if((x=foo( )) or while((x=foo( )) in such other languages).
if x=foo( ):
if and
while statements. Normally this
isn't a problem, as you can just structure your code
around it. For example, this is quite Pythonic:
while 1:
    line = file.readline( )
    if not line: break
    process(line)
for line in file.xreadlines( ):
    process(line)
for line in file:
    process(line)
class DataHolder:
    def __init__(self, value=None):
        self.value = value
    def set(self, value):
        self.value = value
        return value
    def get(self):
        return self.value
# optional and strongly discouraged, but handy at times:
import __builtin__
__builtin__.DataHolder = DataHolder
__builtin__.data = DataHolder( )
DataHolder class and its
data instance, you can keep your C-like code
structure intact in transliteration:
while data.set(file.readline( )):
    process(data.get( ))
if, elif, or
while statement. This is usually okay: you just
structure your code to avoid the need to assign while testing (in
fact, your code will often become clearer as a result). However,
sometimes you may be writing Python code that is the transliteration
of code originally written in C, Perl, or another language that
supports assignment-as-expression. For example, such transliteration
often occurs in the first Python version of an algorithm for which a
reference implementation is supplied, an algorithm taken from a book,
and so on. In such cases, having the structure of your initial
transliteration be close to that of the code you're
transcribing is often preferable. Fortunately, Python offers enough
power to make it pretty trivial to satisfy this requirement.
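As a runnable sketch of this assign-and-test structure (Python 3; the in-memory source built with io.StringIO and the processed list are stand-ins assumed here in place of a real file and a real process function):

```python
import io

class DataHolder:
    def __init__(self, value=None):
        self.value = value
    def set(self, value):
        self.value = value
        return value                 # returning the value enables the test
    def get(self):
        return self.value

data = DataHolder()
source = io.StringIO("first\nsecond\n")   # stand-in for an open file
processed = []                            # stand-in for process(line)

while data.set(source.readline()):        # assign and test in one expression
    processed.append(data.get().rstrip())
```

The loop ends when readline returns an empty string, which set passes back as a false value.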
map and filter because
they can be hard to read and understand, particularly when they need
lambda.
thenewlist = map(lambda x: x + 23, theoldlist)
thenewlist = [x + 23 for x in theoldlist]
thenewlist = filter(lambda x: x > 5, theoldlist)
thenewlist = [x for x in theoldlist if x > 5]
thenewlist = map(lambda x: x+23, filter(lambda x: x>5, theoldlist))
if clause and use some
expression, such as adding 23, on the selected items:
thenewlist = [x + 23 for x in theoldlist if x > 5]
map and
filter functions still have their uses, since
they're arguably as elegant and clear as
list comprehensions when the lambda construct is
not necessary. In fact, when their first argument is another built-in
function (i.e., when lambda is not involved and
there is no need to write a function just for the purpose of using it
within a map or filter), they
can be even faster than list comprehensions.
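The following sketch (Python 3, where map and filter return iterators and so are wrapped in list here; the sample data is arbitrary) checks that the forms discussed in this recipe produce the same results, including the case in which a built-in function makes lambda unnecessary:

```python
theoldlist = [1, 6, 3, 10]

# With lambda: the list comprehension is the easier one to read.
via_map = list(map(lambda x: x + 23, filter(lambda x: x > 5, theoldlist)))
via_comprehension = [x + 23 for x in theoldlist if x > 5]

# With a built-in as first argument, map needs no lambda at all.
words = ['a', 'bb', 'ccc']
lengths_via_map = list(map(len, words))
lengths_via_comprehension = [len(w) for w in words]
```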
map and filter calls
than similar programs written for 1.5.2. Most of the
map and filter calls (and quite
a few explicit loops) are replaced with list comprehensions (which
Python borrowed, after some prettying of the syntax, from Haskell,
described at unzip
counterpart to zip, but it's not
hard to code our own:
def unzip(p, n):
    """ Split a sequence p into a list of n tuples, repeatedly taking the
    next unused element of p and adding it to the next tuple. Each of the
    resulting tuples is of the same length; if len(p) % n != 0, the shorter
    tuples are padded with None (closer to the behavior of map than to that
    of zip).
    Example:
    >>> unzip(['a','b','c','d','e'], 3)
    [('a', 'd'), ('b', 'e'), ('c', None)]
    """
    # First, find the length for the longest sublist
    mlen, lft = divmod(len(p), n)
    if lft != 0: mlen += 1
    # Then, initialize a list of lists with suitable lengths
    lst = [[None]*mlen for i in range(n)]
    # Loop over all items of the input sequence (index-wise), and
    # Copy a reference to each into the appropriate place
    for i in range(len(p)):
        j, k = divmod(i, n) # Find sublist-index and index-within-sublist
        lst[k][j] = p[i] # Copy a reference appropriately
    # Finally, turn each sublist into a tuple, since the unzip function
    # is specified to return a list of tuples, not a list of lists
    return map(tuple, lst)
zip function (although it deals with only the
very simplest cases). This recipe was useful to me recently when I
had to take a Python list and break it down into a number of
different pieces, putting each consecutive item of the list into a
separate sublist.
None is generally more efficient than building up
each sublist by repeated calls to append. Also, in
this case, it already ensures the padding with
None that we would need anyway (unless
len(p) just happens to be a multiple of
n).
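For readers who want to try the preallocation approach on a current interpreter, here is a sketch of the same function ported to Python 3. The only assumed change is the final line, which builds the list of tuples with a list comprehension because map returns an iterator there:

```python
def unzip(p, n):
    """Split sequence p into n tuples, padding short ones with None."""
    mlen, lft = divmod(len(p), n)
    if lft != 0:
        mlen += 1
    # Preallocate n sublists already filled with the None padding
    lst = [[None] * mlen for i in range(n)]
    for i in range(len(p)):
        j, k = divmod(i, n)      # j: position in sublist, k: which sublist
        lst[k][j] = p[i]
    return [tuple(sub) for sub in lst]

odd = unzip(['a', 'b', 'c', 'd', 'e'], 3)
even = unzip([1, 2, 3, 4], 2)
```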
unzip uses is quite simple: a
reference to each item of the input sequence is placed into the
appropriate item of the appropriate sublist. The built-in function
1 if the
element is scalar or 0 otherwise. Given this, one
approach is:
def flatten(sequence, scalarp, result=None):
    if result is None: result = []
    for item in sequence:
        if scalarp(item): result.append(item)
        else: flatten(item, scalarp, result)
    return result
result list:
from __future__ import generators
def flatten22(sequence, scalarp):
    for item in sequence:
        if scalarp(item):
            yield item
        else:
            for subitem in flatten22(item, scalarp):
                yield subitem
flatten. Of course, we must be
able to loop over the items of any non-scalar with a
for statement, or flatten will
raise an exception (since it does, via a recursive call, attempt a
for statement over any non-scalar item). In Python
2.2, that's easy to check:
def canLoopOver(maybeIterable):
    try: iter(maybeIterable)
    except: return 0
    else: return 1
iter, new in Python 2.2,
returns an iterator, if possible. for x
in s implicitly calls the iter
function, so the canLoopOver function can easily
check if for is applicable by calling
iter explicitly and seeing if that raises an
exception.
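One way to combine these pieces, sketched for Python 3, is to build the scalar test on top of canLoopOver. The is_scalar predicate is an assumption of this sketch: it treats strings as scalars even though they are iterable, since looping into a one-character string would otherwise recurse without end. The name flatten_lazily is likewise hypothetical:

```python
def canLoopOver(maybeIterable):
    try:
        iter(maybeIterable)
    except TypeError:            # iter raises TypeError for non-iterables
        return False
    return True

def is_scalar(item):
    # Assumption: strings count as scalars although they can be looped over.
    return isinstance(item, str) or not canLoopOver(item)

def flatten(sequence, scalarp, result=None):
    if result is None:
        result = []
    for item in sequence:
        if scalarp(item):
            result.append(item)
        else:
            flatten(item, scalarp, result)
    return result

def flatten_lazily(sequence, scalarp):
    for item in sequence:
        if scalarp(item):
            yield item
        else:
            for subitem in flatten_lazily(item, scalarp):
                yield subitem

nested = [1, [2, (3, 'four')], [[5]], 'six']
flat = flatten(nested, is_scalar)
lazy = list(flatten_lazily(nested, is_scalar))
```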
xrange
and zip make this easy. You need only this one
instance of xrange, as it is fully reusable:
import sys
indices = xrange(sys.maxint)
indices
instance:
for item, index in zip(sequence, indices):
    something(item, index)
for index in range(len(sequence)):
    something(sequence[index], index)
class Indexed:
    def __init__(self, seq):
        self.seq = seq
    def __getitem__(self, i):
        return self.seq[i], i
for item, index in Indexed(sequence):
    something(item, index)
from __future__ import
generators, you can also use:
def Indexed(sequence):
    iterator = iter(sequence)
    for index in indices:
        yield iterator.next( ), index
        # Note that we exit by propagating StopIteration when .next raises it!
def Indexed(sequence):
    return zip(sequence, indices)
for i in range(len(sequence)):
sequence[i] as the item reference in the
loop's body. However, in many contexts, it is
clearer to emphasize the loop on the sequence items rather than on
the indexes. zip provides an easy alternative,
looping on indexes and items in parallel, since it truncates at the
shortest of its arguments. Thus, it's okay for some
arguments to be unbounded sequences, as long as not all the arguments
are unbounded. An unbounded sequence of indexes is trivial to write
(xrange is handy for this), and a reusable
instance of that sequence can be passed to zip, in
parallel to the sequence being indexed.
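The same truncation property can be tried on a current interpreter. In this Python 3 sketch, itertools.count() is assumed as the unbounded source of indexes in place of the xrange instance (it is an iterator, so a fresh one is created per call rather than reused), and the built-in enumerate is shown for comparison; the function name indexed is hypothetical:

```python
import itertools

def indexed(sequence):
    # zip stops at the shorter argument, so the endless counter is safe.
    return zip(sequence, itertools.count())

pairs = list(indexed('abc'))

# enumerate yields (index, item); swap the order to match the pairs above.
same = [(item, index) for index, item in enumerate('abc')]
```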
zip usage also affords a client
code-transparent alternative to the use of a wrapper class
a = ['a1', 'a2', 'a3']
b = ['b1', 'b2']
map, with a first argument
of None, you can iterate on both lists in
parallel:
print "Map:"
for x, y in map(None, a, b):
    print x, y
y will be None.
zip also lets you
iterate in parallel:
print "Zip:"
for x, y in zip(a, b):
    print x, y
print "List comprehension:"
for x, y in [(x,y) for x in a for y in b]:
    print x, y
b for
each item of a.
map with None as the
first argument is a subtle variation of the standard
map call, which typically takes a function as the
first argument. As the documentation indicates, if the first argument
is None, the identity function is used as the
function through which the arguments are mapped. If there are
multiple list arguments, map returns a list
consisting of tuples that contain the corresponding items from all
lists (in other words, it's a kind of transpose
operation). The list arguments may be any kind of sequence, and the
result is always a list.
None for
sequences in which there are no more elements. Therefore, the output
of the first loop is:
Map:
a1 b1
a2 b2
a3 None
zip lets you iterate over
the lists in a similar way, but only up to the number of elements of
the smallest list. Therefore, the output of the second technique is:
Zip:
a1 b1
a2 b2
[(x,y) for x in a for y in b]
b for every element in a. These
elements are put into a tuple (x,
y). We then iterate through the resulting list of
tuples in the outermost for loop. The output of
the third technique, therefore, is quite different:
List comprehension:
a1 b1
a1 b2
a2 b1
a2 b2
a3 b1
a3 b2
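The three behaviors can be compared as data rather than printed output. This sketch targets Python 3, where itertools.zip_longest is assumed as the stand-in for the padding form map(None, a, b) used in this recipe:

```python
from itertools import zip_longest

a = ['a1', 'a2', 'a3']
b = ['b1', 'b2']

padded = list(zip_longest(a, b))          # pads the shorter list with None
truncated = list(zip(a, b))               # stops at the shorter list
crossed = [(x, y) for x in a for y in b]  # every pairing: nested loops
```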
range, but with float values
(range works only on integers).
def frange(start, end=None, inc=1.0):
    "A range-like function that does accept float increments..."
    if end == None:
        end = start + 0.0 # Ensure a float value for 'end'
        start = 0.0
    assert inc # sanity check
    L = []
    while 1:
        next = start + len(L) * inc
        if inc > 0 and next >= end:
            break
        elif inc < 0 and next <= end:
            break
        L.append(next)
    return L
range, but with float arguments.
append repeatedly. This also
allows you to get rid of the conditionals in the inner loop. For one
element, this version is barely faster, but with more than 10
elements it's consistently about 5 times
faster—the kind of performance ratio that is worth caring
about. I get identical output for every test case I can think of:
def frange2(start, end=None, inc=1.0):
    "A faster range-like function that does accept float increments..."
    if end == None:
        end = start + 0.0
        start = 0.0
    else: start += 0.0 # force it to be a float
    count = int((end - start) / inc)
    if start + count * inc != end:
        # Need to adjust the count. AFAICT, it always comes up one short.
        count += 1
    L = [start] * count
    for i in xrange(1, count):
        L[i] = start + i * inc
    return L
arr = [[1,2,3], [4,5,6], [7,8,9], [10,11,12]]
print [[r[col] for r in arr] for col in range(len(arr[0]))]
[[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
GetRows method actually
appears to return database columns in Python, despite its name. This
recipe's solution to this common problem was chosen
to demonstrate nested list comprehensions.
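For comparison, here is a sketch (Python 3) that runs the nested list comprehension next to an alternative based on zip with argument unpacking. The zip form is not this recipe's solution; it is shown only as a cross-check that both produce the same transposed rows:

```python
arr = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

# The recipe's approach: outer loop over columns, inner loop over rows.
by_comprehension = [[r[col] for r in arr] for col in range(len(arr[0]))]

# Alternative: zip(*arr) pairs up the items at each column position.
by_zip = [list(column) for column in zip(*arr)]
```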
http://www.pfdubois.com/numpy/).
multilist = [[0 for col in range(5)] for row in range(10)]
>>> [0] * 5
[0, 0, 0, 0, 0]
>>> multi = [[0] * 5] * 3
>>> print multi
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
>>> multi[0][0] = 'Changed!'
>>> print multi
[['Changed!', 0, 0, 0, 0], ['Changed!', 0, 0, 0, 0], ['Changed!', 0, 0, 0, 0]]
http://www.python.org/doc/FAQ.html#4.50). To
understand it, it helps to decompose the creation of the
multidimensional list into two steps:
>>> row = [0] * 5 # a list with five references to 0
>>> multi = [row] * 3 # a list with three references to the row object
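The two-step decomposition can be verified with identity checks. In this sketch (Python 3; the name independent is an assumption), the is operator confirms that all three rows are the same object, while the list-comprehension form builds a separate row each time:

```python
row = [0] * 5
multi = [row] * 3
shared = all(r is row for r in multi)   # every entry is the same list object

multi[0][0] = 'Changed!'                # visible through all three references

# A list comprehension evaluates [0] * 5 anew for each row.
independent = [[0] * 5 for i in range(3)]
independent[0][0] = 'Changed!'
```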
row, it doesn't matter whether
references are being duplicated or not, since the referent (the
object being referred to) is immutable. In other words, there is no
difference between an object and a reference to an object if that
object is immutable. In the second line, however, what is created is
a new list containing three references to the contents of the
[row] list, which is a single reference to a list.
Thus,
sort method of Python lists.
sort method. This is the single most useful
technique to take from this chapter. It relies on an unusual feature
of Python's built-in comparisons: sequences are
compared lexicographically. Lexicographical order is a generalization
to tuples and lists of the everyday rules used to compare strings
(i.e., alphabetical order). The built-in cmp(s1,
s2), when s1 and
s2 are sequences, is equivalent to this Python
code:
def lexcmp(s1, s2):
    # Find leftmost nonequal pair
    i = 0
    while i < len(s1) and i < len(s2):
        outcome = cmp(s1[i], s2[i])
        if outcome:
            return outcome
        i += 1
    # All equal, until at least one sequence was exhausted
    return cmp(len(s1), len(s2))
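To try lexcmp on an interpreter without a built-in cmp (Python 3 has none), the sketch below assumes a small stand-in for it and checks that lexcmp agrees with the interpreter's own comparison of tuples and strings; the sample pairs are arbitrary:

```python
def cmp(x, y):
    # Assumed stand-in for the old built-in: -1, 0, or 1.
    return (x > y) - (x < y)

def lexcmp(s1, s2):
    # Find leftmost nonequal pair
    i = 0
    while i < len(s1) and i < len(s2):
        outcome = cmp(s1[i], s2[i])
        if outcome:
            return outcome
        i += 1
    # All equal, until at least one sequence was exhausted
    return cmp(len(s1), len(s2))

samples = [((1, 2), (1, 3)), ((1, 2), (1, 2, 0)), ((2,), (1, 9, 9)),
           ((), ()), ('abc', 'abd')]
agree = all(lexcmp(s1, s2) == cmp(s1, s2) for s1, s2 in samples)
```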
def sortedDictValues1(adict):
    items = adict.items( )
    items.sort( )
    return [value for key, value in items]
def sortedDictValues2(adict):
    keys = adict.keys( )
    keys.sort( )
    return [adict[key] for key in keys]
map is often
marginally faster than a list comprehension when no
lambda is involved:
def sortedDictValues3(adict):
    keys = adict.keys( )
    keys.sort( )
    return map(adict.get, keys)
adict.__getitem__ rather
than adict.get in this latest, bound-method
version.
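On Python 3, where keys returns a view without a sort method, the same two strategies might be sketched as follows. The use of the sorted built-in and both function names are assumptions of this sketch, not part of the recipe:

```python
def sorted_dict_values(adict):
    # Sort the keys, then look each value up by indexing.
    return [adict[key] for key in sorted(adict)]

def sorted_dict_values_via_map(adict):
    # Same idea with a bound method passed to map.
    return list(map(adict.__getitem__, sorted(adict)))

sample = {'b': 'x', 'a': 'z', 'c': 'y'}
values = sorted_dict_values(sample)
values_via_map = sorted_dict_values_via_map(sample)
```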
xs
and ys are the two data sets, with matching keys
as the first item in each entry, so that x[0] ==
y[0] defines an
"interesting" pair:
auxdict = {}
for y in ys: auxdict.setdefault(y[0], []).append(y)
result = [ process(x, y) for x in xs for y in auxdict[x[0]] ]
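Here is a small worked run of the auxiliary-dictionary join (Python 3). The sample records and the process function are hypothetical. This sketch also assumes auxdict.get(x[0], []) in place of plain indexing, so that an x with no matching y is skipped instead of raising KeyError; a brute-force join is included as a cross-check:

```python
xs = [('k1', 'x-a'), ('k2', 'x-b'), ('k3', 'x-c')]   # 'k3' has no match
ys = [('k1', 'y-a'), ('k1', 'y-b'), ('k2', 'y-c')]

def process(x, y):
    """Hypothetical stand-in for the real work done on each pair."""
    return (x[1], y[1])

# Group the ys by key once, so each x needs only one dictionary lookup.
auxdict = {}
for y in ys:
    auxdict.setdefault(y[0], []).append(y)

result = [process(x, y) for x in xs for y in auxdict.get(x[0], [])]

# Brute force: examines every (x, y) combination.
brute = [process(x, y) for x in xs for y in ys if x[0] == y[0]]
```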
cclog is a sequence of records, one
for each credit-card transaction, and that weblog
is a sequence of records describing each web site hit.
Let's further assume that each record uses the
attribute ipaddress to refer to the IP address
involved in each event. In this case, a reasonable first approach
would be to do something like:
results = [ process(webhit, ccinfo) for webhit in weblog for ccinfo in cclog
            if ccinfo.ipaddress==webhit.ipaddress ]
process) needs to happen for only a
small subset of all of the possible combinations of the two variables
(in this case, sort with a
comparison function argument just can't match. For
example, you can ensure the sort's stability, as
follows:
import sys
def stable_sorted_copy(alist, _indices=xrange(sys.maxint)):
    # Decorate: prepare a suitable auxiliary list
    decorated = zip(alist, _indices)
    # Sort: do a plain built-in sort on the auxiliary list
    decorated.sort( )
    # Undecorate: extract the list from the sorted auxiliary list
    return [ item for item, index in decorated ]
def stable_sort_inplace(alist):
    # To sort in place: assign sorted result to all-list slice of original list
    alist[:] = stable_sorted_copy(alist)
sort method is not
guaranteed to be stable: items that compare equal may or may not be
in unchanged order (they often are, but you cannot be sure). Ensuring
stability is easy, as one of the many applications of the common DSU
idiom. For another specific example of DSU usage, see Recipe 2.7.
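The decorate-sort-undecorate steps can be observed with items that compare equal yet remain distinguishable. This Python 3 sketch assumes enumerate as the source of indexes and relies on 1.0, 1, and True comparing equal to one another; because each index is unique, the decorated tuples never tie, so equal items keep their original relative order:

```python
def stable_sorted_copy(alist):
    # Decorate: pair each item with its original position
    decorated = [(item, index) for index, item in enumerate(alist)]
    # Sort: ties between equal items are broken by the index
    decorated.sort()
    # Undecorate: drop the indexes again
    return [item for item, index in decorated]

def stable_sort_inplace(alist):
    alist[:] = stable_sorted_copy(alist)

mixed = [2, 1.0, 1, True, 0]          # 1.0 == 1 == True
result = stable_sorted_copy(mixed)
stable_sort_inplace(mixed)
```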
sort method is slow for
lists of substantial size, but it can still be quite handy when you
need to sort lists that are reasonably small. In particular, it
offers a rather natural idiom to sort by more than one field:
import string
star_list = ['Elizabeth Taylor', 'Bette Davis', 'Hugh Grant', 'C. Grant']
star_list.sort(lambda x,y: (
    cmp(string.split(x)[-1], string.split(y)[-1]) or # Sort by last name...
    cmp(x, y))) # ...then by first name
print "Sorted list of stars:"
for name in star_list:
    print name
cmp
built-in function and the or operator to produce a
compact idiom for sorting a list over more than one field of each
item.
cmp(X, Y) returns false (0)
when X and Y compare equal, so
only in these cases does or let the next call to
cmp happen. To reverse the sorting order, simply
swap X and Y as arguments to
cmp.
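The cmp-and-or chain can still be exercised on Python 3, where sort no longer takes a comparison function directly. This sketch assumes functools.cmp_to_key as the adapter and a small stand-in for the missing cmp built-in; the name by_last_then_full is hypothetical:

```python
from functools import cmp_to_key

def cmp(x, y):
    # Assumed stand-in for the old built-in: -1, 0, or 1.
    return (x > y) - (x < y)

def by_last_then_full(x, y):
    # The second cmp runs only when the first returns 0 (equal last names).
    return cmp(x.split()[-1], y.split()[-1]) or cmp(x, y)

star_list = ['Elizabeth Taylor', 'Bette Davis', 'Hugh Grant', 'C. Grant']
star_list.sort(key=cmp_to_key(by_last_then_full))
```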
def sorting_criterion_1(data):
    return string.split(data)[-1] # This is again the last name
def sorting_criterion_2(data):
    return len(data) # This is some fancy sorting criterion
# Pack an auxiliary list:
aux_list = map(lambda x: (x,
                          sorting_criterion_1(x),
                          sorting_criterion_2(x)),
               star_list)
# Sort:
aux_list.sort(lambda x,y: (
    cmp(x[1], y[1]) or # Sort by criterion 1 (last name)...
    cmp(y[2], x[2]) or # ...then by criterion 2 (in reverse order)...
    cmp(x, y))) # ...then by the value in the main list
# Unpack the resulting list:
star_list = map(lambda x: x[0], aux_list)
print "Another sorted list of stars:"
for name in star_list:
    print name
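The same three-level ordering can be expressed as a sort key. The sketch below (Python 3) is a swapped-in technique rather than the comparison-function approach shown above: it assumes a key function that returns a tuple, negating the length so that the second criterion sorts in reverse order. The name star_key is hypothetical:

```python
def sorting_criterion_1(data):
    return data.split()[-1]      # the last name

def sorting_criterion_2(data):
    return len(data)             # some fancy sorting criterion

def star_key(name):
    # Last name ascending, length descending, then the full name.
    return (sorting_criterion_1(name), -sorting_criterion_2(name), name)

star_list = ['Elizabeth Taylor', 'Bette Davis', 'Hugh Grant', 'C. Grant']
star_list.sort(key=star_key)
```

Negation works here only because the second criterion is numeric; a non-numeric criterion would need a different way to reverse its order.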