Chapter 4. Python Shortcuts
Introduction
Credit: David Ascher, ActiveState, co-author of Learning Python
Programming languages are like natural languages. Each has a set of qualities that polyglots generally agree on as characteristics of the language. Russian and French are often admired for their lyricism, while English is more often cited for its precision and dynamism: unlike the Académie-defined French language, the English language routinely grows words to suit its speakers’ needs, such as “carjacking,” “earwitness,” “snailmail,” “email,” “googlewhacking,” and “blogging.” In the world of computer languages, Perl is well known for its many degrees of freedom: TMTOWTDI (There’s More Than One Way To Do It) is one of the mantras of the Perl programmer. Conciseness is also seen as a strong virtue in the Perl and APL communities. As you’ll see in many of the discussions of recipes throughout this volume, in contrast, Python programmers often express their belief in the value of clarity and elegance. As a well-known Perl hacker once told me, Python’s prettier, but Perl is more fun. I agree with him that Python does have a strong (as in well-defined) aesthetic, while Perl has more of a sense of humor.
The reason I mention these seemingly irrelevant characteristics at the beginning of this chapter is that the recipes you see in this chapter are directly related to Python’s aesthetic and social dynamics. If this book had been about Perl, the recipes in a shortcuts chapter would probably elicit head scratching, contemplation, an “a-ha!” moment, and then a burst of laughter, as the reader grokked the genius behind a particular trick. In contrast, in most of the recipes in this chapter, the author presents a single elegant language feature, but one that he feels is underappreciated. Much like I, a proud resident of Vancouver, will go out of my way to show tourists the really neat things about the city, from the parks to the beaches to the mountains, a Python user will seek out friends and colleagues and say, “You gotta see this!” For me and most of the programmers I know, programming in Python is a shared social pleasure, not a competitive pursuit. There is great pleasure in learning a new feature and appreciating its design, elegance, and judicious use, and there’s a twin pleasure in teaching another or another thousand about that feature.
A word about the history of the chapter: back when we identified the recipe categories for the first edition of this collection, our driving notion was that there would be recipes of various kinds, each with a specific goal—a soufflé, a tart, an osso buco. Those recipes would naturally fall into fairly typical categories, such as desserts, appetizers, and meat dishes, or their perhaps less appetizing, nonmetaphorical equivalents, such as files, algorithms, and so on. So we picked a list of categories, added the categories to the Zope site used to collect recipes, and opened the floodgates.
Soon, it became clear that some submissions were hard to fit into the predetermined categories. There’s a reason for that, and cooking helps explain why. The recipes in this chapter are the Pythonic equivalent of making a roux (a cooked mixture of fat and flour, used in making sauces, for those of you without a classic French cooking background), kneading dough, flouring, separating eggs, flipping a pan’s contents, blanching, and the myriad other tricks that any accomplished cook knows, but that you won’t find in a typical cookbook. Many of these tricks and techniques are used in preparing meals, but it’s hard to pigeonhole them as relevant for a given type of dish. And if you’re a novice cook looking up a fancy recipe, you’re likely to get frustrated quickly because serious cookbook authors assume you know these techniques, and they explain them (with illustrations!) only in books with titles such as Cooking for Divorced Middle-Aged Men. We didn’t want to exclude this precious category of tricks from this book, so a new category was born (sorry, no illustrations).
In the introduction to this chapter in the first edition, I presciently said:
I believe that the recipes in this chapter are among the most time-sensitive of the recipes in this volume. That’s because the aspects of the language that people consider shortcuts or noteworthy techniques seem to be relatively straightforward, idiomatic applications of recent language features.
I can proudly say that I was right. This new edition, significantly focused on the present definition of the language, makes many of the original recipes irrelevant. In the two Python releases since the book’s first edition, Python 2.3 and 2.4, the language has evolved to incorporate the ideas of those recipes into new syntactic features or library functions, just as it had done with every previous major release, making a cleaner, more compact, and yet more powerful language that’s as much fun to use today as it was over ten years ago.
All in all, about half the recipes in this chapter (roughly the same proportion as in the rest of the book) are entirely new ones, while the other half are vastly revised (mostly simplified) versions of recipes that were in the first edition. Thanks to the simplifications, and to the focus on just two language versions (2.3 and 2.4) rather than the whole panoply of older versions that was covered by the first edition, this chapter, as well as the book as a whole, has over one-third more recipes than the first edition did.
It’s worth noting in closing that many of the recipes that are in this newly revised chapter touch on some of the most fundamental, unchanging aspects of the language: the semantics of assignment, binding, copy, and references; sequences; dictionaries. These ideas are all keys to the Pythonic approach to programming, and seeing these recipes live for several years makes me wonder whether Python will evolve in the next few years in related directions.
4.1. Copying an Object
Credit: Anna Martelli Ravenscroft, Peter Cogolo
Problem
You want to copy an object. However, when you assign an object, pass it as an argument, or return it as a result, Python uses a reference to the original object, without making a copy.
Solution
Module copy in the standard Python library offers two functions to create copies. The one you should generally use is the function named copy, which returns a new object containing exactly the same items and attributes as the object you’re copying:
import copy
new_list = copy.copy(existing_list)
On the rare occasions when you also want every item and attribute in the object to be separately copied, recursively, use deepcopy:
import copy
new_list_of_dicts = copy.deepcopy(existing_list_of_dicts)
Discussion
When you assign an object (or pass it as an argument, or return it as a result), Python (like Java) uses a reference to the original object, not a copy. Some other programming languages make copies every time you assign something. Python never makes copies “implicitly” just because you’re assigning: to get a copy, you must specifically request a copy.
Python’s behavior is simple, fast, and uniform. However, if you do need a copy and do not ask for one, you may have problems. For example:
>>> a = [1, 2, 3]
>>> b = a
>>> b.append(5)
>>> print a, b
[1, 2, 3, 5] [1, 2, 3, 5]
Here, the names a and b both refer to the same object (a list), so once we alter the object through one of these names, we later see the altered object no matter which name we use for it. No original, unaltered copy is left lying about anywhere.
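To keep an unaltered original around, request a copy before altering. A minimal sketch of the idea (the variable names here are illustrative, not part of the recipe):

```python
import copy

a = [1, 2, 3]
b = copy.copy(a)   # b is a new list object with the same items as a
b.append(5)        # altering b leaves a untouched

assert a == [1, 2, 3]
assert b == [1, 2, 3, 5]
assert a is not b  # two distinct list objects
```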
Warning
To become an effective Python programmer, it is crucial that you learn to draw the distinction between altering an object and assigning to a name, which previously happened to refer to the object. These two kinds of operations have nothing to do with each other. A statement such as a = [] rebinds name a but performs no alteration at all on the object that was previously bound to name a. Therefore, the issue of references versus copies just doesn’t arise in this case: the issue is meaningful only when you alter some object.
If you are about to alter an object, but you want to keep the original object unaltered, you must make a copy. As this recipe’s solution explains, the module copy from the Python Standard Library offers two functions to make copies. Normally, you use copy.copy, which makes a shallow copy—it copies an object, but for each attribute or item of the object, it continues to share references, which is faster and saves memory.
Shallow copying, alas, isn’t sufficient to entirely “decouple” a copied object from the original one, if you propose to alter the items or attributes of either object, not just the object itself:
>>> list_of_lists = [ ['a'], [1, 2], ['z', 23] ]
>>> copy_lol = copy.copy(list_of_lists)
>>> copy_lol[1].append('boo')
>>> print list_of_lists, copy_lol
[['a'], [1, 2, 'boo'], ['z', 23]] [['a'], [1, 2, 'boo'], ['z', 23]]
Here, the names list_of_lists and copy_lol refer to distinct objects (two lists), so we could alter either of them without affecting the other. However, each item of list_of_lists is the same object as the corresponding item of copy_lol, so once we alter an item reached by indexing either of these names, we later see the altered item no matter which object we’re indexing to reach it.
If you do need to copy some container object and also recursively copy all objects it refers to (meaning all items, all attributes, and also items of items, items of attributes, etc.), use copy.deepcopy—such deep copying may cost you substantial amounts of time and memory, but if you gotta, you gotta. For deep copies, copy.deepcopy is the only way to go.
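The shallow/deep distinction is easy to verify for yourself; a brief sketch (the sample data is made up for illustration):

```python
import copy

nested = [['a'], [1, 2]]
shallow = copy.copy(nested)      # new outer list, shared inner lists
deep = copy.deepcopy(nested)     # everything copied, recursively

nested[1].append('boo')          # alter an inner list of the original
assert shallow[1] == [1, 2, 'boo']  # shallow copy shares the inner lists...
assert deep[1] == [1, 2]            # ...the deep copy does not
```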
For normal shallow copies, you may have good alternatives to copy.copy, if you know the type of the object you want to copy. To copy a list L, call list(L); to copy a dict d, call dict(d); to copy a set s (in Python 2.4, which introduces the built-in type set), call set(s). (Since list, dict, and, in 2.4, set, are built-in names, you do not need to perform any “preparation” before you use any of them.) You get the general pattern: to copy a copyable object o, which belongs to some built-in Python type t, you may generally just call t(o). dicts also offer a dedicated method to perform a shallow copy: d.copy() and dict(d) do the same thing. Of the two, I suggest you use dict(d): it’s more uniform with respect to other types, and it’s even shorter by one character!
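The t(o) pattern in action, for the three built-in types just mentioned (the sample values are arbitrary):

```python
L = [1, 2, 3]
d = {'x': 1}
s = set([1, 2])

# each type's constructor, called on an instance, yields a shallow copy
L2, d2, s2 = list(L), dict(d), set(s)
assert L2 == L and L2 is not L
assert d2 == d and d2 is not d
assert s2 == s and s2 is not s

# d.copy() and dict(d) do the same thing
assert dict(d) == d.copy()
```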
To copy instances of arbitrary types or classes, whether you coded them or got them from a library, just use copy.copy. If you code your own classes, it’s generally not worth the bother to define your own copy or clone method. If you want to customize the way instances of your class get (shallowly) copied, your class can supply a special method __copy__ (see Recipe 6.9 for a special technique relating to the implementation of such a method), or special methods __getstate__ and __setstate__. (See Recipe 7.4 for notes on these special methods, which also help with deep copying and serialization—i.e., pickling—of instances of your class.) If you want to customize the way instances of your class get deeply copied, your class can supply a special method __deepcopy__ (see Recipe 6.9).
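A sketch of supplying __copy__ to customize shallow copying (the class, its attributes, and the tagging behavior are all made up for illustration; copy.copy does call __copy__ when a class defines it):

```python
import copy

class Tagged(object):
    def __init__(self, data, tag='orig'):
        self.data = data
        self.tag = tag
    def __copy__(self):
        # customized shallow copy: still share `data`, but mark the copy
        return Tagged(self.data, tag='copy')

t = Tagged([1, 2])
c = copy.copy(t)         # copy.copy finds and calls Tagged.__copy__
assert c.tag == 'copy'
assert c.data is t.data  # a shallow copy: the data attribute is shared
```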
Note that you do not need to copy immutable objects (strings, numbers, tuples, etc.) because you don’t have to worry about altering them. If you do try to perform such a copy, you’ll just get the original right back; no harm done, but it’s a waste of time and code. For example:
>>> s = 'cat'
>>> t = copy.copy(s)
>>> s is t
True
The is operator checks whether two objects are not merely equal, but in fact the same object (is checks for identity; for checking mere equality, you use the == operator). Checking object identity is not particularly useful for immutable objects (we’re using it here just to show that the call to copy.copy was useless, although innocuous). However, checking object identity can sometimes be quite important for mutable objects. For example, if you’re not sure whether two names a and b refer to separate objects, or whether both refer to the same object, a simple and very fast check a is b lets you know how things stand. That way you know whether you need to copy the object before altering it, in case you want to keep the original object unaltered.
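One way this check might be put to work, as a hypothetical helper (the function name and its exact policy are my invention, not part of the recipe):

```python
import copy

def ensure_unshared(obj, other):
    # hypothetical helper: return obj, or a shallow copy of it when it
    # is the very same object as `other`, so altering the result can
    # never affect `other`
    if obj is other:
        return copy.copy(obj)
    return obj

a = [1, 2]
b = a                        # b is just another name for a's object
c = ensure_unshared(b, a)    # get a private copy before altering
c.append(3)
assert a == [1, 2] and c == [1, 2, 3]
```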
Warning
Other, inferior ways exist to create copies, namely building your own. Given a list L, both a “whole-object slice” L[:] and a list comprehension [x for x in L] do happen to make a (shallow) copy of L, as do adding an empty list, L+[], and multiplying the list by 1, L*1 . . . but each of these constructs is just wasted effort and obfuscation—calling list(L) is clearer and faster. You should, however, be familiar with the L[:] construct because for historical reasons it’s widely used. So, even though you’re best advised not to use it yourself, you’ll see it in Python code written by others. Similarly, given a dictionary d, you could build its shallow copy with an explicit loop:

>>> d1 = { }
>>> for somekey in d:
...     d1[somekey] = d[somekey]

but calling dict(d) is clearer and faster.
See Also
Module copy in the Library Reference and Python in a Nutshell.
4.2. Constructing Lists with List Comprehensions
Credit: Luther Blissett
Problem
You want to construct a new list by operating on elements of an existing sequence (or other kind of iterable).
Solution
Say you want to create a new list by adding 23 to each item of some other list. A list comprehension expresses this idea directly:
thenewlist = [x + 23 for x in theoldlist]
Similarly, say you want the new list to comprise all items in the other list that are larger than 5. A list comprehension says exactly that:
thenewlist = [x for x in theoldlist if x > 5]
When you want to combine both ideas, you can perform selection with an if clause, and also use some expression, such as adding 23, on the selected items, in a single pass:
thenewlist = [x + 23 for x in theoldlist if x > 5]
Discussion
Elegance, clarity, and pragmatism are Python’s core values. List comprehensions show how pragmatism can enhance both clarity and elegance. Indeed, list comprehensions are often the best approach even when, instinctively, you’re thinking not of constructing a new list but rather of “altering an existing list”. For example, if your task is to set all items greater than 100 to 100, in an existing list object L, the best solution is:
L[:] = [min(x,100) for x in L]
Assigning to the “whole-list slice” L[:] alters the existing list object in place, rather than just rebinding the name L, as would be the case if you coded L = ... instead.
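The in-place effect is easy to observe through a second name bound to the same list (the names here are illustrative):

```python
L = [50, 120, 99, 300]
alias = L                         # another name for the same list object
L[:] = [min(x, 100) for x in L]   # slice assignment alters it in place

assert alias == [50, 100, 99, 100]  # the alias sees the change
assert alias is L                   # still one and the same object
```

Had we coded L = [min(x, 100) for x in L] instead, the name alias would still refer to the old, unaltered list.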
You should not use a list comprehension when you simply want to perform a loop. When you want a loop, code a loop. For an example of looping over a list, see Recipe 4.4. See Chapter 19 for more information about iteration in Python.
It’s also best not to use a list comprehension when another built-in does what you want even more directly and immediately. For example, to copy a list, use L1 = list(L), not:
L1 = [x for x in L]
Similarly, when the operation you want to perform on each item is to call a function on the item and use the function’s result, use L1 = map(f, L) rather than L1 = [f(x) for x in L]. But in most cases, a list comprehension is just right.
In Python 2.4, you should consider using a generator expression, rather than a list comprehension, when the sequence may be long and you only need one item at a time. The syntax of generator expressions is just the same as for list comprehensions, except that generator expressions are surrounded by parentheses, ( and ), not brackets, [ and ]. For example, say that we only need the summation of the list computed in this recipe’s Solution, not each item of the list. In Python 2.3, we would code:
total = sum([x + 23 for x in theoldlist if x > 5])
In Python 2.4, we can code more naturally, omitting the brackets (no need to add additional parentheses—the parentheses already needed to call the built-in sum suffice):
total = sum(x + 23 for x in theoldlist if x > 5)
Besides being a little bit cleaner, this method avoids materializing the list as a whole in memory and thus may be slightly faster when the list is extremely long.
See Also
The Reference Manual section on list displays (another name for list comprehensions) and Python 2.4 generator expressions; Chapter 19; the Library Reference and Python in a Nutshell docs on the itertools module and on the built-in functions map, filter, and sum; Haskell is at http://www.haskell.org.
Warning
Python borrowed list comprehensions from the functional language Haskell (http://www.haskell.org), changing the syntax to use keywords rather than punctuation. If you do know Haskell, though, take care! Haskell’s list comprehensions, like the rest of Haskell, use lazy evaluation (also known as normal order or call by need). Each item is computed only when it’s needed. Python, like most other languages, uses (for list comprehensions as well as elsewhere) eager evaluation (also known as applicative order, call by value, or strict evaluation). That is, the entire list is computed when the list comprehension executes, and kept in memory afterwards as long as necessary. If you are translating into Python a Haskell program that uses list comprehensions to represent infinite sequences, or even just long sequences of which only one item at a time must be kept around, Python list comprehensions may not be suitable. Rather, look into Python 2.4’s new generator expressions, whose semantics are closer to the spirit of Haskell’s lazy evaluation—each item gets computed only when needed.
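A small sketch of the lazy flavor that generator expressions bring: pairing one with an unbounded source such as itertools.count (this pairing is my illustration, not the recipe’s), items are computed only as they are requested, so the loop below terminates even though the source never does.

```python
from itertools import count, islice

# a generator expression over an unbounded source: nothing is computed
# until items are requested, so this does not loop forever
squares = (x * x for x in count())
first_five = list(islice(squares, 5))
assert first_five == [0, 1, 4, 9, 16]
```

The equivalent list comprehension, [x * x for x in count()], would try to materialize an infinite list and never return.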
4.3. Returning an Element of a List If It Exists
Credit: Nestor Nissen, A. Bass
Problem
You have a list L and an index i, and you want to get L[i] when i is a valid index into L; otherwise, you want to get a default value v. If L were a dictionary, you’d use L.get(i, v), but lists don’t have a get method.
Solution
Clearly, we need to code a function, and, in this case, the simplest and most direct approach is the best one:
def list_get(L, i, v=None):
    if -len(L) <= i < len(L):
        return L[i]
    else:
        return v
Discussion
The function in this recipe just checks whether i is a valid index by applying Python’s indexing rule: valid indices are negative ones down to -len(L) inclusive, and non-negative ones up to len(L) exclusive. If almost all calls to list_get pass a valid index value for i, you might prefer an alternative approach:
def list_get_egfp(L, i, v=None):
    try:
        return L[i]
    except IndexError:
        return v
However, unless a vast majority of the calls pass a valid index, this alternative (as some time-measurements show) can be up to four times slower than the list_get function shown in the solution. Therefore, this “easier to get forgiveness than permission” (EGFP) approach, although it is often preferable in Python, cannot be recommended for this specific case.
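If you want to reproduce such a measurement yourself, the standard timeit module is one way to sketch it; the absolute numbers, and the exact ratio, depend entirely on your machine and Python version, so only the comparison is meaningful:

```python
import timeit

def list_get(L, i, v=None):
    if -len(L) <= i < len(L):
        return L[i]
    return v

def list_get_egfp(L, i, v=None):
    try:
        return L[i]
    except IndexError:
        return v

L = list(range(10))
# first, confirm both versions agree on valid and invalid indices
for i in (-11, -10, -1, 0, 9, 10, 99):
    assert list_get(L, i, 'missing') == list_get_egfp(L, i, 'missing')

# time the all-misses case, where the try/except version pays for
# raising and catching an exception on every call
t_check = timeit.timeit(lambda: list_get(L, 99), number=10000)
t_egfp = timeit.timeit(lambda: list_get_egfp(L, 99), number=10000)
```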
I’ve also tried quite a few fancy, intricate, and obscure approaches, but, besides being hard to explain and to understand, they all end up slower than the plain, simple function list_get. General principle: when you write Python code, prefer clarity and readability to compactness and terseness—choose simplicity over subtlety. You often will be rewarded with code that runs faster, and invariably, you will end up with code that is less prone to bugs and is easier to maintain, which is far more important than minor speed differences in 99.9% of the cases you encounter in the real world.
See Also
Language Reference and Python in a Nutshell documentation on list indexing.
4.4. Looping over Items and Their Indices in a Sequence
Credit: Alex Martelli, Sami Hangaslammi
Problem
You need to loop on a sequence, but at each step you also need to know which index into the sequence you have reached (e.g., because you need to rebind some entries in the sequence), and Python’s preferred approach to looping doesn’t use the indices.
Solution
That’s what built-in function enumerate
is for. For example:
for index, item in enumerate(sequence):
    if item > 23:
        sequence[index] = transform(item)
This is cleaner, more readable, and faster than the alternative of looping over indices and accessing items by indexing:
for index in range(len(sequence)):
    if sequence[index] > 23:
        sequence[index] = transform(sequence[index])
Discussion
Looping on a sequence is a very frequent need, and Python strongly encourages you to do just that, looping on the sequence directly. In other words, the Pythonic way to get each item in a sequence is to use:
for item in sequence:
    process(item)
rather than the indirect approach, typical of lower-level languages, of looping over the sequence’s indices and using each index to fetch the corresponding item:
for index in range(len(sequence)):
    process(sequence[index])
Looping directly is cleaner, more readable, faster, and more general (since you can loop on any iterable, by definition, while indexing works only on sequences, such as lists).
However, sometimes you do need to know the index, as well as the corresponding item, within the loop. The most frequent reason for this need is that, in order to rebind an entry in a list, you must assign the new item to thelist[index]. To support this need, Python offers the built-in function enumerate, which takes any iterable argument and returns an iterator yielding all the pairs (two-item tuples) of the form (index, item), one pair at a time. By writing your for loop’s header clause in the form:
for index, item in enumerate(sequence):
both the index and the item are available within the loop’s body.
For help remembering the order of the items in each pair enumerate yields, think of the idiom d = dict(enumerate(L)). This gives a dictionary d that’s equivalent to list L, in the sense that d[i] is L[i] for any valid non-negative index i.
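That mnemonic idiom, spelled out (the sample list is arbitrary):

```python
L = ['a', 'b', 'c']
d = dict(enumerate(L))          # pairs come out as (index, item)
assert d == {0: 'a', 1: 'b', 2: 'c'}

# d is "equivalent" to L: d[i] is L[i] for every valid non-negative index
for i in range(len(L)):
    assert d[i] is L[i]
```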
See Also
Library Reference and Python in a Nutshell section about enumerate; Chapter 19.
4.5. Creating Lists of Lists Without Sharing References
Credit: David Ascher
Solution
To build a list and avoid implicit reference sharing, use a list comprehension. For example, to build a 5 x 10 array of zeros:
multilist = [[0 for col in range(5)] for row in range(10)]
Discussion
When a newcomer to Python is first shown that multiplying a list by an integer repeats that list that many times, the newcomer often gets quite excited about it, since it is such an elegant notation. For example:
>>> alist = [0] * 5
is clearly an excellent way to get an array of 5 zeros.
The problem is that one-dimensional tasks often grow a second dimension, so there is a natural progression to:
>>> multi = [[0] * 5] * 3
>>> print multi
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
This appears to work, but the same newcomer is then often puzzled by bugs, which typically can be boiled down to a snippet such as:
>>> multi[0][0] = 'oops!'
>>> print multi
[['oops!', 0, 0, 0, 0], ['oops!', 0, 0, 0, 0], ['oops!', 0, 0, 0, 0]]
This issue confuses most programmers at least once, if not a few times (see the FAQ entry at http://www.python.org/doc/FAQ.html#4.50). To understand the issue, it helps to decompose the creation of the multidimensional list into two steps:
>>> row = [0] * 5      # a list with five references to 0
>>> multi = [row] * 3  # a list with three references to the row object
This decomposed snippet produces a multi that’s identical to that given by the more concise snippet [[0]*5]*3 shown earlier, and it has exactly the same problem: if you now assign a value to multi[0][0], you have also changed the value of multi[1][0] and that of multi[2][0] . . . , and, indeed, you have changed the value of row[0], too!
The comments are key to understanding the source of the confusion. Multiplying a sequence by a number creates a new sequence with the specified number of new references to the original contents. In the case of the creation of row, it doesn’t matter whether or not references are being duplicated, since the referent (the object being referred to) is a number, and therefore immutable. In other words, there is no practical difference between an object and a reference to an object if that object is immutable. In the second line, however, we create a new list containing three references to the contents of the [row] list, which holds a single reference to a list. Thus, multi contains three references to a single list object. So, when the first element of the first element of multi is changed, you are actually modifying the first element of the shared list. Hence the surprise.
List comprehensions, as shown in the “Solution”, avoid the problem. With list comprehensions, no sharing of references occurs—you have a truly nested computation. If you have followed the discussion thoroughly, it may have occurred to you that we don’t really need the inner list comprehension, only the outer one. In other words, couldn’t we get just the same effect with:
multilist = [[0]*5 for row in range(10)]
The answer is that, yes, we could, and in fact using list multiplication for the innermost axis and list comprehension for all outer ones is faster—over twice as fast in this example. So why don’t I recommend this latest solution? Answer: the speed improvement for this example is from 57 down to 24 microseconds in Python 2.3, from 49 to 21 in Python 2.4, on a typical PC of several years ago (AMD Athlon 1.2 GHz CPU, running Linux). Shaving a few tens of microseconds from a list-creation operation makes no real difference to your application’s performance: and you should optimize your code, if at all, only where it matters, where it makes a substantial and important difference to the performance of your application as a whole. Therefore, I prefer the code shown in the recipe’s Solution, simply because using the same construct for both the inner and the outer list creations makes it more conceptually symmetrical and easier to read!
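The difference between the two constructs can be verified directly; a brief sketch contrasting them (a 3-row example, for compactness):

```python
# multiplication shares one single row object across all three "rows"
shared = [[0] * 5] * 3
shared[0][0] = 'oops!'
assert shared[1][0] == 'oops!'   # the change shows up in every row

# a list comprehension builds a fresh row object each time around
unshared = [[0] * 5 for row in range(3)]
unshared[0][0] = 'ok'
assert unshared[1][0] == 0       # the other rows are untouched
```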
See Also
Documentation for the range built-in function in the Library Reference and Python in a Nutshell.
4.6. Flattening a Nested Sequence
Credit: Luther Blissett, Holger Krekel, Hemanth Sethuram, ParzAspen Aspen
Problem
Some of the items in a sequence may in turn be sub-sequences, and so on, to arbitrary depth of “nesting”. You need to loop over a “flattened” sequence, “expanding” each sub-sequence into a single, flat sequence of scalar items. (A scalar, or atom, is anything that is not a sequence—i.e., a leaf, if you think of the nested sequence as a tree.)
Solution
We need to be able to tell which of the elements we’re handling are “subsequences” to be “expanded” and which are “scalars” to be yielded as is. For generality, we can take an argument that’s a predicate to tell us what items we are to expand. (A predicate is a function that we can call on any element and that returns a truth value: in this case, True if the element is a subsequence we are to expand, False otherwise.) By default, we can arbitrarily say that every list or tuple is to be “expanded”, and nothing else. Then, a recursive generator offers the simplest solution:
def list_or_tuple(x):
    return isinstance(x, (list, tuple))

def flatten(sequence, to_expand=list_or_tuple):
    for item in sequence:
        if to_expand(item):
            for subitem in flatten(item, to_expand):
                yield subitem
        else:
            yield item
Discussion
Flattening a nested sequence, or, equivalently, “walking” sequentially over all the leaves of a “tree”, is a common task in many kinds of applications. You start with a nested structure, with items grouped into sequences and subsequences, and, for some purposes, you don’t care about the structure at all. You just want to deal with the items, one after the other. For example,
for x in flatten([1, 2, [3, [], 4, [5, 6], 7, [8]], 9]):
    print x,
emits 1 2 3 4 5 6 7 8 9.
The only problem with this common task is that, in the general case, determining what is to be “expanded”, and what is to be yielded as a scalar, is not as obvious as it might seem. So, I ducked that decision, delegating it to a callable predicate argument that the caller can pass to flatten, unless the caller accepts flatten’s somewhat simplistic default behavior of expanding just tuples and lists.
In the same module as flatten, we should also supply another predicate that a caller might well want to use—a predicate that will expand just about any iterable except strings (plain and Unicode). Strings are iterable, but almost invariably applications want to treat them as scalars, not as subsequences.
To identify whether an object is iterable, we just need to try calling the built-in iter on that object: the call raises TypeError if the object is not iterable. To identify whether an object is string-like, we simply check whether the object is an instance of basestring, since isinstance(obj, basestring) is True when obj is an instance of any subclass of basestring—that is, any string-like type. So, the alternative predicate is not hard to code:
def nonstring_iterable(obj):
    try:
        iter(obj)
    except TypeError:
        return False
    else:
        return not isinstance(obj, basestring)
Now the caller may choose to call flatten(seq, nonstring_iterable) when the need is to expand any iterable that is not a string. It is surely better not to make the nonstring_iterable predicate the default for flatten, though: in a simple case, such as the example snippet we showed previously, flatten can be up to three times slower when the predicate is nonstring_iterable rather than list_or_tuple.
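A sketch showing the two predicates side by side (translated to check str rather than basestring, so the sketch also runs on Python 3, where basestring is gone; the sample data is made up):

```python
def list_or_tuple(x):
    return isinstance(x, (list, tuple))

def flatten(sequence, to_expand=list_or_tuple):
    for item in sequence:
        if to_expand(item):
            for subitem in flatten(item, to_expand):
                yield subitem
        else:
            yield item

def nonstring_iterable(obj):
    # as in the recipe, but using str in place of basestring
    try:
        iter(obj)
    except TypeError:
        return False
    return not isinstance(obj, str)

data = [1, 'one', ['pair', ('x', 'y')], range(2)]
# the default predicate leaves the range object alone (it is neither
# a list nor a tuple), and never splits the strings apart
assert list(flatten(data)) == [1, 'one', 'pair', 'x', 'y', range(2)]
# the broader predicate expands the range too, but still treats the
# strings as scalars
assert list(flatten(data, nonstring_iterable)) == [1, 'one', 'pair', 'x', 'y', 0, 1]
```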
We can also write a nonrecursive version of generator flatten. Such a version lets you flatten nested sequences with nesting levels higher than Python’s recursion limit, which normally allows no more than a few thousand levels of recursion depth. The main technique for recursion removal is to keep an explicit last-in, first-out (LIFO) stack, which, in this case, we can implement with a list of iterators:
def flatten(sequence, to_expand=list_or_tuple):
    iterators = [ iter(sequence) ]
    while iterators:
        # loop on the currently most-nested (last) iterator
        for item in iterators[-1]:
            if to_expand(item):
                # subsequence found, go loop on iterator on subsequence
                iterators.append(iter(item))
                break
            else:
                yield item
        else:
            # most-nested iterator exhausted, go back, loop on its parent
            iterators.pop()
The if clause of the if statement executes for any item we are to expand—that is, any subsequence on which we must loop; so in that clause, we push an iterator for the subsequence to the end of the stack, then execute a break to terminate the for, and go back to the outer while, which will in turn execute a new for statement on the iterator we just appended to the stack. The else clause of the if statement executes for any item we don’t expand, and it just yields the item.
The else clause of the for statement executes if no break statement interrupts the for loop—in other words, when the for loop runs to completion, exhausting the currently most-nested iterator. So, in that else clause, we remove the now-exhausted most-nested (last) iterator, and the outer while loop proceeds, either terminating if no iterators are left on the stack, or executing a new for statement that continues the loop on the iterator that’s back at the top of the stack—from wherever that iterator had last left off, intrinsically, because an iterator’s job is exactly to remember iteration state.
The results of this nonrecursive implementation of flatten are identical to those of the simpler recursive version given in this recipe’s Solution. If you think non-recursive implementations are faster than recursive ones, though, you may be disappointed: according to my measurements, the nonrecursive version is about 10% slower than the recursive one, across a range of cases.
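The nonrecursive version’s claim to fame, handling nesting deeper than the recursion limit, can be exercised directly; a sketch (5000 levels is comfortably past CPython’s default limit of about 1000):

```python
def list_or_tuple(x):
    return isinstance(x, (list, tuple))

def flatten_nonrecursive(sequence, to_expand=list_or_tuple):
    # the explicit LIFO stack of iterators replaces the call stack
    iterators = [iter(sequence)]
    while iterators:
        for item in iterators[-1]:
            if to_expand(item):
                iterators.append(iter(item))
                break
            else:
                yield item
        else:
            iterators.pop()

# build a structure nested far beyond the default recursion limit
deep = [0]
for i in range(5000):
    deep = [deep]
assert list(flatten_nonrecursive(deep)) == [0]
```

The recursive flatten from the Solution would raise a RuntimeError (maximum recursion depth exceeded) on the same input.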
See Also
Library Reference and Python in a Nutshell sections on sequence types and built-ins iter, isinstance, and basestring.
4.7. Removing or Reordering Columns in a List of Rows
Credit: Jason Whitlark
Problem
You have a list of lists (rows) and need to get another list of the same rows but with some columns removed and/or reordered.
Solution
A list comprehension works well for this task. Say you have:
listOfRows = [ [1,2,3,4], [5,6,7,8], [9,10,11,12] ]
You want a list with the same rows but with the second of the four columns removed and the third and fourth ones interchanged. A simple list comprehension that performs this job is:
newList = [ [row[0], row[3], row[2]] for row in listOfRows ]
An alternative way of coding, at least as practical and
arguably a bit more elegant, is to use an auxiliary sequence (meaning
a list or tuple) that has the column indices you desire in their
proper order. Then, you can nest an inner list comprehension that
loops on the auxiliary sequence inside the outer list comprehension
that loops on listOfRows
:
newList = [ [row[ci] for ci in (0, 3, 2)] for row in listOfRows ]
Discussion
I often use lists of lists to represent two-dimensional arrays. I think of such lists as having the “rows” of a “two-dimensional array” as their items. I often perform manipulation on the “columns” of such a “two-dimensional array”, typically reordering some columns, sometimes omitting some of the original columns. It is not obvious (at least, it was not immediately obvious to me) that list comprehensions are just as useful for this purpose as they are for other kinds of sequence-manipulation tasks.
A list comprehension builds a new list, rather than altering an
existing one. But even when you do need to alter the existing list in
place, the best approach is to write a list comprehension and assign
it to the existing list’s contents. For example, if you needed to
alter listOfRows
in place, for the example
given in this recipe’s Solution, you would code:
listOfRows[:] = [ [row[0], row[3], row[2]] for row in listOfRows ]
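The point of assigning to the slice listOfRows[:], rather than rebinding the bare name, is that every other reference to the same list object sees the change. A quick check (the alias name is purely illustrative):

```python
listOfRows = [[1, 2, 3, 4], [5, 6, 7, 8]]
alias = listOfRows                     # a second reference to the same list
listOfRows[:] = [[row[0], row[3], row[2]] for row in listOfRows]
assert alias is listOfRows             # still the very same list object...
assert alias == [[1, 4, 3], [5, 8, 7]] # ...and it sees the new contents
```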
Do consider, as suggested in the second example in this recipe’s Solution, the possibility of using an auxiliary sequence to hold the column indices you desire, in the order in which you desire them, rather than explicitly hard-coding the list display as we did in the first example. You might feel a little queasy about nesting two list comprehensions into each other in this fashion, but it’s simpler and safer than you might fear. If you adopt this approach, you gain some potential generality, because you can choose to give a name to the auxiliary sequence of indices, use it to reorder several lists of rows in the same fashion, pass it as an argument to a function, whatever:
def pick_and_reorder_columns(listofRows, column_indexes):
    return [ [row[ci] for ci in column_indexes] for row in listofRows ]
columns = 0, 3, 2
newListOfPandas = pick_and_reorder_columns(oldListOfPandas, columns)
newListOfCats = pick_and_reorder_columns(oldListOfCats, columns)
This example performs just the same column reordering and
selection as all the other snippets in this recipe, but it performs
the operation on two separate “old” lists, obtaining from each the
corresponding “new” list. Reaching for excessive generalization is a
pernicious temptation, but here, with this
pick_and_reorder_columns
function, it seems that we
are probably getting just the right amount of generality.
One last note: some people prefer a fancier way to express the kinds of list comprehensions that are used as “inner” ones in some of the functions used previously. Instead of coding them straightforwardly, as in:
[row[ci] for ci in column_indexes]
they prefer to use the built-in function map
, and the special method __getitem__
of
row
used as a bound-method, to perform the
indexing subtask, so they code instead:
map(row.__getitem__, column_indexes)
Depending on the exact version of Python, perhaps this fancy and somewhat obscure way may be slightly faster. Nevertheless, I think the greater simplicity of the list comprehension form means the list comprehension is still the best way.
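Whichever form you prefer, it is easy to confirm that the two select exactly the same items (note that in Python 3, map returns an iterator, hence the list() call in this sketch):

```python
row = [10, 20, 30, 40]
column_indexes = (0, 3, 2)

by_comprehension = [row[ci] for ci in column_indexes]
by_map = list(map(row.__getitem__, column_indexes))

# both forms index row at positions 0, 3, 2, in that order
assert by_comprehension == by_map == [10, 40, 30]
```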
See Also
List comprehension docs in Language Reference and Python in a Nutshell.
4.8. Transposing Two-Dimensional Arrays
Credit: Steve Holden, Raymond Hettinger, Attila Vàsàrhelyi, Chris Perkins
Solution
You must start with a list whose items are lists all of the same length, such as:
arr = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
A list comprehension offers a simple, handy way to transpose such a two-dimensional array:
print [[r[col] for r in arr] for col in range(len(arr[0]))]
[[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
A faster though more obscure alternative (with exactly the same
output) can be obtained by exploiting built-in function zip
in a slightly strange way:
print map(list, zip(*arr))
Discussion
This recipe shows a concise yet clear way to turn rows into
columns, and also a faster though more obscure way. List
comprehensions work well when you want to be clear yet concise, while
the alternative solution exploits the built-in function zip
in a way that is definitely not
obvious.
Sometimes data just comes at you the wrong way. For
instance, if you use Microsoft’s ActiveX Data Objects (ADO) database
interface, due to array element-ordering differences between Python
and Microsoft’s preferred implementation language (Visual Basic), the
GetRows
method actually appears to
return database columns in Python, despite the
method’s name. This recipe’s two solutions to this common kind of
problem let you choose between clarity and speed.
In the list comprehension solution, the inner comprehension varies what is selected from (the row), while the outer comprehension varies the selector (the column). This process achieves the required transposition.
In the zip
-based solution, we
use the *a
syntax to pass each item
(row) of arr
to zip
, in order, as a separate positional
argument. zip
returns a list of
tuples, which directly achieves the required transposition; we then
apply list
to each tuple, via the
single call to map
, to obtain a
list of lists, as required. Since we don’t use zip
’s result as a list directly, we could
get a further slight improvement in performance by using itertools.izip
instead (because izip
does not materialize its result as a
list in memory, but rather yields it one item at a time):
import itertools
print map(list, itertools.izip(*arr))
but, in this specific case, the slight speed increase is probably not worth the added complexity.
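Both approaches produce the same transposed result, which is easy to verify (this sketch uses Python 3 spelling, where zip is already lazy and map needs a list() call):

```python
arr = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

# clear: the outer comprehension varies the column, the inner one the row
by_comprehension = [[r[col] for r in arr] for col in range(len(arr[0]))]

# fast but obscure: each row of arr becomes a separate positional argument
by_zip = list(map(list, zip(*arr)))

assert by_comprehension == by_zip == [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
```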
If you’re transposing large arrays of numbers, consider Numeric Python and other third-party packages. Numeric Python defines transposition and other axis-swinging routines that will make your head spin.
See Also
The Reference Manual and
Python in a Nutshell sections on list displays
(the other name for list comprehensions) and on the *a
and *k
notation for positional and named argument passing; built-in functions
zip
and map
; Numeric Python (http://www.pfdubois.com/numpy/).
4.9. Getting a Value from a Dictionary
Credit: Andy McKay
Problem
You need to obtain a value from a dictionary, without having to handle an exception if the key you seek is not in the dictionary.
Solution
That’s what the get
method of dictionaries is for. Say you have a dictionary such as
d
=
{'key':'value',}
. To get the value corresponding to
key
in d
in an
exception-safe way, code:
print d.get('key', 'not found')
If you need to remove the entry after you have obtained the
value, call d
.pop
(which does a get-and-remove) instead of
d
.get
(which just reads d
and never changes
it).
Discussion
Want to get a value for a key from a dictionary, without getting
an exception if the key does not exist in the dictionary? Use the
simple and useful get
method of the
dictionary.
If you try to get a value with the indexing syntax d[x]
, and the value of
x
is not a key in dictionary
d
, your attempt raises a KeyError
exception. This is often okay. If
you expected the value of x
to be a key in
d
, an exception is just the right way to
inform you that you’re mistaken (i.e., that you need to debug your
program).
However, you often need to be more tentative about it: as far as
you know, the value of x
may or may not be
a key in d
. In this case, don’t start
messing with in
tests, such
as:
if 'key' in d:
    print d['key']
else:
    print 'not found'
or try
/except
statements, such as:
try:
    print d['key']
except KeyError:
    print 'not found'
Instead, use the get
method,
as shown in the “Solution”. If you call d.get(x)
, no exception is thrown: you get
d[x]
if
x
is a key in d
,
and if it’s not, you get None
(which you can check for or propagate). If None
is not what you want to get when
x
is not a key of
d
, call d.get(x
,
somethingelse
)
instead. In this case, if
x
is not a key, you will get the value of
somethingelse
.
get
is a simple, useful
mechanism that is well explained in the Python documentation, but a
surprising number of people don’t know about it. Another similar
method is pop
, which is mostly like
get
, except that, if the key was in
the dictionary, pop
also removes
it. Just one caveat: get
and
pop
are not
exactly parallel. d.pop(x)
does raise
KeyError
if
x
is not a key in
d
; to get exactly the same effect as
d.get(x)
, plus the entry removal,
call d.pop(x,None)
instead.
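The caveat is easy to demonstrate: get never raises and never changes the dictionary, while pop removes the entry and, without a default, raises for a missing key:

```python
d = {'key': 'value'}

assert d.get('missing', 'not found') == 'not found'   # d is left unchanged
assert d.pop('missing', 'not found') == 'not found'   # safe: a default was supplied

try:
    d.pop('missing')          # bare pop, unlike bare get, raises for a missing key
except KeyError:
    pass

assert d.pop('key') == 'value'
assert d == {}                # pop also removed the entry
```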
See Also
Recipe 4.10; the Library Reference and Python in a Nutshell sections on mapping types.
4.10. Adding an Entry to a Dictionary
Credit: Alex Martelli, Martin Miller, Matthew Shomphe
Problem
Working with a dictionary d
, you need to use the entry d[k]
when it’s already present, or add a new
value as d[k]
when
k
isn’t yet a key in
d
.
Solution
This is what the setdefault
method of dictionaries is for.
Say we’re building a word-to-page-numbers index, a dictionary that
maps each word to the list of page numbers where it appears. A key
piece of code in that application might be:
def addword(theIndex, word, pagenumber):
    theIndex.setdefault(word, [ ]).append(pagenumber)
This code is equivalent to more verbose approaches such as:
def addword(theIndex, word, pagenumber):
    if word in theIndex:
        theIndex[word].append(pagenumber)
    else:
        theIndex[word] = [pagenumber]
and:
def addword(theIndex, word, pagenumber):
    try:
        theIndex[word].append(pagenumber)
    except KeyError:
        theIndex[word] = [pagenumber]
Using method setdefault
simplifies this task considerably.
Discussion
For any dictionary d
, d.setdefault(k, v)
is very similar to
d.get(k, v)
, which was covered
previously in Recipe
4.9. The essential difference is that, if
k
is not a key in the dictionary, the
setdefault
method assigns d[k]=v
as a side effect, in addition to
returning v
. (get
would just return
v
, without affecting
d
in any way.) Therefore, consider using
setdefault
any time you have
get
-like needs, but also want to
produce this side effect on the dictionary.
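The difference in side effects is easy to see with a small example:

```python
d = {}

assert d.get('k', 'v') == 'v'
assert d == {}                             # get never changes d

assert d.setdefault('k', 'v') == 'v'
assert d == {'k': 'v'}                     # setdefault assigned d['k'] as a side effect

assert d.setdefault('k', 'other') == 'v'   # existing entry wins; 'other' is ignored
```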
setdefault
is particularly
useful in a dictionary with values that are lists, as detailed in
Recipe 4.15. The most
typical usage for setdefault
is
something like:
somedict.setdefault(somekey, [ ]).append(somevalue)
setdefault
is not all that
useful for immutable values, such as numbers. If you just want to
count words, for example, the right way to code is to use,
not setdefault
, but rather get
:
theIndex[word] = theIndex.get(word, 0) + 1
since you must rebind the dictionary entry at theIndex[word]
anyway (because numbers are
immutable). But for our word-to-page-numbers example, you
definitely do not want to fall into the
performance trap that’s hidden in the following approach:
def addword(theIndex, word, pagenumber):
    theIndex[word] = theIndex.get(word, [ ]) + [pagenumber]
This latest version of addword
builds three new
lists each time you call it: an empty list that’s passed as the second
argument to theIndex.get
, a
one-item list containing just pagenumber
,
and a list with N+1
items obtained
by concatenating these two (where N
is the
number of times that word
was previously
found). Building such a huge number of lists is sure to take its toll,
in performance terms. For example, on my machine, I timed the task of
indexing the same four words occurring once each on each of 1,000
pages. Taking the first version of addword
in the
recipe as a reference point, the second one (using try
/except
) is about 10% faster, the third one
(using setdefault
) is about 20%
slower—the kind of performance differences that you should blissfully
ignore in just about all cases. This fourth version (using get
) is four times
slower—the kind of performance difference you just can’t afford to
ignore.
See Also
Recipe 4.9;
Recipe 4.15;
Library Reference and Python in a
Nutshell documentation about dict
.
4.11. Building a Dictionary Without Excessive Quoting
Credit: Brent Burley, Peter Cogolo
Problem
You want to construct a dictionary whose keys are literal strings, without having to quote each key.
Solution
Once you get into the swing of Python, you’ll find yourself
constructing a lot of dictionaries. When the keys are identifiers, you
can avoid quoting them by calling dict
with named-argument syntax:
data = dict(red=1, green=2, blue=3)
This is neater than the equivalent use of dictionary-display syntax:
data = {'red': 1, 'green': 2, 'blue': 3}
Discussion
One powerful way to build a dictionary is to call the
built-in type dict
. It’s often a
good alternative to the dictionary-display syntax with braces and
colons. This recipe shows that, by calling dict
, you can avoid having to quote keys,
when the keys are literal strings that happen to be syntactically
valid for use as Python identifiers. You cannot use this approach for
keys such as the literal strings '12ba
' or 'for
', because '12ba
' starts with a digit, and for
happens to be a Python keyword, not an
identifier.
Also, dictionary-display syntax is the only case in Python where
you need to use braces: if you dislike braces, or happen to work on a
keyboard that makes braces hard to reach (as all Italian layout
keyboards do!), you may be happier, for example, using dict()
rather than { }
to build an empty dictionary.
Calling dict
also gives you
other possibilities. dict(d)
returns a new dictionary that is an independent copy of existing
dictionary d
, just like d.copy( )
—but dict(d)
works even when
d
is a sequence of pairs (key, value)
instead of being a dictionary
(when a key
occurs more than once in the
sequence, the last appearance of the key
applies). A common dictionary-building idiom is:
d = dict(zip(the_keys, the_values))
where the_keys
is a sequence of keys
and the_values
a “parallel” sequence of
corresponding values. Built-in function zip
builds and returns a list of (key, value)
pairs, and built-in type
dict
accepts that list as its
argument and constructs a dictionary accordingly. If the sequences are
long, it’s faster to use module itertools
from the standard Python
library:
import itertools
d = dict(itertools.izip(the_keys, the_values))
Built-in function zip
constructs the whole list of pairs in memory, while itertools.izip
yields only one pair at a
time. On my machine, with sequences of 10,000 numbers, the latter
idiom is about twice as fast as the one using zip
—18 versus 45 milliseconds with Python
2.3, 17 versus 32 with Python 2.4.
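Both the zip idiom and the last-appearance-wins rule for duplicate keys are easy to check (Python 3 spelling, where zip itself is already lazy, so no izip is needed):

```python
# when a key occurs more than once, its last appearance applies
pairs = [('a', 1), ('b', 2), ('a', 3)]
d_from_pairs = dict(pairs)
assert d_from_pairs == {'a': 3, 'b': 2}

# the common parallel-sequences idiom
the_keys = ['red', 'green', 'blue']
the_values = [1, 2, 3]
d_zipped = dict(zip(the_keys, the_values))
assert d_zipped == dict(red=1, green=2, blue=3)
```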
You can use both a positional argument and named arguments in
the same call to dict
(if the named
argument clashes with a key specified in the positional argument, the
named argument applies). For example, here is a workaround for the
previously mentioned issue that Python keywords, and other
nonidentifiers, cannot be used as argument names:
d = dict({'12ba':49, 'for': 23}, rof=41, fro=97, orf=42)
If you need to build a dictionary where the same value
corresponds to each key, call dict.fromkeys(keys_sequence, value)
(if you
omit the value
, it defaults to None
). For example, here is a neat way to
initialize a dictionary to be used for counting occurrences of various
lowercase ASCII letters:
import string
count_by_letter = dict.fromkeys(string.ascii_lowercase, 0)
See Also
Library Reference and Python
in a Nutshell sections on built-ins dict
and zip
, and on modules itertools
and string
.
4.12. Building a Dict from a List of Alternating Keys and Values
Credit: Richard Philips, Raymond Hettinger
Problem
You want to build a dict from a list of alternating keys and values.
Solution
The built-in type dict
offers
many ways to build dictionaries, but not this one, so we need to code
a function for the purpose. One way is to use the built-in function
zip
on extended slices:
def dictFromList(keysAndValues):
    return dict(zip(keysAndValues[::2], keysAndValues[1::2]))
A more general approach, which works for any sequence or other
iterable argument and not just for lists, is to “factor out” the task
of getting a sequence of pairs from a flat sequence into a separate
generator. This approach is not quite as concise as
dictFromList
, but it’s faster as well as more
general:
def pairwise(iterable):
    itnext = iter(iterable).next
    while True:
        yield itnext( ), itnext( )
def dictFromSequence(seq):
    return dict(pairwise(seq))
Defining pairwise
also allows
updating an existing dictionary with any sequence
of alternating keys and values—just code, for example, mydict.update(pairwise(seq))
.
Discussion
Both of the “factory functions” in this recipe use the same
underlying way to construct a dictionary: each calls dict
with an argument that is a sequence of
(key, value)
pairs. All the
difference is in how the functions build the sequence of pairs to pass
to dict
.
dictFromList
builds a list of such pairs by
calling built-in function zip
with
two extended-form slices of the function’s
keysAndValues
argument—one that gathers all items
with even indices (meaning the items at index 0, 2, 4, . . .), the
other that gathers all items with odd indices (starting at 1 and
counting by 2 . . .). This approach is fine, but it works only when
the argument named keysAndValues
is an instance of a
type or class that supports extended slicing, such as list
, tuple
or str
. Also, this approach results in
constructing several temporary lists in memory: if
keysAndValues
is a long sequence, all of this list
construction activity can cost some performance.
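The two extended slices are worth seeing side by side on a concrete list:

```python
keysAndValues = ['a', 1, 'b', 2, 'c', 3]

assert keysAndValues[::2] == ['a', 'b', 'c']    # items at even indices: the keys
assert keysAndValues[1::2] == [1, 2, 3]         # items at odd indices: the values

# zipping the two slices gives exactly the (key, value) pairs dict needs
assert dict(zip(keysAndValues[::2], keysAndValues[1::2])) == {'a': 1, 'b': 2, 'c': 3}
```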
dictFromSequence
, on the other hand, delegates
the task of building the sequence of pairs to the generator named
pairwise
. In turn, pairwise
is coded
to ensure that it can use any iterable at all—not just lists (or other
sequences, such as tuples or strings), but also, for example, results
of other generators, files, dictionaries, and so on. Moreover,
pairwise
yields pairs one at a time. It never
constructs any long list in memory, an aspect that may improve
performance if the input sequence is very long.
The implementation of pairwise
is interesting.
As its very first statement, pairwise
binds local
name itnext
to the bound-method
next
of the iterator that it
obtains by calling the built-in function iter
on the iterable
argument. This may seem a bit strange, but it’s a good general
technique in Python: if you start with an object, and all you need to
do with that object is call one of its methods in a loop, you can
extract the bound-method, assign it to a local name, and afterwards
just call the local name as if it were a function.
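The next bound-method shown above is Python 2 spelling, but the technique itself is completely general; here is the same trick sketched, version-neutrally, with list.append:

```python
out = []
push = out.append          # extract the bound method once, outside the loop
for ch in 'abc':
    push(ch)               # call the local name as if it were a plain function
assert out == ['a', 'b', 'c']
```

Binding the method to a local name saves a fresh attribute lookup on every iteration, which is exactly why pairwise does it with the iterator's next method.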
pairwise
would work just as well if the next
method was instead called in a way that
may look more normal to programmers who are used to other
languages:
def pairwise_slow(iterable):
    it = iter(iterable)
    while True:
        yield it.next( ), it.next( )
However, this pairwise_slow
variant isn’t
really any simpler than the pairwise
generator shown
in the Solution (“more familiar to people who don’t know Python” is
not a synonym of “simpler”!), and it
is about 60% slower. Focusing on simplicity and
clarity is one thing, and a very good one—indeed, a core principle of
Python. Throwing performance to the winds,
without getting any real advantage to compensate,
is a completely different proposition and definitely not a practice
that can be recommended in any language. So, while it is an excellent
idea to focus on writing correct, clear, and simple code, it’s also
very advisable to learn and use Python’s idioms that are most
appropriate to your needs.
See Also
Recipe 19.7 for more general approaches to looping by sliding windows over an iterable. See the Python Reference Manual for more on extended slicing.
4.13. Extracting a Subset of a Dictionary
Credit: David Benjamin
Problem
You want to extract from a larger dictionary only that subset of it that corresponds to a certain set of keys.
Solution
If you want to leave the original dictionary intact:
def sub_dict(somedict, somekeys, default=None):
    return dict([ (k, somedict.get(k, default)) for k in somekeys ])
If you want to remove from the original the items you’re extracting:
def sub_dict_remove(somedict, somekeys, default=None):
    return dict([ (k, somedict.pop(k, default)) for k in somekeys ])
Two examples of these functions’ use and effects:
>>> d = {'a': 5, 'b': 6, 'c': 7}
>>> print sub_dict(d, 'ab'), d
{'a': 5, 'b': 6} {'a': 5, 'b': 6, 'c': 7}
>>> print sub_dict_remove(d, 'ab'), d
{'a': 5, 'b': 6} {'c': 7}
Discussion
In Python, I use dictionaries for many purposes—database rows,
primary and compound keys, variable namespaces for template parsing,
and so on. So, I often need to create a dictionary that is based on
another, larger dictionary, but only contains the subset of the larger
dictionary corresponding to some set of keys. In most use cases, the
larger dictionary must remain intact after the extraction; sometimes,
however, I need to remove from the larger dictionary the subset that
I’m extracting. This recipe’s solution shows both possibilities. The
only difference is that you use method get
when you want to avoid affecting the
dictionary that you are getting data from, method pop
when you want to remove the items you’re
getting.
If some item k
of
somekeys
is not in fact a key in
somedict
, this recipe’s functions put
k
as a key in the result anyway, with a
default value (which I pass as an optional argument to either
function, with a default value of None
). So, the result is not necessarily a
subset of somedict
. This behavior is the
one I’ve found most useful in my applications.
You might prefer to get an exception for “missing keys”—that
would help alert you to a bug in your program, in cases in which you
know all k
s in
somekeys
should definitely also be keys in
somedict
. Remember, “errors should never
pass silently. Unless explicitly silenced,” to quote The
Zen of Python, by Tim Peters (enter the statement import this
at an interactive Python prompt
to read or re-read this delightful summary of Python’s design
principles). So, if a missing key is an error, from the point of view
of your application, then you do want to get an
exception that alerts you to that error at once, if it ever occurs. If
this is what you want, you can get it with minor modifications to this
recipe’s functions:
def sub_dict_strict(somedict, somekeys):
    return dict([ (k, somedict[k]) for k in somekeys ])
def sub_dict_remove_strict(somedict, somekeys):
    return dict([ (k, somedict.pop(k)) for k in somekeys ])
As you can see, these strict variants are even simpler than the originals—a good indication that Python likes to raise exceptions when unexpected behavior occurs!
Alternatively, you might prefer missing keys to be simply omitted from the result. This, too, requires just minor modifications:
def sub_dict_select(somedict, somekeys):
    return dict([ (k, somedict[k]) for k in somekeys if k in somedict])
def sub_dict_remove_select(somedict, somekeys):
    return dict([ (k, somedict.pop(k)) for k in somekeys if k in somedict])
The if
clause in each list
comprehension does all we need to distinguish these
_select
variants from the _strict
ones.
In Python 2.4, you can use generator expressions,
instead of list comprehensions, as the arguments to dict
in each of the functions shown in this
recipe. Just change the syntax of the calls to dict
, from dict([
. . .])
to dict(
. . .)
(removing the brackets adjacent to the
parentheses) and enjoy the resulting slight simplification and
acceleration. However, these variants would not work in Python 2.3,
which has list comprehensions but not generator expressions.
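For instance, the generator-expression form of the non-destructive variant looks like this (runnable as-is on any Python with generator expressions):

```python
def sub_dict(somedict, somekeys, default=None):
    # generator-expression argument: no intermediate list of pairs is built
    return dict((k, somedict.get(k, default)) for k in somekeys)

d = {'a': 5, 'b': 6, 'c': 7}
assert sub_dict(d, 'ac') == {'a': 5, 'c': 7}
assert sub_dict(d, 'ax') == {'a': 5, 'x': None}   # missing key gets the default
assert d == {'a': 5, 'b': 6, 'c': 7}              # the original is left intact
```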
See Also
Library Reference and Python
in a Nutshell documentation on dict
.
4.14. Inverting a Dictionary
Credit: Joel Lawhead, Ian Bollinger, Raymond Hettinger
Problem
An existing dict
maps
keys to unique values, and you want to build the inverse dict
, mapping each value to its key.
Solution
You can write a function that passes a list comprehension as
dict
’s argument to build the new
requested dictionary:
def invert_dict(d):
    return dict([ (v, k) for k, v in d.iteritems( ) ])
For large dictionaries, though, it’s faster to use the generator
izip
from the itertools
module in the Python Standard
Library:
from itertools import izip
def invert_dict_fast(d):
    return dict(izip(d.itervalues( ), d.iterkeys( )))
Discussion
If the values in dict d
are not
unique, then d
cannot truly be inverted,
meaning that there exists no dict id
such that for any valid key
k
, id[d[k]]==k
. However, the functions shown in
this recipe still construct, even in such cases, a “pseudo-inverse”
dict pd
such that, for any
v
that is a value in
d
, d[pd[v]]==v
. Given the original dict
d
and the dict x
returned by either of the functions shown in this recipe, you can
easily check whether x
is the true inverse
of d
or just d
’s
pseudo-inverse: x
is the true inverse of
d
if and only if len(x)==len(d)
. That’s because, if two
different keys have the same value, then, in the result of either of
the functions in this recipe, one of the two keys will simply go
“poof” into the ether, thus leaving the resulting pseudo-inverse dict
shorter than the dict you started with. In any case, quite obviously,
the functions shown in this recipe can work only if all values in
d
are hashable (meaning that they are all
usable as keys into a dict): otherwise, the functions raise a
TypeError
exception.
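The len(x)==len(d) check, and the pseudo-inverse property, are easy to verify (this sketch uses the Python 3 spelling, where items() plays the role of iteritems()):

```python
def invert_dict(d):
    # swap each (key, value) pair into (value, key) order
    return dict((v, k) for k, v in d.items())

d = {'a': 1, 'b': 2, 'c': 1}            # values are not unique
pd = invert_dict(d)

assert len(pd) < len(d)                 # one key went "poof": only a pseudo-inverse
assert all(d[pd[v]] == v for v in d.values())   # ...yet d[pd[v]] == v still holds
```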
When we program in Python, we normally “disregard minor optimizations,” as Donald Knuth suggested over thirty years ago: we place a premium on clarity and correctness and care relatively little about speed. However, it can’t hurt to know about faster possibilities: when we decide to code in a certain way because it’s simpler or clearer than another, it’s best if we are taking the decision deliberately, not out of ignorance.
Here, function invert_dict
in this recipe’s
Solution might perhaps be considered clearer because it shows exactly
what it’s doing. Take the pairs k
,
v
of key and value that method iteritems
yields, swap them into (value, key)
order, and feed the resulting
list as the argument of dict
, so
that dict
builds a dictionary where
each value v
is a key and the corresponding
key k
becomes that key’s value—just the
inverse dict
that our problem
requires.
However, function invert_dict_fast
, also in
this recipe’s Solution, isn’t really any more complicated: it just
operates more abstractly, by getting all keys and all values as two
separate iterators and zipping them up (into an iterator whose items
are the needed, swapped (value,
key)
pairs) via a call to generator izip
, supplied by the itertools
module of the Python Standard
Library. If you get used to such higher abstraction levels, they will
soon come to feel simpler than lower-level
code!
Thanks to the higher level of abstraction, and to never
materializing the whole list of pairs (but rather operating via
generators and iterators that yield only one item at a time), function
invert_dict_fast
can be substantially faster than
function invert_dict
. For example, on my machine, to
invert a 10,000-item dictionary, invert_dict
takes
about 63 milliseconds, but invert_dict_fast
manages
the same task in just 20 milliseconds. A speed increase by a factor of
three, in general, is not to be sneered at. Such performance gains,
when you work on large amounts of data, are the norm, rather than the
exception, for coding at higher abstraction levels. This is
particularly true when you can use itertools
rather than loops or list
comprehensions, because you don’t need to materialize some large list
in memory at one time. Performance gain is an extra incentive for
getting familiar with working at higher abstraction levels, a
familiarity that has conceptual and productivity pluses, too.
See Also
Documentation on mapping types and itertools
in the Library
Reference and Python in a Nutshell;
Chapter 19.
4.15. Associating Multiple Values with Each Key in a Dictionary
Credit: Michael Chermside
Solution
By nature, a dictionary is a one-to-one mapping, but it’s not
hard to make it one-to-many—in other words, to make one key map to
multiple values. Your choice of one of two possible approaches depends
on how you want to treat duplications in the set of values for a key.
The following approach, based on using list
s as the dict
’s values, allows such
duplications:
d1 = { }
d1.setdefault(key, [ ]).append(value)
while an alternative approach, based on using sub-dict
s as the dict
’s values, automatically eliminates
duplications of values:
d2 = { }
d2.setdefault(key, { })[value] = 1
In Python 2.4, the no-duplication approach can equivalently be coded:
d3 = { }
d3.setdefault(key, set( )).add(value)
Discussion
A normal dictionary performs a simple mapping of each key to one
value. This recipe shows three easy, efficient ways to achieve a
mapping of each key to multiple values, by holding as the dictionary’s
values list
s, sub-dict
s, or, in Python 2.4, set
s. The semantics of the list-based
approach differ slightly but importantly from those of the other two
in terms of how they deal with duplication. Each approach relies on
the setdefault
method of a
dictionary, covered earlier in Recipe 4.10, to initialize
the entry for a key in the dictionary, if needed, and in any case to
return said entry.
You need to be able to do more than just add values for a key. With the first approach, which uses lists and allows duplications, here’s how to retrieve the list of values for a key:
list_of_values = d1[key]
Here’s how to remove one value for a key, if you don’t mind
leaving empty lists as items of d1
when the last value for a key is removed:
d1[key].remove(value)
Despite the empty lists, it’s still easy to test for the existence of a key with at least one value—just use a function that always returns a list (maybe an empty one), such as:
def get_values_if_any(d, key): return d.get(key, [ ])
For example, to check whether 'freep'
is among the values (if any) for key
'somekey'
in dictionary
d1
, you can code: if 'freep' in get_values_if_any(d1,
'somekey')
.
The second approach, which uses sub-dict
s and eliminates duplications, can use
rather similar idioms. To retrieve the list of values for a
key:
list_of_values = list(d2[key])
To remove one value for a key, leaving empty dictionaries as
items of d2
when the last value for a key
is removed:
del d2[key][value]
In the third approach, showing the Python 2.4-only version
d3
, which uses set
s, this would be:
d3[key].remove(value)
One possibility for the get_values_if_any
function in either the
second or third (duplication-removing) approaches would be:
def get_values_if_any(d, key): return list(d.get(key, ( )))
This recipe focuses on how to code the raw functionality, but,
to use this functionality in a systematic way, you’ll probably want to
wrap up this code into a class. For that purpose, you need to make
some of the design decisions that this recipe highlights. Do you want
a value to be in the entry for a key multiple times? (Is the entry for
each key a bag rather than a set, in mathematical terms?) If so,
should remove
just reduce the
number of occurrences by 1, or should it wipe out all of them? This is
just the beginning of the choices you have to make, and the right
choices depend on the specifics of your application.
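One possible shape for such a wrapper, assuming you settle on the set-based (duplicate-free) semantics, might be sketched like this (the MultiDict name, and the choice to drop empty entries on removal, are illustrative design decisions, not the only reasonable ones):

```python
class MultiDict(object):
    """A one-to-many mapping: each key maps to a set of values."""
    def __init__(self):
        self._data = {}
    def add(self, key, value):
        self._data.setdefault(key, set()).add(value)
    def remove(self, key, value):
        self._data[key].remove(value)   # KeyError if key or value is absent
        if not self._data[key]:
            del self._data[key]         # don't leave empty entries behind
    def get_values_if_any(self, key):
        return list(self._data.get(key, ()))

m = MultiDict()
m.add('k', 1)
m.add('k', 1)                           # duplicate: silently collapsed by the set
m.add('k', 2)
assert sorted(m.get_values_if_any('k')) == [1, 2]
```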
See Also
Recipe 4.10;
the Library Reference and Python in
a Nutshell sections on mapping types; Recipe 18.8 for an
implementation of the bag
type.
4.16. Using a Dictionary to Dispatch Methods or Functions
Credit: Dick Wall
Problem
You need to execute different pieces of code depending
on the value of some control variable—the kind of problem that in some
other languages you might approach with a case
statement.
Solution
Object-oriented programming, thanks to its elegant
concept of dispatching, does away with many (but not all) needs for
case
statements. In Python,
dictionaries, and the fact that functions are first-class objects (in
particular, functions can be values in a dictionary), conspire to make
the full problem of “case statements” easier to solve. For example,
consider the following snippet of code:
animals = []
number_of_felines = 0
def deal_with_a_cat():
    global number_of_felines
    print "meow"
    animals.append('feline')
    number_of_felines += 1
def deal_with_a_dog():
    print "bark"
    animals.append('canine')
def deal_with_a_bear():
    print "watch out for the *HUG*!"
    animals.append('ursine')
tokenDict = {
    "cat": deal_with_a_cat,
    "dog": deal_with_a_dog,
    "bear": deal_with_a_bear,
}
# Simulate, say, some words read from a file
words = ["cat", "bear", "cat", "dog"]
for word in words:
    # Look up the function to call for each word, and call it
    tokenDict[word]()
nf = number_of_felines
print 'we met %d feline%s' % (nf, 's'[nf==1:])
print 'the animals we met were:', ' '.join(animals)
Discussion
The key idea in this recipe is to construct a dictionary with
string (or other) values as keys, and bound-methods, functions, or
other callables as values. At each step of execution, we use the
string keys to select which callable to execute and then call it. This
approach can be used as a kind of generalized case
statement.
It’s embarrassingly simple (really!), but I use this technique often. You can also use bound-methods or other callables instead of functions. If you use unbound methods, you need to pass an appropriate object as the first actual argument when you do call them. More generally, you can store, as the dictionary’s values, tuples including both a callable and arguments to pass to the callable.
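For instance, a dispatch table whose values are (callable, arguments) tuples might be sketched like this (the names greet, dispatch, and run are invented for the example):

```python
def greet(greeting, name):
    return '%s, %s!' % (greeting, name)

# Each value bundles a callable with the positional arguments to pass it
dispatch = {
    'hello': (greet, ('Hello', 'world')),
    'bye':   (greet, ('Goodbye', 'cruel world')),
}

def run(key):
    # Unpack the callable and its arguments, then make the call
    func, args = dispatch[key]
    return func(*args)
```

run('hello') then returns 'Hello, world!'.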
I primarily use this technique in places where in other
languages, I might want a case
,
switch
, or select
statement. For example, I use it to
implement a poor man’s way to parse command files
(e.g., an X10 macro control file).
See Also
The Library Reference section on mapping types; the Reference Manual section on bound and unbound methods; Python in a Nutshell about both dictionaries and callables.
4.17. Finding Unions and Intersections of Dictionaries
Credit: Tom Good, Andy McKay, Sami Hangaslammi, Robin Siebler
Problem
Given two dictionaries, you need to find the set of keys that are in both dictionaries (the intersection) or the set of keys that are in either dictionary (the union).
Solution
Sometimes, particularly in Python 2.3, you find yourself using
dictionaries as concrete representations of sets. In such cases, you
only care about the keys, not the corresponding values, and often you
build the dictionaries by calls to dict.fromkeys
, such as
a = dict.fromkeys(xrange(1000))
b = dict.fromkeys(xrange(500, 1500))
The fastest way to compute the dict
that is the set-union is:
union = dict(a, **b)
The fastest concise way to compute the dict
that is the set-intersection is:
inter = dict.fromkeys([x for x in a if x in b])
If the number of items in dictionaries
a
and b
can be
very different, then it can be important for speed considerations to
have the shorter one in the for
clause, and the longer one in the if
clause, of this list comprehension. In
such cases, it may be worth sacrificing some conciseness in favor of
speed, by coding the intersection computation as follows:
if len(a) < len(b):
    inter = dict.fromkeys([x for x in a if x in b])
else:
    inter = dict.fromkeys([x for x in b if x in a])
Python also gives you types to represent sets directly
(in standard library module sets
,
and, in Python 2.4, also as built-ins). Here is a snippet that you can
use at the start of a module: the snippet ensures that name set
is bound to the best available set type,
so that throughout the module, you can then use the same code whether
you’re using Python 2.3 or 2.4:
try:
    set
except NameError:
    from sets import Set as set
Having done this, you can now use type set
to best effect, gaining clarity and
conciseness, and (in Python 2.4) gaining a little speed, too:
a = set(xrange(1000))
b = set(xrange(500, 1500))
union = a | b
inter = a & b
Discussion
In Python 2.3, even though the Python Standard Library
module sets
offers an elegant data
type Set
that directly represents a
set (with hashable elements), it is still common to use a dict
to represent a set, partly for
historical reasons. Just in case you want to keep doing it, this
recipe shows you how to compute unions and intersections of such sets
in the fastest ways, which are not obvious. The code in this recipe,
on my machine, takes about 260 microseconds for the union, about 690
for the intersection (with Python 2.3; with Python 2.4, 260 and
600, respectively), while alternatives based on loops or generator
expressions are substantially slower.
However, it’s best to use type set
instead of representing sets by
dictionaries. As the recipe shows, using set
makes your code more direct and
readable. If you dislike the or-operator (|)
and the “and-operator” (&)
, you can equivalently use a.union(b)
and a.intersection(b)
, respectively. Besides
clarity, you also gain speed, particularly in Python 2.4: computing
the union still takes about 260 microseconds, but computing the
intersection takes only about 210. Even in Python 2.3, this approach
is acceptably fast: computing the union takes about 270 microseconds,
computing the intersection takes about 650—not quite as fast as Python
2.4 but still quite comparable to what you can get if you represent
sets by dictionaries. Last but not least, once you use type set
(whether it is the Python 2.4 built-in,
or class Set
from the Python
Standard Library module sets
, the
interface is the same), you gain a wealth of useful set operations.
For example, the set of elements that are in either
a
or b
but not
both is a^b
or, equivalently,
a.symmetric_difference(b)
.
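With the same a and b built earlier, the symmetric difference relates to union and intersection just as you would expect; a quick sketch:

```python
a = set(range(1000))
b = set(range(500, 1500))

sym = a ^ b                       # elements in exactly one of the two sets
assert sym == a.symmetric_difference(b)
assert sym == (a | b) - (a & b)   # union minus intersection
```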
Even if you start with dicts for other reasons, consider using sets anyway if you need to perform set operations. Say, for example, that you have in
phones
a dictionary that maps names to
phone numbers and in addresses
one that
maps names to addresses. The clearest and simplest way to print all
names for which you know both address and phone number, and their
associated data, is:
for name in set(phones) & set(addresses):
    print name, phones[name], addresses[name]
This is much terser, and arguably clearer, than something like:
for name in phones:
    if name in addresses:
        print name, phones[name], addresses[name]
Another excellent alternative is:
for name in set(phones).intersection(addresses):
    print name, phones[name], addresses[name]
If you use the named intersection method, rather than the & intersection operator, you don’t need to turn both dicts into sets: just one of them. Then call intersection on the resulting set, and pass the other dict as the argument to the intersection method.
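With some invented sample data, the operator and method spellings agree:

```python
phones = {'alice': '555-0100', 'bob': '555-0101', 'carol': '555-0102'}
addresses = {'bob': '12 Elm St', 'carol': '34 Oak Ave', 'dave': '56 Pine Rd'}

# the & operator needs two sets...
both = set(phones) & set(addresses)
# ...but the named method happily takes the other dict as-is
assert both == set(phones).intersection(addresses)
assert both == set(['bob', 'carol'])
```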
See Also
The Library Reference and
Python in a Nutshell sections on mapping types,
module sets
, and Python 2.4’s
built-in set
type.
4.18. Collecting a Bunch of Named Items
Credit: Alex Martelli, Doug Hudgeon
Problem
You want to collect a bunch of items together, naming each item of the bunch, and you find dictionary syntax a bit heavyweight for the purpose.
Solution
Any normal class instance inherently wraps a dictionary, which it uses to hold its state. We can easily take advantage of this handily wrapped dictionary by coding a nearly empty class:
class Bunch(object):
    def __init__(self, **kwds):
        self.__dict__.update(kwds)
Now, to group a few variables, create a Bunch
instance:
point = Bunch(datum=y, squared=y*y, coord=x)
You can now access and rebind the named attributes just created, add others, remove some, and so on. For example:
if point.squared > threshold:
    point.isok = True
Discussion
We often just want to collect a bunch of stuff together, naming each item of the bunch. A dictionary is OK for this purpose, but a small do-nothing class is even handier and prettier to use.
It takes minimal effort to build a little class, as in this
recipe, to provide elegant attribute-access syntax. While a dictionary
is fine for collecting a few items in which each item has a name (the
item’s key in the dictionary can be thought of as the item’s name, in
this context), it’s not the best solution when all names are
identifiers, to be used just like variables. In class Bunch’s __init__ method, we accept arbitrary named arguments with the **kwds syntax, and we use the kwds dictionary to update the initially empty instance dictionary, so that each named argument gets turned into an attribute of the instance.
Compared to attribute-access syntax, dictionary-indexing syntax
is not quite as terse and readable. For example, if
point
was a dictionary, the little snippet
at the end of the “Solution” would have to be coded like:
if point['squared'] > threshold:
    point['isok'] = True
An alternative implementation that’s just as attractive as the one used in this recipe is:
class EvenSimplerBunch(object):
    def __init__(self, **kwds):
        self.__dict__ = kwds
Rebinding an instance’s dictionary may feel risqué, but it’s not
actually any pushier than calling that dictionary’s update
method. So you might prefer the
marginal speed advantage of this alternative implementation of
Bunch
. Unfortunately, I cannot find anywhere in
Python’s documentation an assurance that usage like:
d = {'foo': 'bar'}
x = EvenSimplerBunch(**d)
will forever keep making x.__dict__ an independent copy of d
rather than just sharing a
reference. It does currently, and in every version, but unless it’s a
documented semantic constraint, we cannot be entirely sure that it
will keep working forever. So, if you do choose the implementation in
EvenSimplerBunch
, you might choose to assign a copy
(dict(kwds)
or kwds.copy( )
) rather than
kwds
itself. And, if you do, then the marginal speed
advantage disappears. All in all, the Bunch
presented
in this recipe’s Solution is probably preferable.
A further tempting but not fully sound alternative is to have
the Bunch
class inherit from
dict
, and set attribute access
special methods equal to the item access special methods, as
follows:
class DictBunch(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
One problem with this approach is that, with this definition, an
instance x
of DictBunch
has many attributes it doesn’t really have,
because it inherits all the attributes (methods, actually, but there’s
no significant difference in this context) of dict
. So, you can’t meaningfully check
hasattr(x, someattr)
, as you could
with the classes Bunch
and
EvenSimplerBunch
previously shown, unless you can
somehow rule out the value of someattr
being any of
several common words such as 'keys', 'pop', and 'get'.
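A short sketch makes the problem tangible:

```python
class DictBunch(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

x = DictBunch(datum=7)
# The item we stored is reachable as an attribute, as intended...
assert x.datum == 7
# ...but x also appears to "have" every method of dict, as an attribute:
assert hasattr(x, 'keys') and hasattr(x, 'pop') and hasattr(x, 'get')
```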
Python’s distinction between attributes and items is really a wellspring of clarity and simplicity. Unfortunately, many newcomers to Python wrongly believe that it would be better to confuse items with attributes, generally because of previous experience with JavaScript and other such languages, in which attributes and items are regularly confused. But educating newcomers is a much better idea than promoting item/attribute confusion.
See Also
The Python Tutorial
section on classes; the Language Reference and
Python in a Nutshell coverage of classes; Chapter 6 for more information
about object-oriented programming in Python; Recipe 4.18 for more on the
**kwds
syntax.
4.19. Assigning and Testing with One Statement
Credit: Alex Martelli, Martin Miller
Problem
You are transliterating C or Perl code to Python, and to
keep close to the original’s structure, you’d like an expression’s
result to be both assigned and tested (as in if((x=foo( ))
or while((x=foo( ))
in such other
languages).
Solution
In Python, you can’t code if x=foo(): .... Assignment is a statement, so it cannot fit
into an expression, and you can only use expressions as conditions of
if
and while
statements. This isn’t a problem, it
just means you have to structure your code Pythonically! For example,
to process a file object f
line by line,
instead of the following C-like (and syntactically incorrect, in
Python) approach:
while (line=f.readline()) != '':
    process(line)
you can code a highly Pythonic (readable, clean, fast) approach:
for line in f:
    process(line)
But sometimes, you’re transliterating from C, Perl, or another language, and you’d like your transliteration to be structurally close to the original. One simple utility class makes it easy:
class DataHolder(object):
    def __init__(self, value=None):
        self.value = value
    def set(self, value):
        self.value = value
        return value
    def get(self):
        return self.value
# optional and strongly discouraged, but nevertheless handy at times:
import __builtin__
__builtin__.DataHolder = DataHolder
__builtin__.data = data = DataHolder()
With the help of the DataHolder
class and its
instance data
, you can keep your C-like code
structure intact in transliteration:
while data.set(file.readline()) != '':
    process(data.get())
Discussion
In Python, assignment is a statement, not an expression. Thus,
you cannot assign the result that you are also testing, for example,
in the condition of an if
, elif
, or while
statement. This is usually fine: just
structure your code to avoid the need to assign while testing (in
fact, your code will often become clearer as a result). In particular,
whenever you feel the need to assign-and-test within the condition of
a while
loop, that’s a good hint
that your loop’s structure probably wants to be refactored into a
generator (or other iterator). Once you have refactored in this way,
your loops become plain and simple for
statements. The example given in the
recipe, looping over each line read from a text file, is one where the
refactoring has already been done on your behalf by Python itself,
since a file
object is an iterator
whose items are the file’s lines.
However, sometimes you may be writing Python code that is the transliteration of code originally written in C, Perl, or some other language that supports assignment-as-expression. Such transliterations often occur in the first Python version of an algorithm for which a reference implementation is supplied, an algorithm taken from a book, and so on. In such cases, it’s often preferable to have the structure of your initial transliteration be close to that of the code you’re transcribing. You can refactor later and make your code more Pythonic—clearer, faster, and so on. But first, you want to get working code as soon as possible, and specifically you want code that is easy to check for compliance to the original it has been transliterated from. Fortunately, Python offers enough power to make it quite easy for you to satisfy this requirement.
Python doesn’t let us redefine the meaning of assignment, but we
can have a method (or function) that saves its argument
somewhere and also returns that argument so it
can be tested. That somewhere is most naturally
an attribute of an object, so a method is a more natural choice than a
function. Of course, we could just retrieve the attribute directly
(i.e., the get
method is redundant), but it looks
nicer to me to have symmetry between data.set
and
data.get
.
data.set(whatever)
can be
seen as little more than syntactic sugar around data.value=whatever
, with the added value of
being acceptable as an expression. Therefore, it’s the one obviously
right way to satisfy the requirement for a reasonably faithful
transliteration. The only difference between the resulting Python code
and the original (say) C or Perl code, is at the syntactic sugar
level—the overall structure is the same, and that’s the key
issue.
Importing __builtin__ and
assigning to its attributes is a trick that basically defines a new
built-in object at runtime. You can use that trick in your
application’s start-up code, and then all other modules will
automatically be able to access your new built-ins without having to
do an import
. It’s
not good Python practice, though; on the
contrary, it’s pushing the boundaries of Pythonic good taste, since
the readers of all those other modules should not have to know about
the strange side effects performed in your application’s startup code.
But since this recipe is meant to offer a quick-and-dirty approach for
a first transliteration that will soon be refactored to make it
better, it may be acceptable in this specific context to cut more
corners than one would in production-level code.
On the other hand, one trick you should definitely not use is the following abuse of a currently existing wart in list comprehensions:
while [line for line in [f.readline()] if line != '']:
    process(line)
This trick currently works, since both Python 2.3 and 2.4 still
“leak” the list comprehension control variable (here,
line
) into the surrounding scope. However,
besides being obscure and unreadable, this trick is specifically
deprecated: list comprehension control variable leakage
will be fixed in some future version of Python,
and this trick will then stop working at all.
See Also
The Tutorial section on classes; the
documentation for the __builtin__
module in the Library Reference and
Python in a Nutshell; Language
Reference and Python in a Nutshell
documentation on list comprehensions.
4.20. Using printf in Python
Credit: Tobias Klausmann, Andrea Cavalcanti
Problem
You’d like to output something to your program’s
standard output with C’s function printf
, but Python doesn’t have that
function.
Solution
It’s easy to code a printf
function in Python:
import sys
def printf(format, *args):
    sys.stdout.write(format % args)
Discussion
Python separates the concepts of output (the print
statement) and formatting (the
%
operator), but if you prefer to
have these concepts together, they’re easy to join, as this recipe
shows. No more worries about automatic insertion of spaces or
newlines, either. Now you need worry only about correctly matching
format and arguments!
For example, instead of something like:
print 'Result tuple is: %r' % (result_tuple,),
with its finicky need for commas in unobvious places (i.e., one
to make a singleton tuple around
result_tuple
, one to avoid the newline that
print
would otherwise insert by
default), once you have defined this recipe’s printf
function, you can just write:
printf('Result tuple is: %r', result_tuple)
See Also
Library Reference and Python
in a Nutshell documentation for module sys
and for the string formatting operator
%
; Recipe 2.13 for a way to
implement C++’s <<
-style
output in Python.
4.21. Randomly Picking Items with Given Probabilities
Credit: Kevin Parks, Peter Cogolo
Problem
You want to pick an item at random from a list, just about as
random.choice
does, but you need to
pick the various items with different probabilities given in another
list, rather than picking any item with equal probability as random.choice
does.
Solution
Module random
in the
standard Python library offers a wealth of possibilities for
generating and using pseudo-random numbers, but it does not offer this
specific functionality, so we must code it as a function of our
own:
import random
def random_pick(some_list, probabilities):
    x = random.uniform(0, 1)
    cumulative_probability = 0.0
    for item, item_probability in zip(some_list, probabilities):
        cumulative_probability += item_probability
        if x < cumulative_probability:
            break
    return item
Discussion
Module random
in the standard
Python library does not have the weighted choice
functionality that is sometimes needed in games, simulations, and
random tests, so I wrote this recipe to supply this functionality. The
recipe uses module random
’s
function uniform
to get a uniformly
distributed pseudo-random number between 0.0 and 1.0, then loops in
parallel on items and their probabilities, computing the increasing
cumulative probability, until the latter becomes greater than the
pseudo-random number.
The recipe assumes, but does not check, that
probabilities
is a sequence with just as many items
as some_list
, which are probabilities—that is,
numbers between 0.0 and 1.0, summing up to 1.0; if these assumptions
are violated, you may still get some random picks, but they will not
follow the (inconsistent) specifications encoded in the function’s
arguments. You may want to add some assert
statements at the start of the
function to check that the arguments make sense, such as:
assert len(some_list) == len(probabilities)
assert 0 <= min(probabilities) and max(probabilities) <= 1
assert abs(sum(probabilities)-1.0) < 1.0e-5
However, these checks can be quite time consuming, so I don’t normally use them and have not included them in the official Solution.
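A rough empirical check (the sample items and probabilities are invented for the purpose, and the recipe's function is repeated so the snippet stands alone) shows the observed frequencies approaching the given probabilities:

```python
import random

def random_pick(some_list, probabilities):
    x = random.uniform(0, 1)
    cumulative_probability = 0.0
    for item, item_probability in zip(some_list, probabilities):
        cumulative_probability += item_probability
        if x < cumulative_probability:
            break
    return item

# Tally 10,000 picks; frequencies should be close to 0.5 / 0.3 / 0.2
counts = {'a': 0, 'b': 0, 'c': 0}
for i in range(10000):
    counts[random_pick(['a', 'b', 'c'], [0.5, 0.3, 0.2])] += 1
```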
As I already mentioned, the problem solved in this recipe
requires items to be associated with
probabilities—numbers between 0 and 1, summing up
to 1. A related but slightly different task is to get random picks
with weighted relative probabilities given by small non-negative
integers—odds, rather than probabilities. For
this related problem, the best solution is a generator, with an
internal structure that is rather different from the function
random_pick
given in this recipe’s Solution:
import random
def random_picks(sequence, relative_odds):
    table = [ z for x, y in zip(sequence, relative_odds)
                for z in [x]*y ]
    while True:
        yield random.choice(table)
This generator works by first preparing a table whose total
number of items is sum(relative_odds)
, each item of
seq
appearing in the table as many times as the small
non-negative integer that is its corresponding item in
relative_odds
. Once the table is prepared, the
generator’s body is tiny and fast, as it simply delegates to random.choice
the picking of each random
item it yields. Typical uses of this random_picks
generator might be:
>>> x = random_picks('ciao', [1, 1, 3, 2])
>>> for two_chars in zip('boo', x): print ''.join(two_chars),
bc oa oa
>>> import itertools
>>> print ''.join(itertools.islice(x, 8))
icacaoco
See Also
Module random
in the
Library Reference and Python in a
Nutshell.
4.22. Handling Exceptions Within an Expression
Credit: Chris Perkins, Gregor Rayman, Scott David Daniels
Problem
You want to code an expression, so
you can’t directly use the statement try
/except
, but you still need to handle
exceptions that the expression may throw.
Solution
To catch exceptions, try
/except
is indispensable, and, since try
/except
is a statement,
the only way to use it inside an expression is to
code an auxiliary function:
def throws(t, f, *a, **k):
    '''Return True iff f(*a, **k) raises an exception whose type is t
       (or, one of the items of _tuple_ t, if t is a tuple).'''
    try:
        f(*a, **k)
    except t:
        return True
    else:
        return False
For example, suppose you have a text file, which has one number per line, but also extra lines which may be whitespace, comments, or what-have-you. Here is how you can make a list of all the numbers in the file, skipping the lines that aren’t numbers:
data = [float(line) for line in open(some_file)
        if not throws(ValueError, float, line)]
Discussion
You might prefer to name such a function
raises
, but I personally prefer
throws
, which is probably a throwback to C++. By
whatever name, the auxiliary function shown in this recipe takes as
its arguments, first an exception type (or tuple of exception types)
t
, then a callable f
, and then
arbitrary positional and named arguments a
and
k
, which are to be passed on to f
.
Do not code, for example, if not
throws(ValueError, float(line))
! When you
call a function, Python evaluates the arguments before passing control
to the function; if an argument’s evaluation raises an exception, the
function never even gets started. I’ve seen this erroneous usage
attempted more than once by people who are just starting to use the
assertRaises
method from the
standard Python library’s unittest.TestCase
class, for example.
When throws
executes, it just calls
f
within the try
clause of a try
/except
statement, passing on the arbitrary
positional and named arguments. If the call to f
in
the try
clause raises an exception
whose type is t
(or one of the items of
t
, if t
is a tuple
of exception types), then control
passes to the corresponding except
clause, which, in this case, returns True
as throws' result. If
no exception is raised in the try
clause, then control passes to the corresponding else
clause (if any), which, in this case,
returns False
as
throws' result.
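For instance, reusing the Solution's throws:

```python
def throws(t, f, *a, **k):
    '''Return True iff f(*a, **k) raises an exception whose type is t
       (or one of the items of t, if t is a tuple).'''
    try:
        f(*a, **k)
    except t:
        return True
    else:
        return False

assert throws(ValueError, float, 'not a number')
assert not throws(ValueError, float, '3.14')
# t may also be a tuple of exception types:
assert throws((ValueError, TypeError), float, None)
```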
Note that, if some unexpected exception
(one whose type is not in t
) gets raised, then
function throws
does not catch that exception, so
that throws
terminates and propagates the exception
to its caller. This choice is quite a deliberate one. Catching
exceptions with a too-wide except
clause is a bug-diagnosing headache waiting to happen. If the caller
really wants throws
to catch just about everything,
it can always call throws(Exception, ...) and live with the resulting headaches.
One problem with the throws
function is that
you end up doing the key operation twice—once just to see if it
throws, tossing the result away, then, a second time, to get the
result. It would be nicer to get the result, if any, together with an
indication of whether an exception has been caught. I first tried
something along the lines of:
def throws(t, f, *a, **k):
    ''' Return a pair (True, None) if f(*a, **k) raises an exception whose
        type is in t, else a pair (False, x) where x is the result of f(*a, **k). '''
    try:
        return False, f(*a, **k)
    except t:
        return True, None
Unfortunately, this version doesn’t fit in well in a list
comprehension: there is no elegant way to get and use both the flag
and the result. So, I chose a different approach: a function that
returns a list
in any case—empty if
an exception was caught, otherwise with the result as the only item.
This approach works fine in a list comprehension, but for clarity, the
name of the function needs to be changed:
def returns(t, f, *a, **k):
    ''' Return [f(*a, **k)] normally, [] if that raises an exception in t. '''
    try:
        return [f(*a, **k)]
    except t:
        return []
The resulting list comprehension is even more elegant, in my opinion, than the original one in this recipe’s Solution:
data = [ x for line in open(some_file)
           for x in returns(ValueError, float, line) ]
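Substituting a small list of strings for the file (invented data, purely for illustration), the comprehension behaves as described:

```python
def returns(t, f, *a, **k):
    ''' Return [f(*a, **k)] normally, [] if that raises an exception in t. '''
    try:
        return [f(*a, **k)]
    except t:
        return []

# Lines that aren't numbers contribute an empty list, so they vanish
lines = ['1.5', '# a comment', '2.5', '   ', '3']
data = [x for line in lines for x in returns(ValueError, float, line)]
```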
See Also
Python in a Nutshell’s section on
catching and handling exceptions; the sidebar The *args and **kwds
Syntax for an explanation of *args
and **kwds
syntax.
4.23. Ensuring a Name Is Defined in a Given Module
Credit: Steven Cummings
Problem
You want to ensure that a certain name is defined in a
given module (e.g., you want to ensure that there is a built-in name
set
), and, if not, you want to
execute some code that sets the definition.
Solution
The solution to this problem is the only good use I’ve
yet seen for statement exec. exec lets us execute arbitrary Python code from a string, and thus lets us write a very simple function to deal with this task:
import __builtin__
def ensureDefined(name, defining_code, target=__builtin__):
    if not hasattr(target, name):
        d = { }
        exec defining_code in d
        assert name in d, 'Code %r did not set name %r' % (
            defining_code, name)
        setattr(target, name, d[name])
Discussion
If your code supports several versions of Python (or of some
third-party package), then many of your modules must start with code
such as the following snippet (which ensures name set
is properly set in either Python 2.4,
where it’s a built-in, or 2.3, where it must be obtained from the
standard library):
try:
    set
except NameError:
    from sets import Set as set
This recipe encapsulates this kind of logic directly, and by
default works on module __builtin__, since that’s the typical module for which you need to
work around missing names in older Python versions. With this recipe,
you could ensure name set
is
properly defined among the built-ins by running just once, during your
program’s initialization, the single call:
ensureDefined('set', 'from sets import Set as set')
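In Python 3 syntax (where exec is a function rather than a statement, and the built-ins module is named builtins instead of __builtin__), the same idea might be sketched as follows; the Config class and the answer name are invented for the demonstration, and a non-module target keeps the demo from touching the real built-ins:

```python
import builtins

def ensureDefined(name, defining_code, target=builtins):
    ''' Sketch of the recipe adapted to Python 3 syntax. '''
    if not hasattr(target, name):
        d = {}
        exec(defining_code, d)   # exec is a function in Python 3
        assert name in d, 'Code %r did not set name %r' % (defining_code, name)
        setattr(target, name, d[name])

class Config(object):
    pass

ensureDefined('answer', 'answer = 6 * 7', target=Config)
```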
The key advantage of this recipe is that you can group all
needed calls to ensureDefined
in just one place of
your application, at initialization time, rather than having several
ad hoc try
/except
statements at the start of various
modules. Moreover, ensureDefined
may allow more
readable code because it does only one specific job, so the purpose of
calling it is obvious, while try
/except
statements could have several
purposes, so that more study and reflection might be needed to
understand them. Last but not least, using this recipe lets you avoid
the warnings that the try
/except
approach can trigger from such useful
checking tools as pychecker, http://pychecker.sourceforge.net/. (If you
aren’t using pychecker or something like that, you should!)
The recipe takes care to avoid unintended accidental side
effects on target
, by using an auxiliary
dictionary d
as the target for the exec
statement and then transferring only
the requested name. This way, for example, you can use as
target
an object that is not a module (a
class, say, or even a class instance), without necessarily adding to
your target an attribute named __builtins__ that references the dictionary of Python’s built-ins. If
you used less care, so that the body of the if
statement was only:
exec defining_code in vars(target)
you would inevitably get such side effects, as documented at http://www.python.org/doc/current/ref/exec.html.
It’s important to be aware that exec
can and does execute any valid string
of Python code that you give it. Therefore, make sure that the
argument defining_code
that you pass to any call of
function ensureDefined
does not
come from an untrusted source, such as a text file that might have
been maliciously tampered with.
See Also
The online documentation of the exec
statement in the Python
Language Reference Manual at http://www.python.org/doc/current/ref/exec.html.