This section discusses more advanced features, which you may prefer to skip on your first reading of this chapter.
So far the arguments we have passed into functions have been
simple objects, such as strings, or structured objects, such as lists.
Python also lets us pass a function as an argument to another
function. This lets us abstract out the operation and apply a
different operation to the same data. As the following examples
show, we can pass the built-in function len() or a user-defined
function last_letter() as an argument to another function:
>>> sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
...         'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']
>>> def extract_property(prop):
...     return [prop(word) for word in sent]
...
>>> extract_property(len)
[4, 4, 2, 3, 5, 1, 3, 3, 6, 4, 4, 4, 2, 10, 1]
>>> def last_letter(word):
...     return word[-1]
>>> extract_property(last_letter)
['e', 'e', 'f', 'e', 'e', ',', 'd', 'e', 's', 'l', 'e', 'e', 'f', 's', '.']
The objects len and last_letter can be passed around like lists
and dictionaries. Notice that parentheses are used after a function
name only if we are invoking the function; when we are simply treating
the function as an object, these are omitted.
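Because functions are ordinary objects, they can also be stored in a data structure and looked up later. The following is a minimal sketch, written for Python 3 rather than the Python 2 used in the sessions of this section; the properties dictionary and the describe() helper are hypothetical names introduced only for illustration.

```python
def last_letter(word):
    """Return the final character of a word."""
    return word[-1]

# Functions stored as dictionary values. There are no parentheses after
# len or last_letter, since we refer to the function objects without calling them.
properties = {'length': len, 'last': last_letter}

def describe(word):
    """Apply every stored function to word and collect the results by name."""
    return {name: prop(word) for name, prop in properties.items()}

print(describe('sense'))
```

The parentheses appear only inside describe(), at the point where each stored function is actually invoked as prop(word).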
Python provides us with one more way to define functions as
arguments to other functions, so-called lambda expressions.
Suppose there was no need to use the last_letter() function in
multiple places, and thus no need to give it a name. We can
equivalently write the following:
>>> extract_property(lambda w: w[-1])
['e', 'e', 'f', 'e', 'e', ',', 'd', 'e', 's', 'l', 'e', 'e', 'f', 's', '.']
Our next example illustrates passing a function to the sorted()
function. When we call the latter with a single argument (the list
to be sorted), it uses the built-in comparison function cmp().
However, we can supply our own comparison function, e.g., to sort
by decreasing length.
>>> sorted(sent)
[',', '.', 'Take', 'and', 'care', 'care', 'of', 'of', 'sense', 'sounds',
'take', 'the', 'the', 'themselves', 'will']
>>> sorted(sent, cmp)
[',', '.', 'Take', 'and', 'care', 'care', 'of', 'of', 'sense', 'sounds',
'take', 'the', 'the', 'themselves', 'will']
>>> sorted(sent, lambda x, y: cmp(len(y), len(x)))
['themselves', 'sounds', 'sense', 'Take', 'care', 'will', 'take', 'care',
'the', 'and', 'the', 'of', 'of', ',', '.']
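The sessions above rely on Python 2, where sorted() accepts a comparison function. In Python 3, cmp() and that argument were removed, and a key function is supplied instead. The following is a minimal sketch of the same decreasing-length sort under that assumption; the variable names by_length and by_length_cmp are introduced here for illustration.

```python
from functools import cmp_to_key

sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']

# key=len computes one sort key per word; reverse=True gives decreasing order.
by_length = sorted(sent, key=len, reverse=True)

# An old-style two-argument comparison can still be adapted with cmp_to_key.
by_length_cmp = sorted(sent, key=cmp_to_key(lambda x, y: len(y) - len(x)))

print(by_length)
print(by_length == by_length_cmp)
```

Both calls keep words of equal length in their original relative order, because Python's sort is stable.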
Some functions start by initializing some storage, iterate over
their input to build it up, and finally return some object (a large
structure or an aggregated result). A standard way to do this is to
initialize an empty list, accumulate the material, then return the
list, as shown in function search1() in Example 4-5.
Example 4-5. Accumulating output into a list.
import nltk

def search1(substring, words):
    # Accumulate every matching word in a list and return the whole list.
    result = []
    for word in words:
        if substring in word:
            result.append(word)
    return result

def search2(substring, words):
    # Yield matching words one at a time instead of storing them.
    for word in words:
        if substring in word:
            yield word

print "search1:"
for item in search1('zz', nltk.corpus.brown.words()):
    print item
print "search2:"
for item in search2('zz', nltk.corpus.brown.words()):
    print item
The function search2() is a generator. The first time the calling
program asks it for a value, the function runs as far as the yield
statement and pauses. The calling program gets the first word and
does any necessary processing. Once the calling program is ready for
another word, execution of the function continues from where it
stopped, until the next time it encounters a yield statement. This
approach is typically more efficient, as the function only generates
the data as it is required by the calling program, and does not need
to allocate additional memory to store the output (see the earlier
discussion of generator expressions).
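The pause-and-resume behavior can be observed directly by requesting values one at a time with the built-in next(). This is a minimal sketch written for Python 3; the short words list is toy data standing in for the Brown Corpus.

```python
def search2(substring, words):
    """Yield each word containing substring, one at a time."""
    for word in words:
        if substring in word:
            yield word

words = ['jazz', 'blues', 'puzzle', 'rock', 'fizzy']  # toy data, not the Brown Corpus

matches = search2('zz', words)   # nothing has run yet; we only hold a generator
first = next(matches)            # runs until the first yield
second = next(matches)           # resumes just after the first yield
rest = list(matches)             # drains whatever is left

print(first, second, rest)
```

Once a generator has been drained it is exhausted, so iterating over matches again produces nothing.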
Here’s a more sophisticated example of a generator which
produces all permutations of a list of words. In order to force the
permutations() function to generate all its output, we wrap it with
a call to list().
>>> def permutations(seq):
...     if len(seq) <= 1:
...         yield seq
...     else:
...         for perm in permutations(seq[1:]):
...             for i in range(len(perm)+1):
...                 yield perm[:i] + seq[0:1] + perm[i:]
...
>>> list(permutations(['police', 'fish', 'buffalo']))
[['police', 'fish', 'buffalo'], ['fish', 'police', 'buffalo'],
 ['fish', 'buffalo', 'police'], ['police', 'buffalo', 'fish'],
 ['buffalo', 'police', 'fish'], ['buffalo', 'fish', 'police']]
Note
The permutations function uses a technique called recursion,
discussed later in Algorithm Design. The ability to generate
permutations of a set of words is useful for creating data to test a
grammar (Chapter 8).
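Because permutations() is a generator, we need not force all of its output. As a sketch (Python 3, with the standard library's itertools.islice as the only addition), we can take just the first two permutations and leave the rest uncomputed:

```python
from itertools import islice

def permutations(seq):
    """Yield every ordering of seq (same definition as in the session above)."""
    if len(seq) <= 1:
        yield seq
    else:
        for perm in permutations(seq[1:]):
            for i in range(len(perm) + 1):
                yield perm[:i] + seq[0:1] + perm[i:]

# islice stops after two items, so the remaining permutations are never built.
first_two = list(islice(permutations(['police', 'fish', 'buffalo']), 2))
print(first_two)
```

This matters for longer lists, since the number of permutations grows factorially with the length of the input.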
Python provides some higher-order functions that are standard features of functional programming languages such as Haskell. We illustrate them here, alongside the equivalent expression using list comprehensions.
Let’s start by defining a function is_content_word() which checks
whether a word is from the open class of content words. We use this
function as the first parameter of filter(), which applies the
function to each item in the sequence contained in its second
parameter, and retains only the items for which the function
returns True.
>>> def is_content_word(word):
...     return word.lower() not in ['a', 'of', 'the', 'and', 'will', ',', '.']
>>> sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
...         'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']
>>> filter(is_content_word, sent)
['Take', 'care', 'sense', 'sounds', 'take', 'care', 'themselves']
>>> [w for w in sent if is_content_word(w)]
['Take', 'care', 'sense', 'sounds', 'take', 'care', 'themselves']
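The session above shows Python 2, where filter() returns a list. In Python 3, filter() returns a lazy iterator, so the result has to be wrapped in list() before it can be displayed or indexed. A minimal sketch under that assumption (the variable names content and content_words are introduced here):

```python
def is_content_word(word):
    """Return True unless word is one of a few function words or punctuation marks."""
    return word.lower() not in ['a', 'of', 'the', 'and', 'will', ',', '.']

sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']

content = filter(is_content_word, sent)   # a lazy filter object in Python 3
content_words = list(content)             # materialize it to get a list
print(content_words)
```

The list comprehension form behaves the same way in both versions of the language.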
Another higher-order function is map(), which applies a function to
every item in a sequence. It is a general version of the
extract_property() function we saw earlier in this section. Here is
a simple way to find the average length of a sentence in the news
section of the Brown Corpus, followed by an equivalent version that
uses a list comprehension:
>>> lengths = map(len, nltk.corpus.brown.sents(categories='news'))
>>> sum(lengths) / len(lengths)
21.7508111616
>>> lengths = [len(w) for w in nltk.corpus.brown.sents(categories='news')]
>>> sum(lengths) / len(lengths)
21.7508111616
In the previous examples, we specified a user-defined function
is_content_word() and a built-in function len(). We can also provide
a lambda expression. Here’s a pair of equivalent examples that count
the number of vowels in each word.
>>> map(lambda w: len(filter(lambda c: c.lower() in "aeiou", w)), sent)
[2, 2, 1, 1, 2, 0, 1, 1, 2, 1, 2, 2, 1, 3, 0]
>>> [len([c for c in w if c.lower() in "aeiou"]) for w in sent]
[2, 2, 1, 1, 2, 0, 1, 1, 2, 1, 2, 2, 1, 3, 0]
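In Python 3 the first of these expressions no longer works as written, because filter() returns an iterator and len() cannot be applied to it. One way to express the same count, sketched here for Python 3, is to sum over a generator expression; count_vowels is a hypothetical helper name.

```python
sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']

def count_vowels(word):
    """Count the vowels in word without building an intermediate list."""
    return sum(1 for c in word if c.lower() in "aeiou")

# map() is lazy in Python 3, so we wrap it in list() to see the values.
vowel_counts = list(map(count_vowels, sent))
print(vowel_counts)
```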
The solutions based on list comprehensions are usually more readable than the solutions based on higher-order functions, and we have favored the former approach throughout this book.
When there are a lot of parameters it is easy to get confused about the correct order. Instead we can refer to parameters by name, and even assign them a default value just in case one was not provided by the calling program. Now the parameters can be specified in any order, and can be omitted.
>>> def repeat(msg='<empty>', num=1):
...     return msg * num
>>> repeat(num=3)
'<empty><empty><empty>'
>>> repeat(msg='Alice')
'Alice'
>>> repeat(num=5, msg='Alice')
'AliceAliceAliceAliceAlice'
These are called keyword arguments.
If we mix these two kinds of parameters, then we must ensure that the
unnamed parameters precede the named ones. It has to be this way,
since unnamed parameters are defined by position. We can define a
function that takes an arbitrary number of unnamed and named
parameters, and access them via an “in-place list” of arguments *args
and an “in-place dictionary” of keyword arguments **kwargs.
(Dictionaries will be presented in Mapping Words to Properties Using Python Dictionaries.)
>>> def generic(*args, **kwargs):
...     print args
...     print kwargs
...
>>> generic(1, "African swallow", monty="python")
(1, 'African swallow')
{'monty': 'python'}
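The ordering rule for unnamed and named arguments can be checked without running a bad call, since Python rejects it when the code is compiled. The following is a minimal Python 3 sketch; this version of generic() returns its arguments instead of printing them, and the rejected flag is a hypothetical name used only to record the outcome.

```python
def generic(*args, **kwargs):
    """Return the positional arguments (a tuple) and keyword arguments (a dict)."""
    return args, kwargs

args, kwargs = generic(1, "African swallow", monty="python")
print(args)
print(kwargs)

# A keyword argument may not be followed by a positional one; compile()
# raises SyntaxError for such a call before anything is executed.
rejected = False
try:
    compile('generic(monty="python", 1)', '<example>', 'eval')
except SyntaxError:
    rejected = True
print(rejected)
```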
When *args appears as a function parameter, it actually corresponds
to all the unnamed parameters of the function. As another
illustration of this aspect of Python syntax, consider the zip()
function, which operates on a variable number of arguments. We’ll use
the variable name *song to demonstrate that there’s nothing special
about the name *args.
>>> song = [['four', 'calling', 'birds'],
...         ['three', 'French', 'hens'],
...         ['two', 'turtle', 'doves']]
>>> zip(song[0], song[1], song[2])
[('four', 'three', 'two'), ('calling', 'French', 'turtle'), ('birds', 'hens', 'doves')]
>>> zip(*song)
[('four', 'three', 'two'), ('calling', 'French', 'turtle'), ('birds', 'hens', 'doves')]
It should be clear from this example that typing *song is just a
convenient shorthand, and equivalent to typing out
song[0], song[1], song[2].
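The same unpacking shorthand works in both directions, and ** does for dictionaries of keyword arguments what * does for sequences. A minimal sketch, assuming Python 3 (where zip() is lazy and needs a list() wrapper); the options dictionary is a hypothetical example, and repeat() is the function defined earlier in this section.

```python
song = [['four', 'calling', 'birds'],
        ['three', 'French', 'hens'],
        ['two', 'turtle', 'doves']]

# zip(*song) is equivalent to zip(song[0], song[1], song[2]).
columns = list(zip(*song))

# Unpacking the columns and zipping again restores the original rows (as tuples).
rows = list(zip(*columns))

def repeat(msg='<empty>', num=1):
    return msg * num

# ** unpacks a dictionary into keyword arguments: repeat(msg='Alice', num=2).
options = {'msg': 'Alice', 'num': 2}   # hypothetical settings dictionary
print(columns)
print(rows)
print(repeat(**options))
```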
Here’s another example of the use of keyword arguments in a function definition, along with three equivalent ways to call the function:
>>> def freq_words(file, min=1, num=10):
...     text = open(file).read()
...     tokens = nltk.word_tokenize(text)
...     freqdist = nltk.FreqDist(t for t in tokens if len(t) >= min)
...     return freqdist.keys()[:num]
>>> fw = freq_words('ch01.rst', 4, 10)
>>> fw = freq_words('ch01.rst', min=4, num=10)
>>> fw = freq_words('ch01.rst', num=10, min=4)
A side effect of having named arguments is that they permit
optionality. Thus we can leave out any arguments where we are happy
with the default value: freq_words('ch01.rst', min=4),
freq_words('ch01.rst', 4). Another common use of optional arguments
is to permit a flag. Here’s a revised version of the same function
that reports its progress if a verbose flag is set:
>>> def freq_words(file, min=1, num=10, verbose=False):
...     freqdist = nltk.FreqDist()
...     if verbose: print "Opening", file
...     text = open(file).read()
...     if verbose: print "Read in %d characters" % len(text)
...     for word in nltk.word_tokenize(text):
...         if len(word) >= min:
...             freqdist.inc(word)
...             if verbose and freqdist.N() % 100 == 0: print "."
...     if verbose: print
...     return freqdist.keys()[:num]
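For readers without NLTK or the file ch01.rst to hand, the same pattern of optional and flag parameters can be sketched using only the standard library. This is a stand-in under stated assumptions, not the function above: it takes the text directly instead of a filename, uses str.split() in place of nltk.word_tokenize(), uses collections.Counter in place of nltk.FreqDist, and is written for Python 3. The name freq_words_sketch is hypothetical.

```python
from collections import Counter

def freq_words_sketch(text, min=1, num=10, verbose=False):
    """Return up to num of the most frequent tokens of at least min characters."""
    if verbose:
        print("Read in %d characters" % len(text))
    # Whitespace splitting is a crude substitute for a real tokenizer.
    counts = Counter(t for t in text.split() if len(t) >= min)
    if verbose:
        print("Counted %d tokens" % sum(counts.values()))
    return [word for word, _ in counts.most_common(num)]

text = "the cat sat on the mat and the cat slept"
print(freq_words_sketch(text, min=3, num=2))          # both options given by name
print(freq_words_sketch(text, 3))                     # num falls back to its default
print(freq_words_sketch(text, num=1, verbose=True))   # flag switched on
```

As in the original, callers who are happy with the defaults can omit min, num, and verbose entirely.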
Caution!
Take care not to use a mutable object as the default value of a parameter. A series of calls to the function will use the same object, sometimes with bizarre results, as we will see in the discussion of debugging later.
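A short sketch of the problem, written for Python 3: the default list is created once, when the function is defined, so every call that relies on the default shares it. The function names append_shared and append_fresh are hypothetical, and using None as a sentinel is a common workaround, not something prescribed by this chapter.

```python
def append_shared(item, bucket=[]):
    """Buggy: the default list is created once, at function definition time."""
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    """Workaround: default to None and build a new list on each call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_shared('a'))
print(append_shared('b'))   # the earlier 'a' is still in the shared list
print(append_fresh('a'))
print(append_fresh('b'))    # each call starts from an empty list
```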