Credit: Paul Moore
You have a pure function that is often called with the same arguments (particularly a recursive function) and is slow to compute its results, and you are looking for a simple way to gain substantial performance.
The key idea behind memoizing is to store a function’s results in a dictionary, keyed by the arguments that produce each result. Of course, this makes sense only for a pure function (i.e., one that yields the same result when called repeatedly with given arguments). It’s easy to memoize a function by hand. For example, using the recursive Fibonacci function:
fib_memo = {}
def fib(n):
    if n < 2: return 1
    if not fib_memo.has_key(n):
        fib_memo[n] = fib(n-1) + fib(n-2)
    return fib_memo[n]
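The recipe's code targets Python 2 (dict.has_key was removed in Python 3). A minimal sketch of the same hand-memoized function for modern Python, using the `in` operator instead:

```python
# Hand-memoized Fibonacci, Python 3 sketch of the recipe's example.
fib_memo = {}

def fib(n):
    if n < 2:
        return 1
    if n not in fib_memo:          # replaces Python 2's has_key
        fib_memo[n] = fib(n - 1) + fib(n - 2)
    return fib_memo[n]

print(fib(30))  # -> 1346269
```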
Having to code the memoization inside each function to be memoized, however, is repetitive and interferes with the function’s readability. A good alternative is to encapsulate the memoization mechanics into a class:
class Memoize:
    def __init__(self, fn):
        self.fn = fn
        self.memo = {}
        self.cacheable = self.misses = self.noncacheable = 0L
    def __call__(self, *args, **kwds):
        if not kwds:
            self.cacheable += 1
            try:
                return self.memo[args]
            except KeyError:
                self.misses += 1
                self.memo[args] = self.fn(*args)
                return self.memo[args]
            except TypeError:
                self.cacheable -= 1
                self.noncacheable += 1
        return self.fn(*args, **kwds)
Using this class to memoize fib, the function definition becomes obvious, without caching boilerplate to obscure the algorithm. However, you must assign the Memoize instance to the same name, fib, as the recursive function; otherwise, the recursive calls bypass the memoizing:
def fib(n):
    if n < 2: return 1
    return fib(n-1) + fib(n-2)
fib = Memoize(fib)
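For readers on modern Python, here is a sketch of the same class ported to Python 3 (the long literal 0L becomes plain 0; the logic is otherwise unchanged), together with a dry run that exercises the counters:

```python
class Memoize:
    """Python 3 port of the recipe's Memoize class (0L is just 0 here)."""
    def __init__(self, fn):
        self.fn = fn
        self.memo = {}
        self.cacheable = self.misses = self.noncacheable = 0
    def __call__(self, *args, **kwds):
        if not kwds:
            self.cacheable += 1
            try:
                return self.memo[args]          # cache hit
            except KeyError:                    # cacheable, but not yet cached
                self.misses += 1
                self.memo[args] = self.fn(*args)
                return self.memo[args]
            except TypeError:                   # unhashable (mutable) arguments
                self.cacheable -= 1
                self.noncacheable += 1
        return self.fn(*args, **kwds)           # uncached fallback

def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)
fib = Memoize(fib)   # rebind the same name so recursive calls hit the cache

print(fib(20))                    # -> 10946
print(fib.misses, fib.cacheable)  # misses stay far below cacheable calls
```

After fib(20), only 21 misses (one per value of n from 0 to 20) occur across 39 cacheable calls, which is exactly the statistic the recipe suggests examining before committing to memoization.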
For functions that take mutable arguments, you can sometimes use the cPickle module to memoize them anyway:
class MemoizeMutable:
    def __init__(self, fn):
        self.fn = fn
        self.memo = {}
    def __call__(self, *args, **kwds):
        import cPickle
        str = cPickle.dumps(args, 1) + cPickle.dumps(kwds, 1)
        if not self.memo.has_key(str):
            self.memo[str] = self.fn(*args, **kwds)
        return self.memo[str]
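In Python 3, cPickle was folded into the standard pickle module. A sketch of the same class on modern Python (renaming the recipe's str variable, which shadows a builtin, to key, and using an illustrative function named total to show the cache in action):

```python
import pickle

class MemoizeMutable:
    """Python 3 sketch of the recipe's MemoizeMutable (pickle replaces cPickle)."""
    def __init__(self, fn):
        self.fn = fn
        self.memo = {}
    def __call__(self, *args, **kwds):
        # Serialize the arguments into a hashable bytes key; this works
        # even for mutable arguments such as lists and dicts.
        key = pickle.dumps(args, 1) + pickle.dumps(kwds, 1)
        if key not in self.memo:
            self.memo[key] = self.fn(*args, **kwds)
        return self.memo[key]

calls = []
@MemoizeMutable
def total(seq, scale=1):
    calls.append(1)                  # track how often the real function runs
    return sum(seq) * scale

print(total([1, 2, 3], scale=2))     # -> 12
print(total([1, 2, 3], scale=2))     # -> 12, served from the cache
print(len(calls))                    # -> 1: the function body ran only once
```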
The Memoize class is instantiated with one argument, a function f, and returns an instance that acts like f but memoizes its arguments and result, as long as the actual arguments of a call are hashable (nonmutable) and positional. Calls with mutable or keyword arguments are counted in the x.noncacheable instance attribute, cacheable calls are counted in the x.cacheable instance attribute, and cache misses among cacheable calls are counted in the x.misses attribute. Unless x.misses and x.noncacheable are low compared to x.cacheable, you’re better off not memoizing the function. So do a few dry runs that are representative of your intended production usage, examine these statistics, and decide whether memoization is worth using for your specific application.
As we’ve already noted in the recipe’s example of the Memoize class, it is important that the value of fib is replaced by the memoized version. Storing the memoized version elsewhere, as in memoized_fib = Memoize(fib), will not work, because the recursive calls will then call fib directly, bypassing the cache. This is an issue only for recursive functions, but since recursive functions are prime candidates for memoizing, it’s worth keeping in mind.
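The pitfall is easy to demonstrate with a minimal dict-based memoizer (a hypothetical cut-down stand-in for Memoize, exposing its cache for inspection):

```python
def memoize(fn):
    """Minimal memoizer: positional, hashable arguments only."""
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    wrapper.cache = cache    # exposed so we can inspect what got cached
    return wrapper

def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)

memoized_fib = memoize(fib)   # WRONG for recursion: inner calls still hit plain fib
memoized_fib(10)
print(len(memoized_fib.cache))  # -> 1: only the top-level call was cached

fib = memoize(fib)            # RIGHT: rebind the same name
fib(10)
print(len(fib.cache))         # -> 11: every subproblem n=0..10 was cached
```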
Obviously, functions to be memoized must be pure (i.e., they must have no side effects and must always return the same value whenever they are called with the same set of arguments). More significantly, the Memoize class (and the inline version above) does not memoize calls that receive mutable arguments, such as len on a list. (Note that you cannot memoize functions that change their mutable arguments, because such functions are not pure.) MemoizeMutable weakens this constraint a bit and also accepts named arguments, as long as all arguments can be handled by the cPickle module (most types can, but by no means all).
Memoize and friends cannot really check the semantics of the functions you wrap in them. In other words, the notions of “same value” and “same set of arguments” are somewhat vaguely defined in many cases, so take care. Memoize does try to field occasional calls with keyword and mutable arguments (with an interesting mix of checking and try/except), but performance will suffer unless such cases are rare. As already noted, Memoize also keeps counts of cacheable calls, noncacheable calls, and misses. This is a bit of overhead, so you may want to rip the count-keeping out for a lighter, faster, simpler version of Memoize once you’ve taken all the measurements you need to convince yourself that memoizing is a good choice for a given function. MemoizeMutable, just to highlight the contrast, does not try to field nonpicklable arguments, and uses checking instead of try/except to detect cache misses (but if you have many hits and few misses, try/except can be a bit faster).
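The checking-versus-try/except trade-off can be measured with the standard timeit module; a rough sketch (the cache contents and function names are illustrative, and the relative timings vary by interpreter version and hit rate):

```python
import timeit

cache = {n: n * n for n in range(100)}

def lookup_check(key):
    # "Look before you leap": one explicit membership test on every call.
    if key in cache:
        return cache[key]
    return None

def lookup_try(key):
    # try/except: nearly free on a hit, costly only when the exception fires.
    try:
        return cache[key]
    except KeyError:
        return None

hits = timeit.timeit(lambda: lookup_try(42), number=100_000)
checks = timeit.timeit(lambda: lookup_check(42), number=100_000)
print(f"try/except on hits: {hits:.4f}s, in-check: {checks:.4f}s")
```

Run the same comparison with a mostly-missing key to see the exception overhead dominate in the other direction.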
One possible enhancement, as for all caching approaches, would be to use weak references (from the weakref standard module), rather than normal references, for the self.memo cache. This may weaken the cache by reducing its hit rate, but as a plus, it avoids keeping objects alive just because the cache is holding on to them. The trade-off crucially depends on the specifics of your application, so be sure to consider it carefully. For details on weak references, see http://python.sourceforge.net/peps/pep-0205.html.
A similar approach could be used to cache results from external processes (i.e., commands). However, in this case it becomes less feasible to ensure that the crucial characteristics of no side effects and the same results for a given set of arguments are in place, so issues naturally arise about more complex procedures, such as time-based cache expiration, LRU disciplines, and so on. All in all, including such extras in this recipe would complicate it enormously without offering substantial benefits for typical cases. The possibilities mentioned here should, however, be kept in mind for those complicated cases in which they might come in handy.