# Fluent Python: The Power of Special Methods

You’ve used Python for years. Do you really know it? Be brave and skin that Python. In this cut, Luciano Ramalho probes deep into special methods and the Data Model. Become fluent with idiomatic Python.

## Fluent Python: The Data Model

If you learned another object-oriented language before Python, you may have found it strange to call `len(collection)`

instead of `collection.len()`

. This apparent oddity is the tip of an iceberg that, when properly explored, is the key to everything we call *Pythonic*. The iceberg is called the Python data model, and it describes the API that you can use to make your own objects play well with the most idiomatic language features.

To create pleasant, intuitive, and expressive libraries and APIs, you need to leverage the Python data model, so that your objects behave consistently with the built-in objects in the language.

## Special Methods

The data model defines a set of interfaces. Python is a dynamically typed language, so to implement an interface you just code methods with the expected names and signatures. You are not required to fully implement an interface if a partial implementation covers your use cases. The data model interfaces all use method names prefixed and suffixed with `__`

(two underscores), such as `__add__`

or `__len__`

. These are known as special methods, magic methods or *dunder* methods (after *double underscore*).

Your code will rarely invoke special methods. Instead, you’ll implement special methods that the Python interpreter will call when handling syntax for:

- Iteration
- Collections
- Attribute access
- Operator overloading
- Function and method invocation
- Object creation and destruction
- String representation and formatting
- Managed contexts (i.e., with blocks)

To see the data model in action, we’ll implement a subset of the sequence interface, which describes the behavior of strings, lists, tuples, arrays, and many other Python types. Our second example will cover operator overloading.

## A Pythonic Card Deck

To experiment with special methods, we’ll code a package to represent decks of playing cards. First, we’ll create a simple class to represent an individual card.

### The Card Class

A card will be a record with two data attributes and no methods. Python has a factory to make such simple classes: `collections.namedtuple`

.

import collections Card = collections.namedtuple('Card', ['rank', 'suit'])

Note that the `namedtuple`

factory takes two arguments: the name of the class to create and a sequence of attribute names.

The `Card`

class can then be instantiated as usual:

beer_card = Card('7', 'diamonds') beer_card

If we inspect the attributes of the `Card`

class, we see lots of special methods. Among them, `__repr__`

is the one that produces the string representation we just saw.

dir(Card)

The `__repr__`

method is invoked implicitly by the console and debugger, or explicitly by the `repr()`

built-in function.

The output of `repr()`

for a `Card`

instance is so explicit that you can clone a card by applying `eval()`

to its `repr()`

:

my_card = eval(repr(beer_card)) my_card == beer_card

We are now ready to code the class to represent a deck of cards.

### The FrenchDeck Class

I’ll call the class for a collection of cards `FrenchDeck`

, since that is the formal name of the set of 52 cards with 4 suits, used not only in France, but in most of the Western world. Its code is short but it does a lot, as you’ll soon see.

class FrenchDeck: ranks = [str(n) for n in range(2, 11)] + list('JQKA') suits = 'spades diamonds clubs hearts'.split() def __init__(self): self._cards = [Card(rank, suit) for suit in self.suits for rank in self.ranks] def __len__(self): return len(self._cards) def __getitem__(self, position): return self._cards[position]

The trickiest part is the use of a list comprehension in the initializer to build a list of cards by computing the Cartesian product of the lists of suits and ranks. The logic of that list comprehension is explained in Chapter 2 of Fluent Python, but right now we want to focus on the external behavior of the class, not its implementation, so please believe that `self._cards`

holds a list of 52 `Card`

instances.

deck = FrenchDeck() len(deck)

The `len`

built-in function knows how to handle a `FrenchDeck`

because we implemented the `__len__`

special method. This is consistent with how built-in collections work, and saves the user from memorizing arbitrary method names for common operations (“How to get the number of items? Is it `.length()`

, `.size()`

or what?”).

The `__getitem__`

special method supports the use of `[]`

and provides a lot of functionality.

We can get any card by index, as usual. For example, first and last:

deck[0], deck[-1]

We can also use slice notation to retrieve a subset of the cards:

deck[:3]

Here’s how to use the the third parameter of a slice to get just the aces in the deck by starting at card index 12 and skipping 13 cards from that point onwards.

deck[12::13]

The `in`

operator also works with our `FrenchDeck`

instances. This behavior can be optimized by implementing a `__contains__`

method, but if you provide a `__getitem__`

method, Python is smart enough to scan the collection from item 0 to the end.

Card('Q', 'hearts') in deck

Card('Z', 'clubs') in deck

### Iteration

Our card decks are also iterable. Implementing an `__iter__`

method to return a custom iterator is the optimal way to achieve this. But, as a fallback, Python knows how to iterate over any collection that implements `__getitem__`

and accepts integer indexes starting at 0:

for card in deck: print(card)

By supporting iteration, we can leverage many functions in the standard library that work with iterables, like `enumerate()`

, `reversed()`

, as well as the constructor for `list`

and several other collection types.

list(enumerate(reversed(deck), 1))

Another powerful function that works with iterables is `sorted`

. It builds a sorted list from iterables that generate a series of comparable values.

sorted(deck)

We can define custom sorting criteria by implementing a function to produce a key from each item in the series, and passing it as the `key=`

argument to `sorted`

. Here is a function that implements the “spades high” ordering, where cards are sorted by rank, and within each rank, spades is the highest suit, followed by hearts, diamonds, and clubs:

def spades_high(card): rank_value = FrenchDeck.ranks.index(card.rank) return (rank_value, card.suit)

As written, `spades_high`

produces the highest key value for the Ace of spades and the lowest for the 2 of clubs:

spades_high(Card('A', 'spades')), spades_high(Card('2', 'clubs'))

sorted(deck, key=spades_high)

### Shuffling and Item Assignment

The standard library provides many functions that operate on sequences. For example, picking a random item is as simple as this:

import random random.choice(deck)

This is a live notebook, so each time you run the code above, the random choice will be computed again, producing different results.

How about shuffling? Let’s try it next.

try: random.shuffle(deck) except TypeError as e: # this error is expected! print(repr(e)) else: print('The deck was shuffled!')

The first time you run this notebook you should see the exception above: `TypeError: 'FrenchDeck' object does not support item assignment`

. The problem is that `random.shuffle`

works by rearranging the items in place, but it can only do `deck[i] = card`

if the sequence implements the `__setitem__`

method.

#### Monkey Patching

We could redefine the whole `FrenchDeck`

class here, but let’s do a *monkey patch*, just for fun. Monkey patching is changing classes or modules at run time. To enable shuffling, we can create a function that puts a card in a certain position of a deck:

def put(deck, index, card): deck._cards[index] = card

Note how `put`

is tricky: it assigns to the “private” attribute `deck._cards`

. Monkey patches are naughty. They often touch the intimate parts of the target objects.

Now we can patch the `FrenchDeck`

card to insert the `put`

function as its `__setitem__`

method:

FrenchDeck.__setitem__ = put

Now we can shuffle the deck and get the first five cards to verify:

random.shuffle(deck) deck[:5]

Again, in a live notebook such as this, each time you run the cell above you should get a different result.

If you want to disable item assignment to experiment further, you can delete `__setitem__`

from the `FrenchDeck`

class. Then `random.shuffle`

will stop working. Uncomment and run the next cell to try this.

# del FrenchDeck.__setitem__ # random.shuffle(deck) # <-- this will break

Monkey patching has a bad reputation among Pythonistas. Monkey patches are often tightly bound to the implementation details of the patched code, so we apply them only as a last resort.

However, some important Python projects use this technique to great effect. For example, the `gevent`

networking library, uses monkey patching extensively to make the Python standard library support highly concurrent network I/O.

Here, monkey patching was a didactic device to illustrate these ideas:

- Classes are objects too, so you can add attributes to them at run-time
- Methods are merely functions assigned to class attributes
- What makes a method “special” is naming. Python recognizes a fixed set of special method names, such as
`__setitem__`

- Many standard operations are implemented by special methods. For example, getting and setting items in sequences triggers the
`__getitem__`

and`__setitem__`

methods

Most of the special method names supported by Python are in the data model chapter of the Python Language Reference.

### What We Saw So Far

The `FrenchDeck`

example demonstrates how smart use of Python features lets us go very far with just a little coding.

`FrenchDeck`

implicitly inherits from `object`

. All its interesting behaviors are not inherited, but come from leveraging the data model and composition. The `__len__`

and `__getitem__`

methods delegate all the work to a `list`

object, `self._cards`

. Note that my code never calls special methods directly. They are called by the interpreter.

By implementing the special methods `__len__`

and `__getitem__`

, our class behaves like a basic Python sequence, allowing it to benefit from core language features such as:

- The
`len()`

built-in function - Item access
- Slicing
- Iteration
- Many functions that accept sequences or iterables (e.g.
`list`

,`enumerate`

,`sorted`

,`random.choice`

)

By adding `__setitem__`

, the deck became mutable, thus supporting `random.shuffle`

. When I first wrote this example years ago I actually did implement a `FrenchDeck.shuffle`

method. But then I realized that I was already coding a sequence-like object, so I should just use the existing sequence shuffling function in the standard library.

The main point is this: if you follow the conventions of the data model, your users can take more advantage of the standard library. So your objects will be easier to use and more powerful at the same time.

The next example will show how special methods are used for operator overloading.

## Overloading Operators for Vector Arithmetic

Programmers often think of a vector as a synonym for array, but let’s consider Euclidean vectors used in math and physics, like these:

The picture above illustrates a vector addition. We’d like to represent those objects in code, like this:

v1 = Vector([2, 4]) v2 = Vector([2, 1]) v3 = v1 + v2 print(v3) # --> Vector([4, 5])

Let’s start with the basic methods every object should have: `__init__`

and `__repr__`

.

Please note this is a didactic example only!In practice, to do numerical computations with vectors and matrices, you should use NumPy and other packages from the SciPy collection.

### Vector Take #1: Initialization and Inspection

Our first step enables building and inspecting instances of `vector`

. We’ll use an `array`

of doubles to store the arguments passed to the constructor:

from array import array class Vector: def __init__(self, components): self._components = array('d', components) def __repr__(self): components_str = ', '.join(str(x) for x in self._components) return '{}([{}])'.format(self.__class__.__name__, components_str)

Like the standard Python console and debugger, iPython uses `repr(v1)`

to render a `v1`

object, triggering a call to `v1.__repr__()`

:

v1 = Vector([2, 4, 6]) v1

The first line of the `__repr__`

method uses a generator expression to iterate over the array of floats to render each as a string, then joins them with commas. The second line builds a string with the name of the class (eg. `'Vector()'`

) and the components string inside the parenthesis, emulating the syntax of a constructor call with a `list`

argument.

Insight: why`repr`

and`__repr__`

,`len`

and`__len__`

?You may be wondering why we say that Python calls

`repr`

but we implement`__repr__`

. With the`FrenchDeck`

it was the same thing: we saw that`len(deck)`

resulted in a call to`deck.__len__()`

.There is a practical reason: for built-in types, a call such as

`len(obj)`

does not invoke`obj.__len__()`

. If the type of`obj`

is a variable length built-in type coded in C, its memory representation has a struct named`PyVarObject`

with an`ob_size`

field. In that case,`len(obj)`

just returns the value of the`ob_size`

field, avoiding an expensive dynamic attribute lookup and method call. Only if`obj`

is a user defined type, then`len()`

will call the`__len__()`

special method, as a fallback.A similar rationale explains why Java arrays have a

`.length`

attribute, while most Java collections implement`.length()`

or`.size()`

methods. The difference is that Python strives for consistency; it optimizes the operation of the fundamental built-in types, but allows our own types to behave consistently by implementing the special methods defined in the data model.

### Vector Take #2: Iteration

As mentioned, the recommended way to make an iterable object is to implement an `__iter__`

method that returns an iterator.

from array import array import math class Vector: def __init__(self, components): self._components = array('d', components) def __repr__(self): components_str = ', '.join(str(x) for x in self._components) return '{}([{}])'.format(self.__class__.__name__, components_str) def __iter__(self): # <- - - - - - - - - - - new method return iter(self._components)

Here I just call the `iter()`

built-in on the `_components`

array to get an iterator that will return each component of the vector. As a result, here are some things we can do with `vector`

instances now:

v3 = Vector([10, 20, 30]) x, y, z = v3 # tuple unpacking x, y, z

list(v3), set(v3) # seed collection constructors

for label, value in zip('xyz', v3): # parallel iteration with zip print(label, '=', value)

By the way, did you know that the `zip`

built-in function has nothing to do with compressed files?

The Pythonic`zip`

The

`zip`

function is named after the zipper fastener. The fastener works by interlocking pairs of teeth taken from both zipper sides, a good visual analogy for what`zip(left, right)`

does: producing tuples of items to allow parallel iteration over two or more collections.Understanding

`zip`

is a small but important step to becoming a fluent Pythonista. Like`enumerate`

and`reversed`

,`zip`

is often used in`for`

loops to support use cases that would require error-prone index manipulations in a language like C. Python’s`for`

loop prevents many bugs by handling the index for you and producing the items you actually want. Most Python code I’ve seen that explicitly manipulated indexes in`for`

loops could be written more readably and safely using one or more of these functions. In Python 3,`zip`

,`enumerate`

, and`reversed`

return generators, so they are memory-efficient and fast.

### Vector Take #3: map-reduce and `abs`

Now that `Vector`

is iterable, we can easily compute its absolute value using the Euclidean norm. The formula is a generalization of the Pythagorean theorem:

We can code the sum inside the square root using the map-reduce functional programming pattern. In the *map* stage we compute the square of each vector component, and in the *reduce* stage we add them up.

Here is one way, using `map`

with `lambda`

to compute the squares, then `sum`

:

v = Vector([3, 4]) sum(map(lambda x: x*x, v))

That works. But the most idiomatic way in modern Python is using a generator expression instead of `map`

and `lambda`

:

sum(x*x for x in v)

Isn’t that elegant?

Computing the square root of that sum, we have our `__abs__`

method:

from array import array import math class Vector: def __init__(self, components): self._components = array('d', components) def __repr__(self): components_str = ', '.join(str(x) for x in self._components) return '{}([{}])'.format(self.__class__.__name__, components_str) def __iter__(self): return iter(self._components) def __abs__(self): # <- - - - - - - - - - - new method return math.sqrt(sum(x * x for x in self))

Now we can use the `abs()`

built-in to compute the absolute value of a vector, just like we use it with other numeric types:

abs(5), abs(-5.0), abs(3+4j), abs(Vector([3,4]))

### Vector Take #4: Scalar Multiplication

Scalar multiplication is the product of a vector and a real number. Given a `vector v`

, the expression `v*x`

builds a new `vector`

with each component multiplied by `x`

. Here’s how to do it using a generator expression:

v5 = Vector([1, 2, 3, 4, 5]) x = 11 Vector(x * y for y in v5)

To overload the `*`

operator we start by implementing a `__mul__`

method, like this:

from array import array import math class Vector: def __init__(self, components): self._components = array('d', components) def __repr__(self): components_str = ', '.join(str(x) for x in self._components) return '{}([{}])'.format(self.__class__.__name__, components_str) def __iter__(self): return iter(self._components) def __abs__(self): return math.sqrt(sum(x * x for x in self)) def __mul__(self, y): # <- - - - - - - - - - - new method return Vector(x * y for x in self)

We can now test it:

Vector([1, 2, 3, 4, 5]) * 11

All seems good, but there is a problem. Look what happens if we multiply a number by a vector:

try: res = 11 * Vector([1, 2, 3, 4, 5]) except TypeError as e: print(repr(e)) else: print('OK! result:', res)

The problem is that the `__mul__`

method in the `int`

class doesn’t know how to handle vectors. How could it? We just made up this `Vector`

class. The solution to this problem is a pattern named double dispatch, which Python implements at the interpreter level using special operator method names. See the picture below.

Double DispatchTo evaluate

`a*b`

, Python checks if`a.__mul__(b)`

exists. If it doesn’t, or if it returns the special value`NotImplemented`

, then Python calls`b.__rmul__(a)`

. Note that`NotImplemented`

is not an exception, but a special value that operator overloading methods return to say “I can’t handle that operator.” Many operator overloading special methods have a sibling named with that`r`

prefix, like`__rmul__`

,`__radd__`

, etc. The`r`

prefix stands forreversed.If the

forwardoperator method is absent or returns`NotImplemented`

, then Python invokes thereversedoperator, swapping the operands. Now if thereversedoperator method is absent or returns`NotImplemented`

, the interpreter raises a`TypeError`

exception stating that the operands are invalid.

Now we know what to do to fix that bug. Here’s the code of `Vector`

with the `__rmul__`

method:

from array import array import math class Vector: def __init__(self, components): self._components = array('d', components) def __repr__(self): components_str = ', '.join(str(x) for x in self._components) return '{}([{}])'.format(self.__class__.__name__, components_str) def __iter__(self): return iter(self._components) def __abs__(self): return math.sqrt(sum(x * x for x in self)) def __mul__(self, y): return Vector(x * y for x in self) def __rmul__(self, y): # <- - - - - - - - - - - new method return self * y

Let’s prove it works:

11 * Vector([1, 2, 3, 4, 5])

By design, scalar multiplication is a commutative operation between a vector and a scalar number.

However, as implemented, our `__mul__`

method raises confusing exceptions if the non-vector operand is unsuitable:

try: res = Vector([10, 11, 12]) * 'spam' except TypeError as e: print(repr(e)) else: print('OK! result:', res)

The problem with this exception is the message; it mentions a `'float'`

but the user code only shows a `Vector`

and a `str`

— the fact that `vector`

stores components as floats is an implementation detail. What we see here is the exception raised by multiplying `x * y`

where `x`

is one of the components of the array and `y`

is the string `'spam'`

. We need to handle this exception ourselves to provide better error reporting.

At first, it may seem reasonable to check the type of the `y`

argument in `__mul__`

and raise an exception if we can’t handle that type. However, there are several scalar numeric types in the standard library, from plain `int`

to `Fraction`

, and external libraries like `NumPy`

define even more numeric types.

When overloading operators we should not raise type exceptions most of the time. Instead, we return `NotImplemented`

to let the double dispatch algorithm continue, giving the other operand a chance to perform the multiplication in its `__rmul__`

method — if it can.

#### Duck Typing FTW

The Pythonic approach here is to use duck typing; we avoid type checking and rely on implemented behavior. In this case, the `y`

argument will multiply each floating point number in the `_components`

array, so any `y`

is good enough as long as it supports multiplication by a float. If that fails, we return `NotImplemented`

. Here is the code to do that:

from array import array import math class Vector: def __init__(self, components): self._components = array('d', components) def __repr__(self): components_str = ', '.join(str(x) for x in self._components) return '{}([{}])'.format(self.__class__.__name__, components_str) def __iter__(self): return iter(self._components) def __abs__(self): return math.sqrt(sum(x * x for x in self)) def __mul__(self, y): # <- - - - - - - - - - - improved method try: return Vector(x * y for x in self) except TypeError: return NotImplemented def __rmul__(self, y): return self * y

Now we have behavior that is compatible with the double dispatch mechanism, and Python can raise a `TypeError`

with a more sensible message:

try: res = Vector([10, 11, 12]) * 'spam' except TypeError as e: print(repr(e)) else: print('OK! result:', res)

Can you explain why the message mentions “sequence” first, even if the test expression was `Vector([10, 11, 12]) * 'spam'`

? Following the double dispatch diagram may be helpful.

We had to handle a few issues to overload `*`

, but now our implementation is smart enough to get a correct result from multiplying a `fraction`

and a `vector`

, thanks to the power of duck typing:

from fractions import Fraction Fraction(1, 3) * Vector([10, 20, 30])

### Vector Take #5: Vector Addition

Now let’s tackle vector addition: that is an operation on two vectors which returns a new vector made from the pairwise addition of the components of the operands, like this:

v1 = Vector([10, 11, 12, 13]) v2 = Vector([32, 31, 30, 29]) v3 = v1 + v2 print(v3) # --> Vector([42, 42, 42, 42])

That parallel addition is a perfect task for the `zip`

function, but there is a caveat, as soon as one of its iterable arguments is exhausted, `zip`

silently stops producing tuples:

list(zip([1, 2, 3, 4], [10, 20]))

In our case, that behavior would ignore extra components from the longer vector operand. A more useful behavior would be to produce a vector as long as the longest operand by padding the shortest one with zeroes.

Fortunately the `itertools`

module has the solution, the aptly named `zip_longest`

generator function:

import itertools list(itertools.zip_longest([1, 2, 3, 4], [10, 20], fillvalue=0))

The final listing shows the `__add__`

method using `zip_longest`

and basic error handling in line with the double dispatch pattern. I went ahead and implemented the `__radd__`

method as well:

from array import array import math import itertools class Vector: def __init__(self, components): self._components = array('d', components) def __repr__(self): components_str = ', '.join(str(x) for x in self._components) return '{}([{}])'.format(self.__class__.__name__, components_str) def __iter__(self): return iter(self._components) def __abs__(self): return math.sqrt(sum(x * x for x in self)) def __mul__(self, y): try: return Vector(x * y for x in self) except TypeError: return NotImplemented def __rmul__(self, y): return self * y def __add__(self, other): # <- - - - - - - - - - - new method try: pairs = itertools.zip_longest(self, other, fillvalue=0) return Vector(x + y for x, y in pairs) except TypeError: return NotImplemented def __radd__(self, other): # <- - - - - - - - - - - new method return self + other

The way I implemented the `+`

operator you can add a `vector`

with any iterable that produces numbers — the result will always be a `vector`

:

Vector([1, 2, 3, 4]) + [100, 200, 300]

Python does not allow adding tuples and lists. But the meaning of vector addition is very different; we are not concatenating a whole structure to another, but we are reading individual components from the operands to create a new vector.

I am not certain that the behavior I implemented is desirable. But it’s an interesting demonstration of the power of duck typing.

## Conclusion

These short examples demonstrate the power of the Python data model. We saw how:

- Basic language syntax like
`o[i]`

and`a + b`

handle user-defined types via special methods - Implementing just
`__len__`

and`__getitem__`

is enough to get a usable sequence type, and then leverage many standard library functions such as`sorted`

,`choice`

and`shuffle`

- Non-sequence types can provide an
`__iter__`

method to become iterable, enabling the use of powerful built-ins like`sum`

and`zip`

, as well as memory efficient generator expressions and the many specialized generator functions from the`itertools`

module - Infix operators like
`+`

and`*`

are supported by methods like`__add__`

and`__mul__`

, with reversed operator methods (e.g.`__radd__`

) to allow flexible handling of mixed types through double dispatch

These are only some examples of special methods usage. Other functionality, such as properties, context managers, and class metaprogramming also rely on special methods defined in the Python data model.