Chapter 4. Structure

It’s a trap!

Admiral Ackbar

Let’s move on to questions of structure and how you can hurt your future self with tangled logic and deep coupling. These structural problems impact your ability to change or reuse your code as its requirements inevitably change.

Pathological If/Elif Blocks

This anti-pattern arises when you get into the business of creating a “one-stop shop” function that has to contend with many special cases.

The first if/else block arrives innocently, and even the first elif doesn’t seem so bad. But soon their friends arrive:

def do_awesome_stuff():
    ...
    if thing.has_condition_one():
        ...
    elif thing.has_condition_two():
        ...
    elif thing.get_conditions() in ['condition3', 'condition4']:
        ...
    elif thing.has_condition_forty_two():
        ...
    else:
        ...
    ...

Suddenly you find yourself with hundreds of lines of elifs. And good luck if any of the contents of those blocks is at all complicated—anyone reading this code will be fortunate if they even remember they’re in this elif nightmare after 30 or 40 lines. And how excited will you be to write tests for this function?

This has a kind of momentum as well—special cases tend to attract more special cases, as if drawn together gravitationally. Just adding more elifs feels easier than cleaning up. Except cleaning up isn’t so bad. If we really do need to manage many special cases, we can employ the Strategy pattern:

def strategy1():
    ...

def strategy2():
    ...

strategies = {
    'condition1': strategy1,
    'condition2': strategy2,
    ...
}

def do_awesome_stuff():
    which_one = ...
    strategy = strategies[which_one]
    strategy()
    ...

We start by extracting the contents of our if/elif/else structure into separate functions with identical interfaces. Then we can create a dictionary to map conditions to those strategy functions. The dictionary key doesn’t have to be a string. It can be anything hashable, so tuples and frozensets can be quite effective if we need richer conditions. Finally, our original function determines which key to use, plucks the appropriate strategy function from our dictionary, and invokes it.

Our original function is now much, much simpler to understand, as is each of the strategies, and writing tests for each of the now-isolated strategies is straightforward.
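
To make the dictionary dispatch concrete, here is a minimal runnable sketch. The shipping scenario, the rates, and every name in it (STRATEGIES, shipping_cost, and the ship_* functions) are invented for illustration; the point is only the shape: identical interfaces, a tuple used as a richer key, and a lookup that replaces the elif chain.

```python
# Hypothetical example: names and rates are invented for illustration.

def ship_standard(weight_kg):
    return 5.0 + 1.0 * weight_kg


def ship_express(weight_kg):
    return 15.0 + 2.0 * weight_kg


def ship_international_express(weight_kg):
    return 40.0 + 5.0 * weight_kg


# Keys can be any hashable value; tuples let us combine two conditions.
STRATEGIES = {
    ('domestic', 'standard'): ship_standard,
    ('domestic', 'express'): ship_express,
    ('international', 'express'): ship_international_express,
}


def shipping_cost(region, speed, weight_kg):
    # Work out the key, look up the strategy, and invoke it.
    try:
        strategy = STRATEGIES[(region, speed)]
    except KeyError:
        raise ValueError('no strategy for %r' % ((region, speed),))
    return strategy(weight_kg)


print(shipping_cost('domestic', 'express', 2))
```

An unknown combination fails loudly with a ValueError instead of silently falling into a catch-all else branch, which is a design choice you may or may not want.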

However, figuring out what value to use for that dictionary key can sometimes be complicated. If it takes 200 lines to determine what key to use, is this really much of a victory?

If that’s the case, consider externalizing it entirely, and let the strategy be chosen by the caller, who may in fact know better than we do about whatever those factors are. The strategy is invoked as a callback:

def do_awesome_stuff(strategy):
    ...
    strategy()
    ...


result = do_awesome_stuff(strategy1)

From there it’s not too far of a jump into dependency injection, where our code is provided with what it needs, rather than having to be smart enough to ask for it on its own:

class Foo(object):

    def __init__(self, strategy):
        self.strategy = strategy

    def do_awesome_stuff(self):
        ...
        self.strategy()
        ...


foo = Foo(strategy2)
foo.do_awesome_stuff()
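
One payoff of handing the strategy in is that a test can supply a stand-in. The sketch below assumes a hypothetical Greeter class whose "send" strategy is injected; in the test, the fake is nothing more than a list's append method.

```python
class Greeter(object):
    """Hypothetical class that is handed its output strategy."""

    def __init__(self, send):
        self.send = send

    def greet(self, name):
        message = 'Hello, %s!' % name
        self.send(message)
        return message


# In a test, inject a fake that just records what it was given.
sent = []
greeter = Greeter(sent.append)
greeter.greet('Ackbar')
assert sent == ['Hello, Ackbar!']
```

In production the same class could be given a function that sends email or writes to a socket, and Greeter itself would not need to change.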

Unnecessary Getters and Setters

In between Perl and Python, there was a brief window where I was immersed in Java, but its influence lingered far beyond those few months. When I got to do some of my first brand new, greenfield development of an invitation service, I made sure that all of the model objects were replete with getters and setters because, darn it, this was how object-oriented programming was supposed to be! I would show them all—attribute access must be protected!

And thus it was that I produced many classes that looked like this:

class InviteEvent(object):
    ...

    def getEventNumber(self):
        return self._intEventNumber

    def setEventNumber(self, x):
        self._intEventNumber = int(x)

    ...

Each and every attribute of each and every class had getter and setter functions that did barely anything. The getters would simply return the attributes that they guarded, and the setters would occasionally enforce things like types or constraints on the values the attributes were allowed to take. This InviteEvent class had 40 getters and 40 setters; other classes had even more. That’s a lot of code to accomplish very little—and that’s not even counting the tests needed to cover it all.

And trying to work with instances of these objects was pretty awful, too—this kind of thing quickly becomes tiresome:

event.setEventNumber(10)
print(event.getEventNumber())

Fortunately, there’s a practical, Pythonic solution to this labyrinth of boilerplate: just make most attributes public, and use properties to protect any special snowflakes that need extra care and feeding.

Properties let you provide functions that masquerade as attributes of the object: when you read the attribute, a getter is invoked; when you assign to the attribute, a setter is called; when you try to delete the attribute, it’s managed by a deleter. The setter and deleter are both optional—you can make a read-only attribute by declaring only the getter. And the really great thing is that you don’t need to know in advance which attributes will need to be properties. You have the freedom to sketch out exactly what you want to work with, then transparently replace attributes with properties without having to change any calling code because the interface is preserved.

In modern Python, properties are constructed with the @property decorator, which is just syntactic sugar for a function that replaces a method with a property object of the same name and wires it up to the getter. The property object also has setter and deleter functions that can be used as decorators to attach setter and deleter functionality to the property.

That might sound complicated, but it’s actually rather clean:

class InviteEvent(object):
    ...

    @property
    def event_number(self):
        return self._event_number

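    @event_number.setter
    def event_number(self, x):
        self._event_number = int(x)

    @event_number.deleter
    def event_number(self):
        self._event_number = None

    ...

There are two tricks to remember. First, use the name of the property when hooking up the setter or deleter, rather than using @property itself. Second, give the setter and deleter functions the same name as the property: each decorator returns a new property object, and it only replaces the original if it is bound to that same name.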

One nice thing about this decorator-based approach is that it doesn’t junk up the namespace of the class with a bunch of functions that you really don’t want anyone to call. There’s just the single property object for each property!

Using these objects is far more comfortable than before, too. All those function calls and parentheses simply vanish, leaving us with what looks like plain old “dot” access:

event.event_number = 10
print(event.event_number)
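
Here is a small sketch of the "replace an attribute with a property later" idea. The two class names (InviteEventV1, InviteEventV2) and the describe helper are made up for this example; they stand for the same class before and after a property is introduced.

```python
class InviteEventV1(object):
    """First draft: a plain public attribute."""

    def __init__(self, event_number):
        self.event_number = event_number


class InviteEventV2(object):
    """Later revision: same interface, value now coerced to int."""

    def __init__(self, event_number):
        self.event_number = event_number  # goes through the setter

    @property
    def event_number(self):
        return self._event_number

    @event_number.setter
    def event_number(self, value):
        self._event_number = int(value)


def describe(event):
    # Calling code is identical for both versions.
    return 'event #%s' % event.event_number


print(describe(InviteEventV1(10)))
print(describe(InviteEventV2('10')))
```

The caller never learns that the second version does extra work on assignment, which is exactly the freedom the property gives us.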

Getting Wrapped Up in Decorators

One of the things I was most excited about as Python evolved was the opportunity to use decorators to attach reusable functionality to functions and methods. We saw its benefits above with @property.

A decorator is a function (or, more generally, a callable) that takes the function being decorated as its argument and returns a function, which replaces the original. Imagine a small nesting doll (the function being decorated), placed inside another nesting doll (the “wrapper” function returned by the decorator). We use the syntactic sugar of the @ symbol to apply decorators to functions being decorated.

Here’s a simple decorator that wraps a function in another function that does something special before allowing the first function to be executed:

def my_decorator(function):
    def wrapper(*args, **kwargs):
        # do something special first
        ...
        return function(*args, **kwargs)
    return wrapper

@my_decorator
def foo(x, y, z):
    ...

Typical uses for decorators involve altering or validating the input to a function, altering the output of a function, logging the usage or timing of a function, and—especially in web application frameworks—controlling access to a function. You can apply as many decorators as you want, too—it’s nesting dolls all the way down!
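
As a rough illustration of stacking, here are two toy decorators applied to one function. Both decorators (log_calls and double_result) are hypothetical; functools.wraps is the standard library helper that copies the original function's name and docstring onto the wrapper.

```python
import functools


def log_calls(function):
    """Hypothetical decorator: record each call before running it."""
    @functools.wraps(function)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        log_calls.seen.append((function.__name__, args, kwargs))
        return function(*args, **kwargs)
    return wrapper

log_calls.seen = []


def double_result(function):
    """Hypothetical decorator: alter the output of the function."""
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        return 2 * function(*args, **kwargs)
    return wrapper


@log_calls          # outermost doll: applied last, runs first
@double_result      # innermost doll: applied first
def add(x, y):
    return x + y


print(add(1, 2))
```

Decorators are applied from the bottom up, so the call passes through log_calls first, then double_result, and only then reaches add itself.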

Decorators sound pretty swell, so why are we talking about them in a book about mistakes?

When you use Python’s decorator syntax to wrap and replace functions, you immediately couple the original function to all the behavior that comes with the wrapper. If the original function is about making some calculation and the wrapper is about logging, the result is a function that’s inescapably, inextricably about both of those concerns. This coupling is compounded with each additional decorator that’s applied.

Did you want to test the original function in isolation? Too bad—that function is effectively gone. Your test has no choice but to exercise the final, multilayered Frankenstein function, which means you may have a series of unpleasant hoops to jump through in order to set up the test, none of which is material to the problem the original function is attempting to solve. The same goes for trying to call that original function in your production code—once the decorators have been applied, you’re stuck with all the extra baggage that comes with them.

As a web developer, I encounter this the most when writing unit tests for controller methods (“views” in the Django parlance), because I often have several layers applied. A typical example might look something like this:

class MyController(object):

    @require_https
    @require_signed_in
    @validate_form(SomeForm(), ...)
    @need_database_connection
    def handle_post(self, request):
        ...
        return HTTPResponse(...)

It can be hugely beneficial to have those access controls written in a way that they can quickly be reused throughout the application, but it means that if I’m going to write tests, I have to do all the work required to fake out the request context so that the request will actually make it to the code that I want to test. In an ideal world, the innermost method I’m testing is simple and doesn’t need more than one or two tests to cover its behavior, but if it’s at all complicated, the amount of setup necessary can become quite tedious (unless of course you get excited about refactoring unit tests, in which case have at it!).

And all of that setup means that I’m not only testing the original function, but in effect I’m testing all of the wrappers that the function has been decorated with, each of which should already have tests of its own.

The approach I’ve gravitated toward is to make the decorated method as simple and devoid of logic as possible, pushing all of its smarts down into a deeper layer of abstraction that can be tested in isolation:

class MyController(object):

    @require_https
    @require_signed_in
    @validate_form(SomeForm(), ...)
    @need_database_connection
    def handle_post(self, request):
        # get data from request
        data = { ... }
        self.object_service.create_object(data)
        return HTTPResponse(...)

Then the responsibility of the controller method is limited to receiving the request and handing the right data off to someone else, which makes its tests simpler as well. It also means that the core business logic is relocated away from the web interface and into a position that allows it to be reused.
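
A sketch of what that deeper layer might look like follows. The ObjectService shown here is an assumption made for the example (the validation rule and the list-backed store are invented); what matters is that it can be exercised without any request, decorator, or web framework in sight.

```python
class ObjectService(object):
    """Hypothetical service holding the business logic."""

    def __init__(self, store):
        self.store = store  # anything with an append() method

    def create_object(self, data):
        if not data.get('name'):
            raise ValueError('name is required')
        record = {'name': data['name'].strip()}
        self.store.append(record)
        return record


# No request, no HTTPS, no sign-in: just the logic under test.
store = []
service = ObjectService(store)
service.create_object({'name': '  widget '})
assert store == [{'name': 'widget'}]
```

The decorated controller method then needs only a test or two to confirm that it extracts the right data and passes it along.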

Breaking the Law of Demeter

The Law of Demeter (also known as the principle of least knowledge) tells us that our code should only interact with the things that it knows about, and not reach deeply into nested attributes, across friends of friends, and into strangers.

It feels great to break this law because it’s so expedient to do so. It’s easy to feel like a superhero or a ninja commando when you quickly tunnel through three, four, or more layers of abstraction to accomplish your mission in record time.

Here are just a few examples of my countless crimes. I’ve reached across multiple objects to call a method:

gvars.objSession.objCustomer.objMemberStatus.isPAID()

Or reached through dictionaries to call a method to get an object to use to call another method:

if gvars.dctEnv['session'].getCustomer().isSignedIn():
    ...

Or called single-underscore-prefixed internal methods of an object (more on this in a moment):

current_url = self.objSession._getCurrentURL()

Or called a method on an item plucked from a list returned by a method call on a single-underscore internal attribute of an object:

return event._objGuestList.getGuestList()[0].getEventSequence()

Yikes!

This kind of thing might be okay when we’re debugging, or exploring in an interactive shell, but it’s bad news in production code. When we break this law, our code becomes brittle. Instead of relying on the public interface of a single object, it now relies on a delicate chain of nested attributes, and any change that disrupts that chain will break our code in ways that will furrow our brows as we struggle to repair the complex code plumbing mess we’ve made for ourselves.

We should especially avoid depending on single- and double-underscore internals of an object, because they are prefixed this way for a reason. We are explicitly being told that these items are part of the internal implementation of the object and we cannot depend on them to remain as they are—they can be changed or removed at any time. (The single underscore is a common convention to indicate that whatever it prefixes is “private-ish,” while double-underscore attributes are made “private” by Python’s name mangling.)

The problem of these violations is even worse than it seems, for it turns out that the brittleness and calcification of the system happens in both directions. Not only is the calling code locked into the internal interfaces that it’s traversing, but each and every object along that path becomes locked in place as well, as if encased in amber. None of these objects can be freely or easily changed, because they are all now tightly coupled to one another.

If it really is the responsibility of an object to surface something from deep within its internals, make that a part of the object’s public interface, a first-class citizen for calling code to interact with. Or perhaps an intermediary helper object can encapsulate the traversal of all those layers of abstraction, so that any brittleness is isolated to a single location that’s easy to change instead of woven throughout the system. Either way, let abstraction work for you. This frees both the caller and callee to change their implementations without disrupting each other, or worse, the entire system.
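
One way to picture the first option is with a chain like the session, customer, and member-status example from earlier in this section. The classes below are simplified, hypothetical stand-ins, not the real ones; each object answers only for its immediate collaborator.

```python
class MemberStatus(object):
    def __init__(self, paid):
        self._paid = paid

    def is_paid(self):
        return self._paid


class Customer(object):
    def __init__(self, status):
        self._status = status

    def is_paid(self):
        # Customer answers for its own collaborator.
        return self._status.is_paid()


class Session(object):
    def __init__(self, customer):
        self._customer = customer

    def customer_is_paid(self):
        # Callers ask the session; the chain stays hidden in here.
        return self._customer.is_paid()


session = Session(Customer(MemberStatus(paid=True)))
print(session.customer_is_paid())
```

If MemberStatus is later replaced by something else entirely, only Customer has to change; code that calls customer_is_paid never notices.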

Overusing Private Attributes

When I started with Python, I was still fresh out of school, where I’d heard over and over again about the importance of object-oriented programming ideals like “information hiding” and private variables. So when I came to Python, I went a little overboard with private methods and attributes, placing leading double underscores on practically everything I could get my hands on:

class MyClass(object):

    def __init__(self, arg1, arg2, ...):
        self.__attr1 = arg1
        self.__attr2 = arg2
        ...

    def do_something(self):
        self.__do_a_step()
        self.__do_another_step()
        self.__do_one_more_step()
        self.__do_something_barely_related()

    # and so forth...

“Hands off!” this code shouts. “You’ll never need to use these things, and I know better than you!”

Inevitably, I discovered that I did need to use code that was hiding behind the double underscore, sometimes to reuse functionality in previously unforeseen ways, sometimes to write tests (either to test a method in isolation or to mock it out).

Let’s say we wanted to subclass that MyClass up above, and it needs a slightly customized implementation of the do_something method. We might try this:

class MyOtherClass(MyClass):

    def do_something(self):
        self.__do_a_new_step()
        self.__do_one_more_step()

This will fail with an AttributeError because of the name mangling that Python applies to make the attribute private: inside MyOtherClass, self.__do_one_more_step is rewritten to self._MyOtherClass__do_one_more_step, and our subclass doesn’t actually have a method by that name. Instead, we would have to invoke self._MyClass__do_one_more_step, and that’s just nasty.
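
The mangling is easy to see in a stripped-down example. Base and Child are hypothetical names used only for this demonstration.

```python
class Base(object):
    def __step(self):
        return 'base step'


class Child(Base):
    def run(self):
        # Inside Child this name is mangled to _Child__step,
        # which does not exist.
        return self.__step()

    def run_mangled(self):
        # Spelling out the parent's mangled name does work.
        return self._Base__step()


child = Child()
try:
    child.run()
except AttributeError as error:
    print('AttributeError: %s' % error)
print(child.run_mangled())
```

The rewriting happens at compile time and is based on the class in which the code is written, not on the class of the object at runtime.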

All that privacy just got in the way. What at first seemed like a cool language feature turned out to be a giant nuisance that I was always working around.

In time, I came to prefer using just a single underscore to politely indicate that an attribute is part of the internal implementation of a class and shouldn’t be referenced externally without some hesitation. Since single-underscore names aren’t mangled, they can be more conveniently used if you absolutely must break the Law of Demeter.

Further experience taught me that I shouldn’t even want to do that. When “outside” code wants access to the internals of a class, those “internal” attributes probably shouldn’t be private at all; rather, this is a clear signal that those attributes should be public. The code is telling us that it’s time to refactor!

The “private” attribute we keep using externally should either be promoted to not have any leading underscores, or should be exposed via a property if some amount of control is still required. If we feel the need to replace or monkey-patch an internal method of a class, we should instead be thinking about extracting that into a strategy that perhaps we just pass in to the public method we’re calling. If we find that we need to call into that “internal” functionality in multiple places, then what the code is telling us is that that functionality doesn’t really belong in this class at all. It should be extracted into a separate function, or if complex enough, into a separate object that the class might collaborate with rather than wholly contain.

God Objects and God Methods

A well-behaved class or method should have a strictly limited set of responsibilities, preferably as close to one as possible (in accordance with the Single Responsibility Principle), and should only contain whatever knowledge or data it needs to fulfill its limited role. All classes and methods start this way, simple and innocent, but we may find it convenient or expedient to grow these entities as our requirements evolve. When a class or method has accumulated too much knowledge or too many responsibilities, its role in the system becomes practically godlike: it has become all-encompassing, all-seeing, and all-doing, and many other entities will end up being tightly coupled to it in order to get anything done. Like big banks in the autumn of 2008, our god objects and god methods are too big to maintain, yet too big to fail.

These are pretty easy to spot: we’re looking for large modules, large classes, classes with many methods, and long methods or functions. There are a couple of different ways to go about this.

Pylint will by default complain about modules longer than 1000 lines and functions with more than 50 statements (and you can adjust these values as needed), but you have to look carefully at its voluminous output to make sure you don’t miss these warnings. WingIDE and Komodo integrate with Pylint for code inspection, so they’ll also help you find these problems. Curiously, while PyCharm offers code inspection that covers many of the same issues that Pylint does, it doesn’t include warnings about module or function length.

If you aren’t someone who enjoys working with an IDE, you can use some Unix command-line kung fu to identify potential sources of godlike trouble:

$ find . -name "*.py" -exec wc -l {} \; | sort -rn    # (1)
$ grep "^class " bigmodule.py | wc -l                 # (2)
$ grep "\sdef " bigmodule.py | wc -l                  # (3)

(1) Find all Python source files, count the number of lines in each, and sort the results numerically in descending order, so that the files with the most lines bubble to the top of the list; anything over 1000 lines is worth further investigation.

(2) Count the number of classes defined in a big module…

(3) And the number of methods defined at some level of indentation (i.e., within a class or within other functions) in that module.

If the ratio of methods to classes seems large, that’s a good warning sign that we need to take a closer look.

Or, if we feel like being creative, we can use Python to make a little cross-platform tool:

import collections
import fileinput
import os


def find_files(path='.', ext='.py'):
    for root, dirs, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith(ext):
                yield os.path.join(root, filename)


def is_line(line):
    return True


def has_class(line):
    return line.startswith('class ')


def has_function(line):
    return 'def ' in line


COUNTERS = dict(lines=is_line, classes=has_class,
        functions=has_function)


def find_gods():
    stats = collections.defaultdict(collections.Counter)
    for line in fileinput.input(find_files()):
        for key, func in COUNTERS.items():
            if func(line):
                stats[key][fileinput.filename()] += 1

    for filename, lines in stats['lines'].most_common():
        classes = stats['classes'][filename]
        functions = stats['functions'][filename]
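        try:
            # float() avoids truncating integer division on Python 2
            ratio = "=> {0:.1f}:1".format(float(functions) / classes)
        except ZeroDivisionError:
            ratio = "=> n/a"
        print("{0} {1} {2} {3} {4}".format(
            filename, lines, functions, classes, ratio))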


if __name__ == '__main__':
    find_gods()

This small program is enough to recursively find all .py files; count the number of lines, classes, and functions in each file; and emit those statistics grouped by filename and sorted by the number of lines in the file, along with a ratio of functions to classes. It’s not perfect, but it’s certainly useful for identifying risky modules!
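
If the line-matching heuristics turn out to be too loose, one alternative is to let Python parse the source and count real definitions. The sketch below is a different technique from the script above, not a drop-in replacement: it uses the standard library ast module, assumes Python 3, and count_definitions and SAMPLE are names invented here.

```python
import ast


def count_definitions(source):
    """Return (classes, functions) defined anywhere in source."""
    classes = functions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            classes += 1
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions += 1
    return classes, functions


# The comment below would fool a "'def ' in line" check, but not the parser.
SAMPLE = '''
class Page(object):
    # def not_a_function(): this is only a comment
    def render(self):
        label = 'class in a string'
        return label

def helper():
    pass
'''

print(count_definitions(SAMPLE))
```

The trade-off is that parsing fails outright on files with syntax errors, or on files written for a different major version of Python, whereas the line-based counters will happily chew through anything.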

Let’s take a high-level look at some of the gods I’ve regretted creating over the years. I can’t share the full source code, but their summaries should illustrate the problem.

One of them is called CardOrderPage, which spreads 2900 lines of pain and suffering across 69 methods, with an 85-line __init__ and numerous methods in excess of 200 to 300 lines, all just to shovel some data around.

MemberOrderPage is only 2400 lines long, but it still packs a whopping 58 methods, and its __init__ is 90 lines. Like CardOrderPage, it has a diverse set of methods, doing everything from request handling to placing an order and sending an email message (the last of which takes 120 lines, or roughly 5 percent of the class).

Then there’s a thing called Session, which isn’t really what most web frameworks would call a session (it doesn’t manage session data on the server), but which instead provides context about the request, which is a polite way to say that it’s a big bag of things that you can hurt yourself with. Lots of code in this codebase ended up being tightly coupled to Session, which presents its own set of problems that we’ll explore further in a later section.

At the time that I captured the data about it, Session was only about 1000 lines, but it had 79 methods, most of which were small, save for a monstrous 180-line __init__ laden with minefields and side effects.

Besides line count, another way you can identify god methods is by looking for naming anti-patterns. Some of my most typical bad methods have been:

def update_everything(...):
    ...

def do_everything(...):
    ...

def go(...):
    ...

If you find these kinds of abominations in your code, it’s a sign that it’s time to take a deep breath and refactor them. Favor small functions and small classes that have as few responsibilities as possible, and strive to do as little work as possible in the __init__ so that your classes are easy to instantiate, with no weird side effects, and your tests can be easy and lightweight. You want to break up these wanna-be gods before they get out of hand.
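
One possible way to keep __init__ small is to have it store only what it is handed, and to move any heavier work into a separate constructor. The Settings class below is a hypothetical example of that arrangement, not something from the codebase described above.

```python
import json


class Settings(object):
    """Hypothetical class: __init__ only stores what it is given."""

    def __init__(self, values):
        self.values = dict(values)

    @classmethod
    def from_json(cls, text):
        # The parsing work lives in a separate, optional constructor.
        return cls(json.loads(text))

    def get(self, key, default=None):
        return self.values.get(key, default)


# Tests can build an instance directly, with no parsing or I/O.
settings = Settings({'debug': True})
assert settings.get('debug') is True

loaded = Settings.from_json('{"debug": false}')
assert loaded.get('debug') is False
```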

Increasing the number of small classes and methods may not optimize for raw execution speed, but it does optimize for maintenance over the long term and the overall sanity and well-being of the development team.

Global State

We come now to one of my greatest regrets. This module is called gvars.py, and it started simply as a favor to another developer who needed easy access to some objects and didn’t want to pass them around everywhere, from way at the top of the stack to deep down in the guts:

dctEnv = None
objSession = None
objWebvars = None
objHeaders = None
objUserAgent = None

It’s basically just a module that has some module-level global variables that would get repopulated by the app server with every request that would come in over the web. If you import it, you can talk to those globals, and you can do this at any level, from those lofty heights that first see the request, where it seems like a reasonable thing to want to do, all the way down to the darkest, most horrible depths of your business logic, data model, and scary places where this has no business being. It enables this sort of thing at every level of your system:

from col.web import gvars
...

if gvars.objSession.hasSomething():
    ...

if gvars.objWebvars.get('foo') == 'bar':
    ...

strUserName = \
    gvars.objSession.objCustomer.getName()

This is tremendously convenient when you’re writing website code—you can get at anything important about the request at any point, no matter where you are. Poof! Magic!

But as soon as you need to do anything else, it all falls apart. Any kind of script, cron job, or backend system is doomed, because if it needs to use anything that has been tainted by gvars, well, too bad! Code that uses gvars is immediately tightly coupled to the context of a web request, where normally the app server would set up all of those heavyweight objects based on the request. But outside of the context of a request, we don’t have an app server, we don’t get all those objects made for free, and even if we did, they wouldn’t make sense—what is a user agent or POST variable in the context of a cron job?

The only hope for using code that’s bound to gvars outside of its native milieu is to do extensive faking, populating gvars manually with objects that are good enough to get by, and providing your own specially crafted return values when necessary. This is less fun than it sounds.

Let’s consider an example of the madness that gvars begat. The PermissionAdapter is a class that would fetch question data for managing user opt-ins and opt-outs for various flavors of email. Naturally, for convenience, it depended on a pile of objects created by the app server and injected into gvars at request time, and it was so internally knotty that refactoring it to clean it up was frowned upon, lest we end up doing more harm than good. No, it didn’t have unit tests, why do you ask?

from col.web import gvars

class PermissionAdapter(object):

    def __init__(self, ...):
        # uh-oh, this can’t be good...
        dctEnv = gvars.dctEnv
        self._objWebvars = dctEnv["webvars"]
        self._objSession = dctEnv["session"]
        self._objHeaders = dctEnv["headers"]

    def getQuestions(self, ...):
        site = self._objSession.getSiteGroup()
        data = self._someThingThatWants(site)
        ...

For its specific purpose, it got the job done, albeit with little flexibility.

One day, I had to figure out how to surface the permission questions for more than one sitegroup (a fancy internal term for a customer namespace). Without some serious refactoring, this just wasn’t possible as-is.

So instead—and I am so, so sorry for this—I wrote a PermissionFacade wrapper around the PermissionAdapter, and its job was to fake out the necessary objects in gvars using Mock objects, instantiate a PermissionAdapter, then restore the original gvars before leaving the method:

class PermissionFacade(object):

    def __init__(self, ...):
        self.webvars = Mock()
        self.session = Mock()
        self.headers = Mock()

    def get_questions(self, sitegroup, ..., whatever):
        adapter = self._make_adapter(...)
        return adapter.getQuestions(...)

    def _make_adapter(self, sitegroup, ...):
        from col.web import gvars
        orig_gvars_env = gvars.dctEnv
        gvars.dctEnv = {
            'webvars': self.webvars,
            'session': self.session,
            'headers': self.headers,
        }
        self.session.getSource.return_value = source
        self.session.getSourceFamily.return_value = \
                source_family
        try:
            self.permission_adapter = PermissionAdapter(
                    sitegroup, ...)
            # ...and some other grotesque mock monkey
            # patching to fake out a request context...
        finally:
            gvars.dctEnv = orig_gvars_env
        return self.permission_adapter

Thank goodness we at least have finally to give us the opportunity to put the original values back into place. This makes sure that no matter what happens during the corresponding try, we’ll put everything back where it was so that we can service other callers. Subsequent calls can ask for questions for a different sitegroup, and we can then combine the results further upstream.
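
The restore-in-finally mechanics can be seen in isolation with a toy stand-in for the globals module. Everything here is an assumption made for the demonstration: fake_gvars is a plain namespace object rather than a real module, swapped_env is a hypothetical helper, and the code assumes Python 3.

```python
import contextlib
import types

# Hypothetical stand-in for a module full of globals.
fake_gvars = types.SimpleNamespace(dctEnv={'session': 'real'})


@contextlib.contextmanager
def swapped_env(module, replacement):
    original = module.dctEnv
    module.dctEnv = replacement
    try:
        yield module
    finally:
        # Runs on success and on error alike.
        module.dctEnv = original


try:
    with swapped_env(fake_gvars, {'session': 'fake'}):
        assert fake_gvars.dctEnv['session'] == 'fake'
        raise RuntimeError('something went wrong mid-request')
except RuntimeError:
    pass

print(fake_gvars.dctEnv['session'])
```

The original value comes back even though the body raised. This only shows that the cleanup is reliable within a single thread of execution; it does nothing for anyone else who reads the global while it is swapped.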

But patching things into place like this in production code is a bad idea, because it’s just too easy to screw up in a way that’s weird or subtle. Even in the normal web server context, something like gvars can lead to safety issues and possible data leakage between requests unless it’s carefully managed, as those module-level globals will persist as long as the process is running. We’ll see this come back to haunt us in the next chapter.

Avoid global state as much as humanly possible. Resist its siren lure, reject its convenience, refuse the temptation, no matter how much your colleagues think they want it. In the long run, they’ll thank you for not having to maintain such monstrosities.
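
For contrast, here is a rough sketch of the same kind of lookup with the context passed in explicitly. QuestionFinder and handle_request are hypothetical names, and the dictionary-backed data is invented; the real PermissionAdapter was far knottier than this.

```python
class QuestionFinder(object):
    """Hypothetical stand-in: it is told the sitegroup it needs."""

    def __init__(self, questions_by_sitegroup):
        self._questions = questions_by_sitegroup

    def get_questions(self, sitegroup):
        return list(self._questions.get(sitegroup, []))


def handle_request(session, finder):
    # Only the outermost layer touches request-specific objects.
    return finder.get_questions(session['sitegroup'])


finder = QuestionFinder({'us': ['newsletter?'], 'uk': ['offers?']})

# A web request and a cron job can share the same code path.
print(handle_request({'sitegroup': 'us'}, finder))
print(finder.get_questions('us') + finder.get_questions('uk'))
```

Because the sitegroup is an ordinary argument, asking for more than one of them is just a matter of calling the method again, with no mocks and nothing to restore afterward.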
