## With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

No credit card required

# 4.6. Flattening a Nested Sequence

Credit: Luther Blissett, Holger Krekel, Hemanth Sethuram, ParzAspen Aspen

## Problem

Some of the items in a sequence may in turn be sub-sequences, and so on, to arbitrary depth of "nesting". You need to loop over a "flattened" sequence, "expanding" each sub-sequence into a single, flat sequence of scalar items. (A scalar, or atom, is anything that is not a sequence—i.e., a leaf, if you think of the nested sequence as a tree.)

## Solution

We need to be able to tell which of the elements we're handling are "subsequences" to be "expanded" and which are "scalars" to be yielded as is. For generality, we can take an argument that's a predicate to tell us what items we are to expand. (A predicate is a function that we can call on any element and that returns a truth value: in this case, `True` if the element is a subsequence we are to expand, `False` otherwise.) By default, we can arbitrarily say that every `list` or `tuple` is to be "expanded", and nothing else. Then, a recursive generator offers the simplest solution:

```def list_or_tuple(x):
return isinstance(x, (list, tuple))
def flatten(sequence, to_expand=list_or_tuple):
for item in sequence:
if to_expand(item):
for subitem in flatten(item, to_expand):
yield subitem
else:
yield item```

## Discussion

Flattening a nested sequence, or, equivalently, "walking" sequentially over all the leaves of a "tree", is a common task in many kinds of applications. You start with a nested structure, with items grouped into sequences and subsequences, and, for some purposes, you don't care about the structure at all. You just want to deal with the items, one after the other. For example,

```for x in flatten([1, 2, [3, [  ], 4, [5, 6], 7, [8,], ], 9]):
print x,```

emits ```1 2 3 4 5 6 7 8 9```.

The only problem with this common task is that, in the general case, determining what is to be "expanded", and what is to be yielded as a scalar, is not as obvious as it might seem. So, I ducked that decision, delegating it to a callable predicate argument that the caller can pass to `flatten`, unless the caller accepts `flatten`'s somewhat simplistic default behavior of expanding just tuples and lists.

In the same module as `flatten`, we should also supply another predicate that a caller might well want to use—a predicate that will expand just about any iterable except strings (plain and Unicode). Strings are iterable, but almost invariably applications want to treat them as scalars, not as subsequences.

To identify whether an object is iterable, we just need to try calling the built-in `iter` on that object: the call raises `TypeError` if the object is not iterable. To identify whether an object is string-like, we simply check whether the object is an instance of `basestring`, since `isinstance(obj, basestring)` is `True` when `obj` is an instance of any subclass of `basestring`—that is, any string-like type. So, the alternative predicate is not hard to code:

```def nonstring_iterable(obj):
try: iter(obj)
except TypeError: return False
else: return not isinstance(obj, basestring)```

Now the caller may choose to call `flatten(seq, nonstring_iterable)` when the need is to expand any iterable that is not a string. It is surely better not to make the `nonstring_iterable` predicate the default for `flatten`, though: in a simple case, such as the example snippet we showed previously, `flatten` can be up to three times slower when the predicate is `nonstring_iterable` rather than `list_or_tuple`.

We can also write a nonrecursive version of generator `flatten`. Such a version lets you flatten nested sequences with nesting levels higher than Python's recursion limit, which normally allows no more than a few thousand levels of recursion depth. The main technique for recursion removal is to keep an explicit last-in, first-out (LIFO) stack, which, in this case, we can implement with a list of iterators:

```def flatten(sequence, to_expand=list_or_tuple):
iterators = [ iter(sequence) ]
while iterators:
# loop on the currently most-nested (last) iterator
for item in iterators[-1]:
if to_expand(item):
# subsequence found, go loop on iterator on subsequence
iterators.append(iter(item))
break
else:
yield item
else:
# most-nested iterator exhausted, go back, loop on its parent
iterators.pop( )```

The `if` clause of the `if` statement executes for any item we are to expand—that is, any subsequence on which we must loop; so in that clause, we push an iterator for the subsequence to the end of the stack, then execute a `break` to terminate the `for`, and go back to the outer `while`, which will in turn execute a new `for` statement on the iterator we just appended to the stack. The `else` clause of the `if` statement executes for any item we don't expand, and it just `yield`s the item.

The `else` clause of the `for` statement executes if no `break` statement interrupts the `for` loop—in other words, when the `for` loop runs to completion, exhausting the currently most-nested iterator. So, in that `else` clause, we remove the now-exhausted most-nested (last) iterator, and the outer `while` loop proceeds, either terminating if no iterators are left on the stack, or executing a new `for` statement that continues the loop on the iterator that's back at the top of the stack—from wherever that iterator had last left off, intrinsically, because an iterator's job is exactly to remember iteration state.

The results of this nonrecursive implementation of `flatten` are identical to those of the simpler recursive version given in this recipe's Solution. If you think non-recursive implementations are faster than recursive ones, though, you may be disappointed: according to my measurements, the nonrecursive version is about 10% slower than the recursive one, across a range of cases.

Library Reference and Python in a Nutshell sections on sequence types and built-ins `iter`, `isinstance`, and `basestring`.