Credit: Tim Peters, PythonLabs
Computer manufacturers of the 1960s estimated that more than 25 percent of the running time on their computers was spent on sorting, when all their customers were taken into account. In fact, there were many installations in which the task of sorting was responsible for more than half of the computing time. From these statistics we may conclude that either (i) there are many important applications of sorting, or (ii) many people sort when they shouldn’t, or (iii) inefficient sorting algorithms have been in common use.
Professor Knuth’s masterful work on the topics of sorting and searching spans nearly 800 pages of sophisticated technical text. In Python practice, we reduce it to two imperatives (we read Knuth so you don’t have to):
When you need to sort, find a way to use the built-in
sortmethod of Python lists.
When you need to search, find a way to use built-in dictionaries.
Many recipes in this chapter illustrate these principles. The most
common theme is using the
decorate-sort-undecorate (DSU) pattern, a
general approach to transforming a sorting problem by creating an
auxiliary that we can then sort with the default, speedy
sort method. This is the single most useful
technique to take from this chapter. It relies on an unusual feature
of Python’s built-in comparisons: sequences are
compared lexicographically. Lexicographical order is a generalization
to tuples and lists of the everyday rules used to compare strings
(i.e., alphabetical order). The built-in
cmp(s1, s2), when
s2 are sequences, is equivalent to this Python
def lexcmp(s1, s2): # Find leftmost nonequal pair i = 0 while i < len(s1) and i < len(s2): outcome = cmp(s1[i], s2[i]) if outcome: return outcome i += 1 # All equal, until at least one sequence was exhausted return cmp(len(s1), len(s2))
This code looks for the first nonequal corresponding elements. If such a nonequal pair is found, that pair determines the outcome. Otherwise, if one sequence is a proper prefix of the other, the prefix is considered to be the smaller sequence. Finally, if these cases don’t apply, the sequences are identical and are considered equal. Here are some examples:
>>> cmp((1, 2, 3), (1, 2, 3)) # identical
0>>> cmp((1, 2, 3), (1, 2)) # first larger because second is a prefix 1 >>> cmp((1, 100), (2, 1)) # first smaller because 1<2 -1 >>> cmp((1, 2), (1, 3)) # first smaller because 1==1, then 2<3 -1
An immediate consequence of lexicographical comparisons is that if you want to sort a list of objects by a primary key, breaking ties by comparing a secondary key, you can simply build a list of tuples, in which each tuple contains the primary key, secondary key, and original object, in that order. Because tuples are compared lexicographically, this automatically does the right thing. When comparing tuples, the primary keys are compared first, and if (and only if) the primary keys are equal, the secondary keys are compared.
The examples of the DSU pattern in this chapter show many applications of this idea; perhaps the cutest is Recipe 2.4, which ensures a stable sort by using each object’s list index as a secondary key. Of course, the DSU technique applies to any number of keys. You can add as many keys to the tuples as you like, in the order in which you want the keys compared.
A definitive generalization is provided by Recipe 2.8, which provides a general routine to sort lists of objects by user-specified index position or attribute name. This is fine code, and you are free to use it for your own purposes. It suffers from a problem common to many frameworks in Python, though: once you get the hang of it, using the DSU pattern is so easy that you’ll find it’s quicker to do it from scratch than to remember how to use the framework. I haven’t yet decided whether this is a strength or weakness of Python, but it’s definitely a real phenomenon.
- What algorithm does Python’s list.sort use?
In early releases of Python,
qsortroutine from the underlying platform’s C library. This didn’t work out for several reasons, but primarily because the quality of
qsortvaried widely across machines. Some versions were extremely slow when given a list with many equal values or in reverse-sorted order. Some even dumped core because they weren’t reentrant. A user-defined
_ _cmp_ _function can also invoke
list.sort, so that one
list.sortcan invoke others as a side effect of comparing. Some platform
qsortroutines couldn’t handle that. A user-defined
_ _cmp_ _function can also (if it’s insane or malicious) mutate the list while it’s being sorted, and many platform
qsortroutines dumped core when that happened.
Python then grew its own implementation of the quicksort algorithm. This was rewritten with every release, as real-life cases of unacceptable slowness were discovered. Quicksort is a delicate algorithm!
In Python 1.5.2 the quicksort algorithm was replaced by a hybrid of samplesort and binary insertion sort, and that hasn’t changed again to date. Samplesort can be viewed as a variant of quicksort that uses a very large sample size to pick the partitioning element, also known as the pivot (it recursively samplesorts a large random subset of the elements and picks the median of those). This variant makes quadratic-time behavior almost impossible and brings the number of comparisons in the average case much closer to the theoretical minimum.
However, because samplesort is a complicated algorithm, it has too much administrative overhead for small lists. Therefore, small lists (and small slices resulting from samplesort partitioning) are handled by a separate binary insertion sort. This is an ordinary insertion sort, except that it uses binary search to determine where each new element belongs. Most sorting texts say this isn’t worth the bother, but that’s because most texts assume that comparing two elements is as cheap as or cheaper than swapping them in memory. This isn’t true for Python’s sort! Moving an object is very cheap, since what is copied is just a reference to the object. Comparing two objects is expensive, though, because all of the object-oriented machinery for finding the appropriate code to compare two objects and for coercion gets reinvoked each time. This makes binary search a major win for Python’s sort.
On top of this hybrid approach, a few common special cases are exploited for speed. First, already-sorted or reverse-sorted lists are detected and handled in linear time. For some applications, these kinds of lists are very common. Second, if an array is mostly sorted, with just a few out-of-place elements at the end, the binary insertion sort handles the whole job. This is much faster than letting samplesort have at it and is common in applications that repeatedly sort a list, append a few new elements, then sort it again. Finally, special code in the samplesort looks for stretches of equal elements, so that the slice they occupy can be marked as done early.
In the end, all of this yields an in-place sort with excellent performance in all known real cases and supernaturally good performance in common special cases. It spans about 500 lines of complicated C code, which gives special poignancy to Recipe 2.12.
- Is Python’s sort stable?
No, samplesort is not stable, and it cannot be made stable efficiently. See Recipe 2.4 if you need stability.
- But I’ve tried many examples, and they’re all stable!
You tried small examples. The binary insertion sort is stable, and, as I explained earlier, samplesort doesn’t kick in until the list gets larger.
- How large?
It’s an implementation detail you can’t rely on across releases. Today, the answer is 100 items.
- But Recipe 2.12 shows a stable sort. Why not use that?
It’s a cute example, but it does twice as many comparisons as a real quicksort. As I explained earlier, the cost of comparisons dominates sorting time, so it would take at least twice as long as Python’s sort even if it was coded in C. Even if it didn’t take twice as long, samplesort is quicker than quicksort.
- Mergesort does few comparisons and can be made stable easily. Why doesn’t Python use that?
Mergesort requires extra memory, and it does much more data movement than Python’s current sort. Despite the fact that comparison time dominates, samplesort does few enough comparisons that the extra data movement done by mergesort is significant. I implemented three mergesorts while investigating quicksort alternatives for Python 1.5.2, and they were all significantly slower than the approach Python uses.
- Why is passing a comparison function so much slower than using the DSU pattern?
In search performance, comparison time dominates, and an explicit comparison function written in Python adds the substantial overhead of a Python-level function call to each comparison. The built-in comparisons are all coded in C, so they don’t incur the overhead of Python-level function calls.
- So I should never pass an explicit comparison function, right?
Speed isn’t always everything. If you can afford the speed hit, and you find it more convenient and clearer in a given case to pass a comparison function, go ahead. I do.
- Why does Python use the three-outcome
cmpfor sorting? Why doesn’t it use a simple less-than comparison instead?
This is a historical consequence of Python initially using the C
qsortrequires a three-outcome comparison function. Like many people, I wish it used a simple less-than comparison, but it’s too late now. When rich comparisons were introduced, Python’s
list.sortwas reworked a little, so that although it uses a three-outcome comparison function at the user level, internally it also works fine with objects that implement only
_ _lt_ _comparison.
- What’s the best way to sort a list in reverse order?
list.sort(lambda a, b: cmp(b, a))
Here’s another technique that is faster and more obvious but that is often avoided by those who mistakenly believe that writing two lines of code where one might do is somehow sinful:
list.sort( ) list.reverse( )
- What kind of hash table does Python’s dictionary type use?
The dictionary type uses contiguous tables with power-of-two sizes. Collisions are handled within the table, and the table is kept at most two-thirds full, so that collision handling is very unlikely to degenerate. When a table gets too full, the table size is doubled and all the elements are reinserted. For more detail, read the source code in
dictobject.h. As of Python 2.2, the source code is extensively commented.
- I’ve heard that Python’s dictionary implementation is very fast. Is that true?
Yes. Because Python uses dictionaries to implement module and object namespaces, dictionary performance is crucial. For the past decade, smart people in the Python community have helped optimize it—the result is open source at its best.