Picking a Python version: A manifesto
This report guides you through the implicit decision tree of choosing what Python version, implementation, and distribution is best suited for you.
There are two major versions of the Python programming language: the Python 2.x series, and the newer Python 3.x series. The Python 3.x series started with the release of Python 3.0 in December 2008; since that time, Python 2.7 was released and has continued to receive point version releases, with Python 2.7.9 being the latest version as of this writing. As a footnote, there was, naturally, a Python 1.x series that was highly successful in the 1990s, but that series has long been out-of-maintenance.
In his 2014 PyCon keynote (Figure 1-1), Guido van Rossum, Python’s Benevolent Dictator for Life (sometimes affectionately called the BDFL or GvR), made it clear that the Python 2.x series will not continue past Python 2.7.x. The end-of-life for Python 2.7.x is 2020, and no new features will be added to the 2.x series of Python.
Within a series, Python places an especially strong philosophical emphasis on backward compatibility (in comparison to many other programming languages). It is extremely rare for a series to break this backward compatibility in later minor versions; in the few cases where it has occurred, it has been only to address large previously undiscovered bugs or security issues, and even then great attention is paid to affecting as little running code as possible. That said, new Python versions inevitably add important features, whether in the language itself, in its built-in functions, or in standard library support modules.
Python developers have numerous versions and implementations to choose from. Many of us still use Python 2.x and CPython—and that is still often a good choice—but there are many situations in which making a different choice makes sense.
Python 3.x, and in particular the Python 3.5 version coming out soon, offers numerous advantages over the Python 2.x series. New development should typically be done using Python 3.x, and although the 2020 end-of-life for Python 2.x isn’t all that close, it is good for companies and projects to start thinking about porting plans for existing code.
The (Small) Break
With the introduction of the so-called “Python 3000” version, which was released after long deliberation as Python 3.0 and subsequent Python 3.x versions, the designers deliberately allowed small language incompatibilities to be introduced in the new language series. A full discussion of this is contained in a Python Enhancement Proposal: PEP 3000 – Python 3000. Even during the Python 2.x to 3.x series transition, the philosophy of Python insisted on making changes as incremental as possible. Unlike the successor versions of some other programming languages, Python 3.x was never meant as a blue-sky project, and it avoids the pitfalls of a “second-system effect.” [1] Python 3.x aims to improve a familiar Python 2.x language rather than to create a new or fundamentally different language.
While a significant subset of Python code can be version-neutral “out of the box,” most existing codebases require a small degree of rewriting and/or use of compatibility tools and libraries to move from Python 2.x to Python 3.x. Real-world code using the most natural idioms in Python 2.x usually does various things that are not quite compatible with Python 3.x, and code that is version-neutral must be written carefully to achieve that. That is to say, if you wrote some module or script 10 years ago, it is quite unlikely it will “just run” in Python 3.x with no changes.
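As a minimal sketch of what writing version-neutral code “in a careful way” can look like, the following module is intended to behave the same under Python 2.7 and Python 3.x. The names PY2, text_type, range_, halves, and describe are hypothetical, chosen only for illustration; dedicated compatibility libraries offer more complete versions of the same idea.

```python
import sys

# True only when running under a 2.x interpreter.
PY2 = sys.version_info[0] == 2

if PY2:
    # These names exist only in Python 2.x; this branch never runs on 3.x.
    text_type = unicode  # noqa: F821
    range_ = xrange      # noqa: F821
else:
    text_type = str
    range_ = range


def halves(n):
    """Return the integer halves of 0..n-1; '//' floors on both series."""
    return [i // 2 for i in range_(n)]


def describe(mapping):
    """Format a dict as 'key=value' pairs in sorted key order."""
    # items() exists on both series; iteritems() is 2.x only.
    pairs = sorted(mapping.items())
    return ", ".join("{0}={1}".format(k, v) for k, v in pairs)


if __name__ == "__main__":
    # print(...) with a single argument parses the same way on 2.x and 3.x.
    print(describe({"b": 2, "a": 1}))
    print(halves(5))
```

The sketch leans on three habits: branching once on sys.version_info rather than scattering checks, using // wherever integer division is meant, and avoiding 2.x-only methods such as iteritems().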
The bottom line is that Python 3.x is a better language than Python 2.x, albeit in a large collection of relatively small ways. But as with any transition—especially one that introduces genuine backward incompatibilities—it takes time to move users to the latest version. To a lesser extent, this is true even within a series: there were (are) certainly some few long-running and stable programs that ran (probably still run) on Python 2.1—or even on Python 1.5.2—where simply keeping the platform consistent was easier than testing a transition. “If it ain’t broke, don’t fix it!”
However, going forward, new projects should be written for Python 3.x, and actively maintained old projects should transition between versions or plan on doing so at the earliest opportunity. Still, even after the 2020 end-of-life, executables already running will not suddenly stop doing so. Changes in underlying operating systems or in supporting hardware might require new versions, but much the same upgrade issue exists for OSes as for programming languages, and letting old systems “just run” is the same option in both cases.
One issue to consider when moving to the latest version of Python is simply what version of the language comes preinstalled with your operating system, if any. Microsoft Windows® does not ship with any real built-in developer tools or languages. Of course, Pythons—in various versions and distributions—are free to download from python.org and other sites (as are many other programming languages and developer tools). Apple OS X® ships with Python 2.x preinstalled (in all recent versions; their future plans are confidential and presumably strategic), so while freely available, installing Python 3.x takes at least some extra effort. Linux® distributions have traditionally shipped with Python 2.x installed, but increasingly the latest versions of these “distros” use Python 3.x for all of their internal tooling, ship with Python 3.x on their default media (either exclusively or with both Python 3.x and Python 2.x included), and generally encourage the use of Python 3.x. Of course, for many reasons similar to those discussed above, upgrading existing hardware systems from older operating systems is itself an effort, and poses risks of breaking existing programs and workflows.
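To see what a given machine actually provides, a short script can ask which Python commands are available on the PATH. This is a rough sketch, assuming it is itself run under Python 3.3 or later (where shutil.which is available); the function names find_python_commands and running_version are hypothetical.

```python
import shutil
import sys


def find_python_commands(names=("python", "python2", "python3")):
    """Map each command name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def running_version():
    """Return the version of the interpreter running this script."""
    return "{0}.{1}.{2}".format(*sys.version_info[:3])


if __name__ == "__main__":
    print("Running under Python " + running_version())
    for name, path in sorted(find_python_commands().items()):
        print("{0}: {1}".format(name, path or "not found"))
```

The output differs from system to system, which is the point: it shows what is preinstalled or already added, not what ought to be there.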
PEP 394, entitled The “python” Command on Unix-Like Systems, specifies the recommended configuration of Python on Unix-like systems. This includes Linux distros, of course. It equally applies to BSD-family Unix systems, such as FreeBSD®, OpenBSD™, NetBSD™, and even to Apple OS X. Of course, while the Python Software Foundation can make a recommendation via a PEP, no other entity is bound to follow such recommendations (some do, some don’t). The basic purpose of this recommendation is to govern the behavior of files that use the “shebang convention.” That is, the system looks at the first few bytes of a file to see whether it is an executable script: a first line beginning with #! names the interpreter, either directly by its path or, often, indirectly through the env utility.
In quick summary, PEP 394 recommends that within an installed operating system environment:
- python2 will refer to some version of Python 2.x.
- python3 will refer to some version of Python 3.x.
- python will refer to the same target as […]
- python will refer to the same target as […]
- Python 2.x-only scripts should either be updated to be source compatible with Python 3.x or use python2 in the shebang line.
While some currently shipping systems like Apple OS X only ship with Python 2.x, others like Arch Linux™ ship with python aliased to python3 already. In (almost) all cases, explicitly specifying python3 in the shebang line will resolve any ambiguity.
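To make the shebang convention concrete, here is a minimal sketch of a helper that reports which command a script’s first line names. The function name shebang_command is hypothetical, and the shebang lines in the usage section are illustrative examples rather than quotations from PEP 394. The helper handles only the direct form and the simple indirect form through env; it does not interpret options passed to env.

```python
import posixpath


def shebang_command(first_line):
    """Return the command named by a shebang line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].strip().split()
    if not parts:
        return None
    # Indirect form: env looks up the command given as its first argument.
    if posixpath.basename(parts[0]) == "env" and len(parts) > 1:
        return parts[1]
    # Direct form: the interpreter is given by its path.
    return posixpath.basename(parts[0])


if __name__ == "__main__":
    for line in ("#!/usr/bin/python",
                 "#!/usr/bin/env python3",
                 "print('no shebang here')"):
        print("{0!r} -> {1!r}".format(line, shebang_command(line)))
```

A script whose shebang resolves to plain python depends on how the system aliases that name, while one that resolves to python2 or python3 states its requirement explicitly.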
Python 3 on Fedora and Red Hat
Major Linux distributions generally follow the recommendation of PEP 394, and furthermore, are moving at a consistent pace towards general internal use of Python 3.x. For example, Fedora’s wiki documents this effort:
The main goal is switching to Python 3 as a default, in which state:
- DNF is the default package manager instead of Yum, which only works with Python 2
- Python 3 is the only Python implementation in the minimal buildroot
- Python 3 is the only Python implementation on the LiveCD
- Anaconda and all of its dependencies run on Python 3
- cloud-init and all of its dependencies run on Python 3
Python 3 on Ubuntu
Ubuntu is following a path similar to that of Fedora (and Red Hat) with regard to Python versioning. Ubuntu’s wiki describes their goals:
For both Ubuntu and Debian, we have ongoing project goals to make Python 3 the default, preferred Python version in the distros. This means:
- Python 3 will be the only Python version installed by default. Python 3 will be the only Python version in any installation media (i.e. image ISOs)
- Only Python 3 will be allowed on the Ubuntu touch images.
- All upstream libraries that support Python 3 will have their Python 3 version available in the archive.
- All applications that run under Python 3 will use Python 3 by default.
- All system scripts in the archive will use Python 3.
Ubuntu 14.04 LTS has recently been released. We made great progress toward these goals, but we must acknowledge that it is a daunting, multi-cycle process. A top goal for 14.04 was to remove Python 2 from the touch images, and sadly we almost but didn’t quite make it. There were still a few autopilot tests for which the Python 3 ports did not land in time, thus keeping Python 2 autopilot support on the base touch image. This work is being completed for Utopic and we expect to remove Python 2 from the touch images early in the 14.10 cycle (actually, any day now).
Python 3 Uptake
It is difficult to know with any confidence just how widely used Python 3.x is compared to Python 2.x. Indeed, it is not easy to know how widely used Python is in general, either in absolute terms or compared to other programming languages. For that matter, there are many meanings one could give to “how widely used” to begin with: how many local applications? How many servers? Serving how many clients? How much CPU time used in the process? How important are the various applications? How many lines of code in the version? And so on.
Certainly Python in general is near the top of popular programming languages if one looks at indices like the TIOBE Programming Community Index, the Transparent Language Popularity Index, the PYPL PopularitY of Programming Language, IEEE Spectrum’s 2014 Ranking, or The RedMonk Programming Language Rankings.
Being active in the Python community, and also being a director of the Python Software Foundation, this writer has access to some rough indicators of Python version usage that, while not confidential, haven’t been widely published (mostly because of their lack of statistical rigor). But a few points are suggestive.
Downloads from python.org
On the python.org website itself, downloads of 3.x versions started outnumbering downloads of 2.x versions beginning in early 2013. As discussed above, many operating system distributions come with Python versions installed, so those probably do not need to be downloaded from python.org. Moreover, one download from python.org might result in anywhere from zero to tens of thousands of installs on individual machines at companies. Furthermore, other sites are free to mirror Python archives, and many do, so not all downloads are via python.org, even initially. In addition, as discussed later in this paper, various third parties have created their own Python distributions that include other sets of “batteries included” beyond what the distributions provided by the PSF do. So the indicator mentioned is very rough, but a positive suggestion at least. (Figure 1-2 shows the python.org download menu.)
Downloads from the Python Package Index
Downloads of third-party software and libraries from the Python Package Index (PyPI) have the same caveats as those about the language itself. Many downloads from PyPI are automated ones done by easy_install, or other automated installers, including many that are not Python-specific, such as brew (although probably none of the non-Python installers listed directly access PyPI; they use their own archives instead). Many downloads are also done as part of automated testing of configurations, which may create inertia in repeatedly downloading older packages within such automated scripts.
In any case, the internal logs for PyPI suggest that downloads of Python 3.x-specific packages remain below 25% of the downloads from the site. It’s hard to know the exact reasons. A positive thought is that this might be partially because Python 3.x is even more “batteries included” in the basic distribution than 2.x was, and hence there is less need for third-party tools. However, that explanation is likely overly Panglossian, and the new modules in Python 3.x have only a small effect on the demand for and use of third-party libraries and tools. (Figure 1-3 shows PyPI’s navigation screen.)
A 2013/2014 Survey
A survey was completed at the beginning of 2014 to gauge relative usage, based on responses from postings on comp.lang.python, python-dev, and Hacker News. While still unscientific, the 2.x-vs-3.x-survey might be of interest to readers:
- Have you ever written code in Python 2.x?
- Have you ever written code in Python 3.x?
- Do you currently write more code in Python 2.x than 3.x?
- Do you think Python 3.x was a mistake?
- Do you have dependencies keeping you on Python 2.x?
- Have you ever ported code from Python 2.x to Python 3.x?
- Have you ever written/ported code using […]
- Have you ever written/ported code using […]
- […] code to run on Python 2.x and Python 3.x unmodified?
Is It Enough?
There seems to be an ongoing perception in at least parts of the Python community that projects are “stuck” on Python 2.x, whether because of widely used libraries or in-house codebases. This perception is mostly false when it comes to widely used third-party libraries: the large majority of the most important FLOSS support libraries have Python 3.x-compatible versions today. Because some of those library versions were created relatively recently, perceptions of the “ecosystem” not having moved to Python 3.x may simply reflect how infrequently developers review the overall porting status (conducting such a review for one’s own large codebase is a not inconsiderable project).
Adoption of Python 3.x was also slowed somewhat by missteps in Python 3.0 that were not fixed until Python 3.1. So, in truth, there was not a really good and stable Python 3.x version until mid-2009. For applications that work intensively with text processing, the benchmark version is probably even Python 3.3, because of the improvements in PEP 393 – Flexible String Representation. That release happened in September 2012. The variable-width storage of Unicode strings made for a big win in memory allocation and usage:
The memory usage of Python 3.3 is two to three times smaller than Python 3.2, and a little bit better than Python 2.7, on a Django benchmark (see the PEP for details).
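The variable-width storage can be observed directly. The following is a rough sketch, assuming CPython 3.3 or later; sys.getsizeof reports implementation-specific sizes, so other implementations may behave differently. The helper name bytes_per_char is hypothetical.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character for a string made only of `ch`."""
    # Subtracting two sizes cancels the fixed per-object header.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


if __name__ == "__main__":
    samples = [
        ("ASCII letter", "a"),
        ("Greek capital delta", "\u0394"),
        ("character outside the Basic Multilingual Plane", "\U0001F600"),
    ]
    for label, ch in samples:
        print("{0}: {1} byte(s) per character".format(label, bytes_per_char(ch)))
```

On CPython 3.3 and later this is expected to report 1, 2, and 4 bytes respectively: each string is stored at the width of its widest character, so predominantly ASCII text no longer pays for a wider fixed-width representation.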
More than the details of what improvements arrived on what dates in the history of Python 3.x, what probably feeds many developers’ sense of being “stuck” is not any concrete large conceptual or infrastructure problem in porting, but simply the fact that it takes more work to change versions—on a short-term basis—than it does to leave things as they are within a large codebase.
Even adding some minor functionality, bug fix, kludge, or workaround to a large, in-house, Python 2.x codebase is less work today than is porting (and more importantly, testing and validating that port) to Python 3.x. The next problem would often have been solved by the port, or at least would be much easier to address after it is made. But today’s work is done now, and the next problem is not addressed until later. In corporations, profits and expenses are accounted for quarterly, and even open source projects are often constrained by what is possible (or rewarding to volunteer developers) immediately rather than by what makes things better in the long term.
This writer’s take on migration to Python 3.x is that:
- Overall, the migration is inevitable.
- Some projects or long-running processes will make a decision (often a reasonable one) to stick with what they know is stable and works for their specific purpose until the code itself fades from relevance.
- Migration is moving at a reasonable and steady pace, even if slightly more slowly than I’d like to have seen.
- The release schedules and end-of-life dates are well timed for a gradual and successful transition.
- In some ways, Python’s conservative philosophy of compatibility and stability pushes against migration. Python 2.x has a long maintenance period, and while the improvements in Python 3.x really are great, intentionally none are revolutionary or fundamental. Python got most things right from the start, and those right choices haven’t been changed in Python 3.x. There is no new paradigm here, just new capabilities.
- The sky isn’t falling, and there still isn’t ever going to be a Python 2.8 (and neither are users going to abandon Python in droves for some other language because of the minor transition difficulties from Python 2.x to Python 3.x).
Python 3.5 Release Schedule
The current plan of record, documented in PEP 478, entitled Python 3.5 Release Schedule, sets the release date of Python 3.5.0 final as September 13, 2015. As with other minor versions, Python 3.5 will add several interesting and useful features. In general, this paper takes a modestly forward-looking perspective. Where advantages are discussed later, they extend through Python 3.5 but will not generally differentiate exactly when in the Python 3.x series a feature was introduced.
[1] Fred Brooks, The Mythical Man-Month. Addison-Wesley, 1975.