
Porting

Several tools are available to partially automate porting code from Python 2.x to Python 3.x. Using any of them will generally require some manual attention from the developer during the process, but in many cases only a fairly small amount.

There are two approaches to consider when porting code:

  • Is this a one-time transition? For in-house code, it is almost certainly a better idea to make a clean break and commit to the Python 3.x series (after sufficient testing, code review, etc.). Even FLOSS libraries may want to make the jump, perhaps maintaining parallel versions for a certain period of time (e.g., a Foolib and a Foolib3 might both get updated for a while).

  • Do you instead want to create a version-neutral codebase that will run on both Python 2.x and Python 3.x? Compatibility shims make it possible to write such code while still falling back or falling forward fairly elegantly where features differ. However, using these shims will probably require writing at least some code in a style less idiomatic than either Python series would otherwise favor.

Basically, if you want to make a clean break, use the 2to3 tool, which comes bundled with recent Python versions in both series. If you want to create version-neutral code, use the support libraries six or Python-Future. For a while there was also a short-lived project called 3to2 for backporting code already written for Python 3.x; however, it has not been well maintained, and using it is not recommended.

2to3

The documentation for 2to3 - Automated Python 2 to 3 code translation describes it as follows:

2to3 is a Python program that reads Python 2.x source code and applies a series of fixers to transform it into valid Python 3.x code. The standard library contains a rich set of fixers that will handle almost all code. [The] 2to3 supporting library lib2to3 is, however, a flexible and generic library, so it is possible to write your own fixers for 2to3. lib2to3 could also be adapted to custom applications in which Python code needs to be edited automatically.

As a little experiment for this paper, I decided to take a look at my long out-of-maintenance library Gnosis_Utils. I do not particularly recommend using any part of this library, since most of the useful things it did have been superseded by newer libraries. For the most part, the library was always a teaching tool, and a way of publishing examples and concepts that I discussed in my long-running column for IBM developerWorks, Charming Python (albeit, I was pleased to hear from many correspondents who used parts of it in real production systems, and to know it was included with some Linux and BSD distributions). In particular, this library stopped being maintained in 2006, and I probably last ran it under Python 2.5.

However, because of its age, it might make a good porting experiment. In its default mode, 2to3 simply prints a proposed diff without modifying any files. There are switches to apply the changes automatically (-w writes them back to the files), and others to limit which transformations are performed (-f selects specific fixers, -x excludes them). Let us look at what the tool does in default mode:

% 2to3 Gnosis_Utils-1.2.2/gnosis/trigramlib.py
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
RefactoringTool: Refactored Gnosis_Utils-1.2.2/gnosis/trigramlib.py
--- Gnosis_Utils-1.2.2/gnosis/trigramlib.py (original)
+++ Gnosis_Utils-1.2.2/gnosis/trigramlib.py (refactored)
@@ -1,6 +1,6 @@
"Support functions for trigram analysis"
-from __future__ import generators
-import string, cPickle
+
+import string, pickle

def simplify(text):
    ident = [chr(x) for x in range(256)]
@@ -17,13 +17,13 @@
    return " ".join(ts.text_splitter(text, casesensitive=1))

def simplify_null(text):
-    ident = ''.join(map(chr, range(256)))
+    ident = ''.join(map(chr, list(range(256))))
    return text.translate(ident, '\n\r')

def generate_trigrams(text, simplify=simplify):
    "Iterator on trigrams in (simplified) text"
    text = simplify(text)
-    for i in xrange(len(text)-3):
+    for i in range(len(text)-3):
        yield text[i:i+3]

def read_trigrams(fname):
@@ -31,7 +31,7 @@
        trigrams = {}
        for line in open(fname):
            trigram = line[:3]
-            spam,good = map(lambda s: int(s,16), line[3:].split(':'))
+            spam,good = [int(s,16) for s in line[3:].split(':')]
            trigrams[trigram] = [spam,good]
        return trigrams
    except IOError:
@@ -39,25 +39,25 @@

def write_trigrams(trigrams, fname):
    fh = open(fname,'w')
-    for trigram,(spam,good) in trigrams.items():
-        print >> fh, '%s%x:%x' %(trigram,spam,good)
+    for trigram,(spam,good) in list(trigrams.items()):
+        print('%s%x:%x' %(trigram,spam,good), file=fh)
    fh.close()

def interesting(rebuild=0):
    "Identify the interesting trigrams"
    if not rebuild:
        try:
-            return cPickle.load(open('interesting-trigrams','rb'))
+            return pickle.load(open('interesting-trigrams','rb'))
        except IOError:
            pass
    trigrams = read_trigrams('trigrams')
    interesting = {}
-    for trigram,(spam,good) in trigrams.items():
+    for trigram,(spam,good) in list(trigrams.items()):
        ratio = float(spam)/(spam+good)
        if spam+good >= 10:
            if ratio < 0.05 or ratio > 0.95:
                interesting[trigram] = ratio
-    cPickle.dump(interesting, open('interesting-trigrams','wb'), 1)
+    pickle.dump(interesting, open('interesting-trigrams','wb'), 1)
    return interesting

RefactoringTool: Files that need to be modified:
RefactoringTool: Gnosis_Utils-1.2.2/gnosis/trigramlib.py

Everything suggested here produces syntactically valid Python 3.x code, and applying the diff is all it takes to get there. But a few of the suggestions are neither idiomatic nor necessary. For example:

- ident = ''.join(map(chr, range(256)))
+ ident = ''.join(map(chr, list(range(256))))

Since the built-in function range() has become lazy (like xrange() was in Python 2.x), 2to3 conservatively wraps it in list() to create a concrete list. In almost all contexts, lazy is not only acceptable but even superior in performance (it doesn't matter for range(256), but it might for range(1000000000)).
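A quick way to see the point in Python 3.x is to compare the two forms directly. The sketch below relies only on built-ins: a range object stores its bounds rather than its values, so it stays small however many values it represents, and map() consumes it directly with no list() wrapper needed. (The exact byte counts reported by sys.getsizeof() vary by interpreter, so the code checks only that the huge range is small.)

```python
import sys

# In Python 3.x, range() is lazy: it stores start/stop/step, not the values.
huge = range(1000000000)
print(sys.getsizeof(huge) < 1000)  # True: no billion-element list is built

# map() accepts the lazy range directly; wrapping it in list() changes nothing.
lazy = ''.join(map(chr, range(256)))
eager = ''.join(map(chr, list(range(256))))
print(lazy == eager, len(lazy))  # True 256
```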

On the other hand, some of 2to3's suggestions change constructs that already work in Python 3.x but are simply less idiomatic or elegant. For example:

- spam,good = map(lambda s: int(s,16), line[3:].split(':'))
+ spam,good = [int(s,16) for s in line[3:].split(':')]

The existing line will still work correctly in Python 3.x, but using a list comprehension is more idiomatic and more readable than using map(lambda s: ..., ...).
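To confirm that both spellings behave the same under Python 3.x, here is a small sketch that parses one line in the format write_trigrams() produces (a three-character trigram followed by hex spam:good counts). The function names and the sample line are hypothetical, made up for illustration.

```python
def parse_counts_map(line):
    # Original Python 2.x style: map() with a lambda. It still works in 3.x,
    # where map() returns an iterator that tuple-unpacking consumes.
    spam, good = map(lambda s: int(s, 16), line[3:].split(':'))
    return spam, good

def parse_counts_listcomp(line):
    # 2to3's suggestion: an equivalent, more idiomatic list comprehension.
    spam, good = [int(s, 16) for s in line[3:].split(':')]
    return spam, good

# Hypothetical input line; int() ignores the trailing newline.
sample = "abc1f:2a\n"
print(parse_counts_map(sample))       # (31, 42)
print(parse_counts_listcomp(sample))  # (31, 42)
```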

Beyond merely working, once an informed developer has accepted or rejected each proposed change, the next step in a high-quality port is a more systematic code review: not just "What is the syntactic way to do this?" but also "Are there standard library modules or new constructs that do this faster? Or more generally? Or more correctly? Or with less and/or more readable code?" The answer to these questions is often yes, but then that answer is often yes within a single version series too, or simply whenever new eyes look at old code.

Getting code to the point of simply working correctly is usually as easy as running 2to3 and applying its recommendations. Further improvements can be made incrementally, as time and requirements permit.
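Testing after conversion still matters, because syntactic validity is not the same as correct behavior. In the diff above, 2to3 left simplify_null()'s call text.translate(ident, '\n\r') unchanged, but in Python 3.x str.translate() accepts only a single mapping argument, so that line raises TypeError when it runs. A sketch of a Python 3.x-only replacement, assuming the intent is simply to delete carriage returns and newlines:

```python
def simplify_null(text):
    # Python 2.x: text.translate(ident, '\n\r') used an identity table plus a
    # "deletechars" argument. Python 3.x's str.translate() takes only a table,
    # so build one with str.maketrans(), whose third argument lists the
    # characters to delete.
    return text.translate(str.maketrans('', '', '\n\r'))

# The untouched Python 2.x call fails at run time under Python 3.x:
ident = ''.join(map(chr, range(256)))
try:
    "spam\n".translate(ident, '\n\r')
except TypeError as exc:
    print("TypeError:", exc)

print(simplify_null("spam\r\nand\neggs"))  # spamandeggs
```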

six.py

The documentation for Six: Python 2 and 3 Compatibility Library describes it as follows:

Six provides simple utilities for wrapping over differences between Python 2 and Python 3. It is intended to support codebases that work on both Python 2 and 3 without modification. [S]ix consists of only one Python file, so it is painless to copy into a project.

A variety of constructs differ between Python 2.x and 3.x, but it is often possible to wrap the basic functionality in a function, class, or method that abstracts away the difference (perhaps by explicitly detecting the interpreter version and branching). The upshot is that version-neutral code using six.py frequently calls functions of the six.do_something(...) sort to do the close-enough-to-equivalent thing under whichever Python version the program runs.

For example, almost certainly the most obvious difference between Python 2.x and Python 3.x—especially for beginners—is that the print statement has been replaced by the print() function. What six gives users is a function six.print_() to use everywhere. In Python 3.x, it is simply an alias for the built-in print() while in Python 2.x it reimplements the behavior of that function (with the various keyword arguments). So, for example, where 2to3 would suggest this change:

- print >>fh, '%s%x:%x' % (trigram, spam, good)
+ print('%s%x:%x' % (trigram, spam, good), file=fh)

The version-neutral change would use:

six.print_('%s%x:%x' % (trigram, spam, good), file=fh)
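The same branching idea, detecting the interpreter version once and then exposing a single name, applies well beyond print. Six itself provides names such as six.PY3, six.string_types, and six.text_type. The block below is a hypothetical, self-contained imitation of that approach (not six's source), so it runs without six installed:

```python
import sys

# Detect the interpreter series once, at import time.
PY3 = sys.version_info[0] >= 3

if PY3:
    string_types = (str,)
    text_type = str
else:
    # These names exist only in Python 2.x; this branch never runs under 3.x.
    string_types = (basestring,)
    text_type = unicode

def is_string(obj):
    # Works unchanged on both series because it uses the shimmed name.
    return isinstance(obj, string_types)

print(is_string("spam"), is_string(42))  # True False
```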

Python 2.7.x has itself already backported many things—where it is possible to do so without breaking compatibility. So in some cases six is useful instead for supporting even earlier versions of Python 2.x. For example, this is probably what you’d actually do if you only cared about supporting Python 2.7.x and Python 3.x:

from __future__ import print_function
import sys
print("Hello error world", file=sys.stderr, sep=" ")

Use of six.print_(), however, lets your program run even on Python 2.4. In a similar vein:

six.get_function_globals(func)
   Get the globals of func. This is equivalent to func.__globals__
   on Python 2.6+ and func.func_globals on Python 2.5.
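A shim like this can also test for the attribute itself instead of the version number. The sketch below is a hypothetical stand-in written that way, not six's implementation:

```python
def get_function_globals(func):
    # Prefer the Python 2.6+/3.x attribute; fall back to the Python 2.5 name.
    try:
        return func.__globals__
    except AttributeError:
        return func.func_globals

def example():
    return 42

# A module-level function's globals are the module's own namespace.
print(get_function_globals(example) is globals())  # True
```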

Likewise, in an arguably more obscure case, the way metaclasses are declared changed between Python 2.x and 3.x; six abstracts that also:

import six

@six.add_metaclass(Meta)
class MyClass(object):
    pass

Which is equivalent to Python 3.x’s:

class MyClass(object, metaclass=Meta):
    pass

Or, on Python 2.6+ (where class decorators are available), to:

class MyClass(object):
    __metaclass__ = Meta

Class decorators were only added in Python 2.6, however, so to support Python 2.5 and earlier, you would have to apply the decorator by hand instead:

class MyClass(object):
    pass
MyClass = six.add_metaclass(Meta)(MyClass)
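To see what such a decorator has to do, here is a simplified sketch that rebuilds the decorated class through the requested metaclass. It is hypothetical and not six's actual source, which also handles __slots__ among other details. The toy Meta uses the zero-argument super(), so this particular block runs under Python 3.x only, and it exercises all three spellings shown above.

```python
def add_metaclass(metaclass):
    """Simplified, hypothetical add_metaclass-style class decorator."""
    def wrapper(cls):
        # Copy the class body, dropping the per-class descriptors that the
        # rebuilt class will create for itself.
        body = dict(cls.__dict__)
        body.pop('__dict__', None)
        body.pop('__weakref__', None)
        # Re-create the class, this time through the requested metaclass.
        return metaclass(cls.__name__, cls.__bases__, body)
    return wrapper

class Meta(type):
    """Toy metaclass that records the name of every class it creates."""
    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.created.append(name)
        return cls

@add_metaclass(Meta)                     # decorator form (Python 2.6+ syntax)
class Decorated(object):
    pass

class Native(object, metaclass=Meta):    # Python 3.x-only syntax
    pass

class Manual(object):                    # Python 2.5-compatible spelling
    pass
Manual = add_metaclass(Meta)(Manual)

print(Meta.created)  # ['Decorated', 'Native', 'Manual']
```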

The net effect of writing version-neutral code with six.py, however, is code that is not particularly idiomatic for either Python 2.x or Python 3.x, relying instead on many functions in the six module rather than native syntax or built-ins.

Python-Future

The documentation for Python-Future describes it as follows:

python-future is the missing compatibility layer between Python 2 and Python 3. It allows you to use a single, clean Python 3.x-compatible codebase to support both Python 2 and Python 3 with minimal overhead.

It provides future and past packages with backports and forward ports of features from Python 3 and 2. It also comes with futurize and pasteurize, customized 2to3-based scripts that helps you to convert either Py2 or Py3 code easily to support both Python 2 and 3 in a single clean Py3-style codebase, module by module.

Python-Future is cleaner than six.py, but it achieves that partly by not attempting to support early versions within the Python 2.x series, and to some degree by also ignoring early versions in the Python 3.x series. The core developers of Python added a number of conveniences in Python 2.7.x and in Python 3.3+ to bring the two closer to compatibility. For example, Python 2.7.x allows importing from __future__ to make the interpreter behave more like Python 3.x:

from __future__ import (absolute_import,
                        division,
                        print_function)

Mind you, adding this line can, and probably will, break existing code in a module that previously lacked it. Still, importing from the future is a step toward an actual 2to3 conversion; in fact, using the two techniques together is often a good idea, since it lets you modernize your Python 2.7.x code without yet actually moving to Python 3.x.

In the other direction, PEP 414, entitled Explicit Unicode Literal for Python 3.3, restored the u prefix for unicode literals in Python 3.x. This is purely a compatibility convenience: in Python 3.3+, there is absolutely no difference in meaning between "Foobar" and u"Foobar", because all strings are already unicode. But it lets Python 2.x code that uses unicode literals run on Python 3.3+ (assuming, of course, that other features are converted or shimmed as needed).
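A minimal check of that claim on any Python 3.3+ interpreter:

```python
# Python 3.3+ accepts the u prefix again (PEP 414); it changes nothing.
plain = "Foobar"
prefixed = u"Foobar"

print(plain == prefixed)          # True
print(type(prefixed).__name__)    # str
```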

Using Python-Future does not preclude you from also using six.py. In fact, the Python-Future documentation recommends using the following lines at the top of modules:

import future        # pip install future
import builtins      # pip install future
import past          # pip install future
import six           # pip install six

The module builtins is especially interesting: it provides Python 2.x implementations of Python 3.x built-ins that either behave differently or simply do not exist in Python 2.7.x. So often, along with the from __future__ import ... line, a "futurized" application will contain a line like this:

from builtins import (bytes, str, open, super, range,
                     zip, round, input, int, pow, object)

Under Python 3.x, this has no effect, but under Python 2.x these familiar functions gain enhanced, Python 3.x-like behaviors. Along similar lines, the future.standard_library submodule also makes Python 2.x more like Python 3.x:

>>> from future import standard_library
>>> help(standard_library.install_aliases)
Help on function install_aliases in module future.standard_library:

install_aliases()
   Monkey-patches the standard library in Py2.6/7 to provide
   aliases for better Py3 compatibility.
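Without such aliases, version-neutral code typically spells out each renamed standard-library module by hand with an ImportError fallback, which is the boilerplate the aliases aim to remove. A sketch of that manual pattern for two well-known moves (urlencode and the Queue module); the sample values are made up:

```python
try:
    # Python 3.x module names
    from urllib.parse import urlencode
    import queue
except ImportError:
    # Python 2.x names for the same functionality
    from urllib import urlencode
    import Queue as queue

print(urlencode({"q": "python 3"}))  # q=python+3

jobs = queue.Queue()
jobs.put("port trigramlib")
print(jobs.get())  # port trigramlib
```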

Moreover, Python-Future in some ways combines the approaches of 2to3 and six.py. In particular, its futurize tool converts code much as 2to3 does, but the result is version-neutral and yet still generally idiomatic for Python 3.x (somewhat unlike code written with six.py).

In some cases, it is even possible to use modules written for Python 2.x directly within Python 3.x programs, without first saving converted files (whether produced by futurize, 2to3, or by hand). For example:

from past import autotranslate
autotranslate(['useful_2x_only'])
import useful_2x_only

The autotranslate() function is still in alpha at the time of this writing, so (as with all changes) be sure to test after utilizing it.

Library Support

Most major libraries have been ported to Python 3.x as of this writing. This paper cannot be exhaustive in listing popular libraries that already support Python 3.x, nor in discussing ones that do not; there are hundreds or thousands in each category. Some "big name" libraries that are available are listed below. Of course, some libraries remain, and always will remain, unported; if such a library is essential to your task, that is an obstacle. You might address that obstacle by:

  1. Sticking with Python 2.x.

  2. Porting the needed support library yourself.

  3. Finding a suitable substitute library covering the same general domain.

Still, among libraries already well supported in Python 3.x are the following:

  • Web frameworks

    • django

    • flask

    • bottle

    • pyramid

    • Jinja2

    • tornado

  • Numeric/scientific

    • NumPy

    • SciPy

    • pandas

    • SimPy

    • matplotlib

  • Cryptography

    • pycrypto

    • ssl

    • rsa

  • Network

    • requests

    • httplib2

    • gunicorn

    • pyzmq

    • pycurl

  • Database

    • psycopg2

    • redis

    • pymongo

    • SQLAlchemy

  • Environments

    • idle

    • IPython

    • IPython Notebook

    • virtualenv

  • Data formats

    • lxml

    • simplejson

    • anyjson

    • PyYAML

    • Sphinx

    • pyparsing

    • ply

  • Testing

    • nose

    • coverage

    • mock

    • pyflakes

    • pylint

    • pytest

    • WebTest

  • Concurrency

    • greenlet

Some likely obstacles include:

  • Twisted. Many, but definitely not all, of the capabilities in this framework have been ported. See the wiki entry for Plan/Python3 – Twisted.

  • Mechanize.

  • Scrapy.

The list here is, of course, somewhat subjective and impressionistic. The tools or libraries this writer is most familiar with are not necessarily the ones that matter most to you.

Article image: Space-Force Construction, Liubov Popova, 1921 (source: The Athenaeum).