Transitioning to Python 3

The Python language, which is not new but continues to gain momentum and users as if it were, has changed remarkably little since it first was released. I don't mean to say that Python hasn't changed; it has grown, gaining functionality and speed, and it's now a hot language in a variety of domains, from data science to test automation to education. But, those who last used Python 15 or 20 years ago would feel that the latest versions of the language are a natural extension and evolution of what they already know.

At the same time, changes to the language—and particularly changes made in Python 3.x—mean that Python 2 programs won't run unmodified in Python 3. This is a known issue, and it was part of the process that Python's BDFL (Benevolent Dictator for Life) Guido van Rossum announced back when the "Python 3000" project was launched years ago. Guido expected it would take time for organizations to move from Python 2 to Python 3, but he also felt that the improvements to the language were necessary.

The good news is that Python 3, which at the time of this writing exists in version 3.5, is indeed better than Python 2. The bad news is that there still are a lot of companies (including many of my training and consulting clients) that still use Python 2.

Why don't they just upgrade? For the most part, it's because the time and effort needed to do so aren't seen as a worthwhile investment of developer resources. Most differences between Python 2 and 3 are easily expressed and understood by people, but the upgrades aren't completely automatic. Moving a large code base from Python 2 to 3 might take days, but it also might take weeks or months.

That said, companies will soon be forced to upgrade, because as of the year 2020, there will be no more support for Python 2. That's a risk many companies aren't going to want to take.

If you have to upgrade, but can't upgrade, that puts you in a terrible spot. However, there is another option: upgrade incrementally, modifying just 1–2 files each week so that they work with both Python 2 and 3. After a number of months of such incremental changes, you'll be able to switch completely to Python 3 with relatively little investment.

How can you make your code compatible with both? In this article, I provide a number of suggestions on how to do this, using both an understanding of Python 3's changes and the tools that have been developed to make this transition easier. Don't wait until 2019 to start making these changes; if you're a Python developer, you already (in mid-2016) should be thinking about how to change your code to be Python 3-compatible.

What Has Changed?

The first thing to ask is this: what exactly changed in Python 3? And, how easily can you move from Python 2 to Python 3? Or, how can you modify your Python 2 programs so they'll continue to work in Python 2, but then also work unmodified in Python 3? This last question is probably the most important one for my clients, and possibly for your business as well, during this transition period.

On the face of things, not very much actually changed in Python 3. It's a cleaner, more efficient and modern language that works like more modern Python developers want and expect. Things that Python developers were doing for years, but that weren't defaults in the language, are now indeed defaults. Sure, there are things I'm still getting used to after years of bad habits, such as failing to use parentheses around the arguments passed to print, but on the whole, the language has stayed the same.

However, this doesn't mean that nothing has changed or that you can get away with not changing your code.

For example, you almost certainly never wanted to use Python 2's input built-in function to get user input. Rather, you wanted to use the raw_input built-in function. So in Python 3, there is no equivalent to Python 2's input; the Python 3 input function is the same as Python 2's raw_input.

A more profound change is the switch in the behavior of strings. No longer do strings contain bytes; now they contain Unicode characters, encoded using UTF-8. If 100% of your work uses ASCII, you're in luck; nothing in your programs will really need to change. But if you use non-ASCII characters, and if you do so in the same program as you work with the contents of binary files, you'll have to make some adjustments. Python 2's str class is now a bytes class, and Python 2's unicode class is now the str class.

A number of other changes have been made that make Python more efficient. For example, Python 2 has the range function (which returns a list of integers) and the xrange function (which returns an iterator). Python 3's range function is the same as Python 2's xrange, because it's so much more efficient, and there really are few reasons to prefer the old range. But if your program expects to get a list back from range, you might be in trouble when you move to Python 3.

Another problem, which has become far less acute in the last year or two, is that of third-party libraries. If you're using packages from PyPI, you need to make sure not only that your own code works with Python 3, but also that all of those packages do. For a long time, I would argue that these packages were the bottleneck stopping many people from upgrading. But nowadays, most popular packages support Python 3, as you can see at the Python 3 Readiness site, which tracks such information.

Identifying Problems

So, how can you take a Python 2 program and modify it so that it'll work under both Python 2 and 3? You could go through the code line by line and try to find changes, but there are tools that can make the process much easier.

The first is an old friend of Python developers, the pylint program, which normally checks your code for Python style and usage. Modern versions of pylint have a py3k option you can apply that checks your code to see how compatible it is with Python 3. For example, let's assume you have written the (terrible) program shown in Listing 1. How can you find out which parts of it aren't going to work? You can run this:


pylint --py3k oldstuff.py

And, you'll get the following output:


************* Module oldstuff
W:  3, 7: raw_input built-in referenced (raw_input-builtin)
E:  4, 0: print statement used (print-statement)
E:  5, 0: print statement used (print-statement)
E:  6, 0: print statement used (print-statement)
W:  8, 9: raw_input built-in referenced (raw_input-builtin)
E: 10, 4: print statement used (print-statement)
W: 10,48: division w/o __future__ statement (old-division)
E: 14, 4: print statement used (print-statement)
W: 16, 4: range built-in referenced when not iterating
 ↪(range-builtin-not-iterating)
E: 17, 0: print statement used (print-statement)

The output contains both errors ("E") and warnings ("W"). The example program is using print as a statement, rather than a function. It's using range when not iterating. And, it's using raw_input. What can you do about it, and how can you improve things? pylint won't tell you; that's not its job. But if nothing else, you now have a list of things to fix and improve, so that it'll at least run under Python 3.

Listing 1. oldstuff.py


#!/usr/bin/env python

name = raw_input("Enter your name: ")
print "Hello, ",
print name,
print "!"

number = raw_input("Enter a number: ")
for i in [2,3,5]:
    print "{} / {} = {}".format(int(number), i, int(number) / i)


for i in range(10):
    print i

x = range(10)
print x[3]

If you have written a Python package with a requirements file, you can download and install caniusepython3 from PyPI. Running caniusepython3 against your requirements file will indicate what will work and what won't. If you don't want to download and install caniusepython3, you actually can go to the Can I Use Python 3 site and upload your requirements file there.

Fixing Problems

Python has come with a program called 2to3 for some time that looks over your Python 2 code and tries to find ways to make it work with Python 3. So, you can run:


2to3 oldstuff.py




and get unified diff-style output, indicating what changes you'll need to make in order for your program to work under Python 3. The problem is that this is a one-way conversion. It tells you how to change your program so it'll work with Python 3, but it doesn't help you make your program compatible with both 2 and 3 simultaneously.

Fortunately, there's a package on PyPI called futurize that not only runs 2to3, but also provides the import statements necessary for your code to run under both versions. You can just run:


futurize oldstuff.py

and the output is (as with 2to3) in diff format, so you can use it either to create a file that's compatible with both or to read through things.

What if you have Python 3 code and want to make it backward-compatible with Python 2? The same people who make futurize also make the amusingly named pasteurize, which inserts the appropriate import statements into code.

How do you know if your code really works well under both Python 2 and 3 after you have applied futurize's changes? You can't, and there is no doubt that these automatic tools will get some things wrong. For this reason (among others), it's crucial that you have a good test suite, with good coverage of your Python 2 code. Then you can run your tests against the Python 3 version and ensure that it works correctly there as well. Without these tests, you shouldn't think that your upgrade has worked; even 100% test coverage is never a guarantee, but it at least can tell you that the risk of failure has been minimized.

What if you're doing all sorts of serious and deep things with Python 2 that 2to3 can't notice, or that you can't paper over? A great package on PyPI is six, which papers over the differences between Python 2 and 3. For example, let's say you want to create a new object of the type used for text, such that things will be compatible across versions. In Python 2, that's going to be unicode, but in Python 3, that's going to be str. You don't want to have an "if" statement in your code each time you do this. Thus, using six, you can say:


import six
s = six.text_type()

Now you can be sure that "s" is an object of the appropriate type.

six defines an amazing array of things that have changed, which you might need to keep track of in your code. Want to check something in the builtins namespace (aka __builtins__ in Python 2)? Want to re-raise exceptions? Want to use StringIO (or BytesIO)? Want to deal with metaclasses? Using six, you can write a single line of code, which behind the scenes will issue the appropriate "if" statements for the version of Python you're using.

Even if you don't use six in your code, I recommend that you read through its documentation just to see where things have changed in Python 3. It'll open your eyes (as it did to mine) regarding the behind-the-scenes changes that often aren't discussed in the Python 2/3 world, and it might give you more insights into how to write your code so that it can work in both.

Conclusion

If you're starting to write some new Python code today, you should use Python 3. And if you have Python 2 code that you can upgrade to Python 3, you should do that as well. But if you're like most companies with an existing Python 2 code base, your best option might well be to upgrade incrementally, which means having code that works under 2 and 3 simultaneously. Once you've converted all of your code, and it passes tests under both 2 and 3, you can flip the switch, joining the world of Python 3 and all of its goodness.

Resources

Much has been written about the changes in Python 2 and 3. A great collection of such information is at the Python-Future website. That site offers the futurize and pasteurize packages as well as a great deal of documentation describing the changes between versions, techniques for upgrading and things to watch out for.

The six package is documented here. Even if you don't use six for 2/3 compatibility, I strongly suggest that you look through its capabilities.

Finally, if you're a web developer using Django, you definitely should read the Django-specific documentation regarding moving to Python 3 here.

This is especially important because of Django's handling of strings, bytes and Unicode strings, the names of which changed a bit over the years. Django actually includes a copy of the six library, modified slightly to suit its needs for internal use.

Reuven Lerner teaches Python, data science and Git to companies around the world. You can subscribe to his free, weekly "better developers" e-mail list, and learn from his books and courses at https://lerner.co.il. Reuven lives with his wife and children in Modi'in, Israel.

Load Disqus comments