Moving Safely to Python 3

You may have heard the news: Python 2 reaches end-of-life on January 1st, 2020. After that date, the language maintainers will stop development, security updates, and bug fixes for Python 2 and focus solely on Python 3. Running software on Python 2 will therefore become significantly riskier in terms of availability, security, and maintenance.

The language maintainers have many reasons for making this move. Python 3 offers better string handling, more consistent naming for various standard library functions, more intuitive handling of numbers and division, and much more. A number of resources have been made available to assist with upgrading Python 2 code to Python 3, including a porting guide, the 2to3 tool, and python-future.
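To make two of those differences concrete, here is a small illustrative sketch (not taken from Rover's codebase) that runs under Python 3. The comments note how Python 2 behaves differently.

```python
# Division: "/" is true division in Python 3 (in Python 2, 7 / 2 gave 3).
true_quotient = 7 / 2      # 3.5
floor_quotient = 7 // 2    # 3 on both Python 2 and Python 3

# Strings: text (str) and binary data (bytes) are distinct types in Python 3,
# so encoding and decoding must be explicit.
text = "caf\u00e9"              # 4 Unicode characters
data = text.encode("utf-8")     # 5 bytes, because the accented letter needs two

print(true_quotient, floor_quotient)
print(type(text).__name__, len(text))
print(type(data).__name__, len(data))
```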

Although we’re still actively working on this project, we wanted to give an overview of our approach, which might be useful if your team is also working on upgrading to Python 3!

Rover’s Approach

Rover’s primary web application contains more than 600,000 lines of Python code being worked on by almost 100 developers on a daily basis, with changes deployed 15-30 times per day (including during the US nighttime, since our developers are located around the world). Our web application also depends on more than 200 third-party libraries. In such an environment, we could not move our entire codebase to Python 3 in one fell swoop. We had to take a careful, cautious approach to minimize disruptions to our operations and prevent customer impact. In particular, we decided to impose the following constraints on our effort:

Zero downtime. Downtime is dangerous, expensive in terms of developer involvement, and disruptive to customers.
Minimize developer disruption. We assigned a small team to work on this effort, and as much as possible we did not want developers across our organization to have to slow down their critical feature development to support this project.
Minimize production bugs. To avoid business and customer impact we wanted to focus as much as possible on not introducing bugs to production as part of this effort.

We decided to target Python 3.6 because most of our dependencies released versions that explicitly supported Python 3.6, but very few explicitly supported Python 3.7. In line with our extra-cautious approach, Python 3.6 seemed like the safest bet to ensure the project would be completed on time and with minimal issues.

Overview

There are basically two approaches to getting a “24/7 live web application” to Python 3: a one-time cutover to a Python 3 version of the code, or a “backwards compatible” phase with a gradual rollout. Given our constraints, we decided to go with the gradual, phased rollout approach which would be safer and allow us to more easily deal with any operational issues we might encounter.

A gradual rollout would mean that our Python code would need to simultaneously support Python 2 and Python 3. Thus, we organized the Python 2 to 3 project into four milestones: upgrade all our dependencies to Python 3.6-compatible versions; make all web application code compatible with both Python 2 and Python 3; get all unit tests passing under Python 2 and Python 3; and finally roll out Python 3 to all servers. The first two milestones could be worked on in parallel, while the final two could be started as soon as all the code supported both Python 2 and Python 3.

Milestone 1: All Dependencies Support Python 3.6

The goal of this milestone was to ensure all dependencies (third party and those maintained by Rover) explicitly supported Python 3.6. We have over 200 dependencies across third-party and Rover-maintained libraries, and many of them had not been upgraded in months or years.

The first step was to conduct a full audit of all our dependencies to determine whether each dependency currently supported Python 3.6, had a newer release that supported Python 3.6, or needed more investigation to determine Python 3.6 support. In keeping with our cautious approach, we considered a package to support Python 3.6 only if it declared support in its package metadata or ran unit tests against Python 3.6 as part of automated continuous integration (such as TravisCI).
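The metadata half of that audit can be sketched roughly as follows. This is an illustration rather than the tooling we actually used: the function names and the three status labels are hypothetical, and reading installed metadata through importlib.metadata assumes Python 3.8 or newer. The sketch treats a package as declaring support when its trove classifiers include the Python 3.6 entry.

```python
from importlib import metadata  # standard library on Python 3.8+

PY36_CLASSIFIER = "Programming Language :: Python :: 3.6"


def installed_classifiers(package_name):
    """Return the trove classifiers declared by an installed package."""
    return metadata.metadata(package_name).get_all("Classifier") or []


def audit_status(current_classifiers, latest_classifiers):
    """Sort a dependency into one of the three audit buckets (hypothetical labels)."""
    if PY36_CLASSIFIER in current_classifiers:
        return "supported"      # the pinned version already declares 3.6
    if PY36_CLASSIFIER in latest_classifiers:
        return "upgrade"        # a newer release declares 3.6
    return "investigate"        # no explicit declaration; check CI or run tests
```

The classifiers for the latest release would have to come from the package index or the project's repository, which this sketch leaves to the caller. A package that lands in "investigate" is not necessarily incompatible; it simply has not declared support in its metadata.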

We found that a large majority of our packages either had a newer version that explicitly supported Python 3.6 or a newer version that explicitly supported some Python 3 version. For the packages that explicitly supported Python 3.6 in a newer version, we scheduled work to conduct the package upgrade; otherwise we attempted to confirm Python 3.6 compatibility by manually running unit tests for the package against Python 3.6. A handful of packages required additional investigation or thought; e.g. in some cases we depended on functionality from an unmaintained package that we could easily replace inside our primary web application.

Regression Testing with Test.IO

A package upgrade can be a dangerous production change, especially if that package is widely used (such as packages providing templates/user interface functionality). To help us prevent regressions and bugs from making it to production for such widely used packages, we turned to a service called Test.IO.

Test.IO provides QA testing as a service, with quick turnaround and comprehensive coverage options. The service lets you set up proxy access to an environment running your application and takes care of “crowdsourcing” QA testing by finding testers around the world to exercise its various features. Crowdsourced QA is a useful approach because it more closely simulates “real users” who often find bugs that might be missed by internal testing due to bias introduced by e.g. a priori knowledge of the product.

We’ve made use of Test.IO to conduct broadly scoped coverage tests of our web application as part of our dependency upgrade process for particularly widely used packages. Beyond the level of assurance provided by running these changes through QA tests, the Test.IO testers have frequently found impactful pre-existing production bugs. Furthermore, thinking carefully about the feature descriptions we handed off to Test.IO testers to exercise functionality resulted in us having a deeper understanding of our own product and achieving more widespread coverage of our large, complex web application.

Milestone 2: All Rover Code Supports Python 2 and Python 3

While working through auditing and upgrading our numerous dependencies, we’ve also been working on getting all of our web application code compatible with both Python 2 and Python 3. Having code that simultaneously supports both versions of Python makes it significantly easier and safer to plan and execute our phased rollout of Python 3 to production.

To uplift code to be Python 3 syntax compatible, we’ve made use of the futurize tool provided by python-future to automate most of the process. We basically apply the fixes generated by the futurize tool, manually fix up any inconsistencies, ensure that unit tests pass, and ship the change.
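As an illustration of where this process ends up, here is a small hand-written module that runs identically under Python 2.7 and Python 3. It is not actual futurize output, and the function names are made up for the example; it only shows the kind of idioms the uplift relies on.

```python
from __future__ import absolute_import, division, print_function


def average(values):
    """Return the mean; the division import makes "/" true division on Python 2 too."""
    values = list(values)
    return sum(values) / len(values)


def parse_count(raw):
    """Parse an integer, using the "except ... as" form that both versions accept."""
    try:
        return int(raw)
    except ValueError as exc:
        print("could not parse {!r}: {}".format(raw, exc))
        return 0


def describe(scores):
    """Format a dict deterministically without relying on Python 2-only methods."""
    return ["{}: {}".format(name, score) for name, score in sorted(scores.items())]
```

The `__future__` imports are no-ops on Python 3, so they can stay in place until Python 2 support is dropped.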

There are two primary challenges we’ve faced throughout this process:

How do we reasonably split up the codebase into discrete chunks of work, to avoid shipping too many potentially dangerous changes at once?
How do we prevent non-Python 3 syntax compatible code from being introduced once we’ve uplifted a particular file?

Splitting Up the Code

Rover’s primary web application is a Django application. Django’s pattern is to package related parts of the code together into “sub-applications” (think subdirectories within the broader application structure). While in theory these sub-applications would be mostly self-contained, in practice this is rarely the case; many cross-dependencies exist. However, sub-application was a reasonable granularity at least to begin the process.

Rover’s web application contains more than 80 sub-applications with wildly varying numbers of files and lines of code. As you can see below, the “size” of each sub-application in terms of lines of Python code is not evenly distributed across our codebase. Roughly 50% of our code is contained in only 10% of our sub-applications. Because of this distribution, we decided to start with our smallest sub-applications and work our way towards the largest ones. This helped us identify quirks and gotchas in the process, document any workarounds and best practices, and be better equipped for tackling our larger sub-applications.

Figure: Lines of Code per Application
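A smallest-first work order like the one described above could be derived with a short script along these lines. This is a sketch under assumptions: it treats every top-level directory under the project root as a sub-application and counts raw lines in .py files (including blanks and comments), which may not match how the numbers behind the chart were produced.

```python
from pathlib import Path


def count_python_lines(app_dir):
    """Count raw lines across all .py files under one sub-application."""
    total = 0
    for path in Path(app_dir).rglob("*.py"):
        with path.open(encoding="utf-8", errors="replace") as handle:
            total += sum(1 for _ in handle)
    return total


def measure_apps(project_root):
    """Map each top-level directory name to its Python line count."""
    root = Path(project_root)
    return {
        child.name: count_python_lines(child)
        for child in root.iterdir()
        if child.is_dir()
    }


def order_smallest_first(sizes):
    """Return sub-application names sorted from fewest to most lines."""
    return [name for name, _ in sorted(sizes.items(), key=lambda item: item[1])]
```

Calling `order_smallest_first(measure_apps("."))` from the project root would then give the order in which to tackle the sub-applications.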

Preventing Incompatible Code from Being Reintroduced

Our approach was to run our codebase through the futurize tool, fix any failing tests, and then merge our changes for each sub-application, at which point that sub-application would be considered Python 2 and Python 3 syntax compatible. However, with such a large codebase actively being worked on by almost 100 developers, we needed to be sure that no regressions would be introduced after completing a sub-application.

In order to accomplish this, we added a step to our continuous integration pipelines that would ensure no Python 3-incompatible syntax would be introduced to any sub-applications we had completed. Whenever a pull request was opened to merge changes into our codebase, we would ensure that futurize suggested no changes to the diff. Furthermore, we ran pylint using --py3k on changes to further ensure Python 3 compatibility.
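A check of this kind might look roughly like the sketch below. It is a simplification rather than our actual pipeline step: it inspects whole changed files instead of only the diff, the COMPLETED_APPS prefixes are hypothetical, and it assumes that futurize, when run without -w, prints any proposed changes to standard output the way 2to3 does.

```python
import subprocess

# Hypothetical directory prefixes for sub-applications that are already uplifted.
COMPLETED_APPS = ("billing/", "search/")


def files_to_check(changed_files, completed_apps=COMPLETED_APPS):
    """Keep only changed Python files that live in a completed sub-application."""
    prefixes = tuple(completed_apps)
    return [
        path for path in changed_files
        if path.endswith(".py") and path.startswith(prefixes)
    ]


def futurize_is_clean(path):
    """Return True when futurize proposes no changes (assumes diffs go to stdout)."""
    result = subprocess.run(["futurize", path], capture_output=True, text=True)
    return result.returncode == 0 and not result.stdout.strip()


def pylint_py3k_is_clean(path):
    """Return True when pylint's --py3k porting checker reports nothing."""
    result = subprocess.run(["pylint", "--py3k", path], capture_output=True, text=True)
    return result.returncode == 0


def check_pull_request(changed_files):
    """Return the files that fail either check; an empty list means the PR passes."""
    return [
        path for path in files_to_check(changed_files)
        if not (futurize_is_clean(path) and pylint_py3k_is_clean(path))
    ]
```

Files outside completed sub-applications are skipped entirely, which is what keeps the check from blocking developers working in code that has not been uplifted yet.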

For the most part this has been minimally disruptive to other developers not actively working on the Python 2 to 3 project. We dealt with any confusion among developers (such as unexpectedly failing continuous integration on pull requests) by explaining and documenting the process, and by responding quickly to any questions that arose.

Milestone 3: Unit Tests Passing Under Python 2 and Python 3

Upon completion of the first two milestones, we will begin working on getting unit tests running under Python 3. We should note that as of this writing, we have not yet completed the first two milestones and thus have not yet started on this milestone; however, our plan remains to tackle this milestone as soon as our code and dependencies are compatible with Python 3.6.

From Rover’s beginning, high unit test coverage has been a priority and we believe this will be a significant factor in successfully rolling out Python 3. Our unit test suite helps give us high confidence that introducing large-scale changes such as our Python 3 uplift does not break any functionality or introduce regressions. Getting tests passing under Python 3 will be a major milestone and unblock our rollout. At the end of this milestone we’ll be running our unit test suite under both Python 2 and Python 3.

Milestone 4: Python 3 Rollout

The last step before actually conducting our rollout is to run extensive manual testing of our application under Python 3 in a staging environment. For this phase we also plan on taking advantage of Test.IO’s capabilities to run site-wide QA regression testing (see above). This will help identify any possible bugs that were not covered by our unit test suite.

While we haven’t formally established a rollout plan yet, we have an idea of our approach. In keeping with our goal of zero downtime and minimal impact, we will perform a slow, phased rollout. We use a number of different application server pools to serve requests to different parts of our application. We will roll out Python 3 to a single server in each application pool, carefully monitoring for any errors, request timeouts, or other unexpected behavior. After confirming no functional regressions, we will slowly roll out Python 3 to each server in each application pool, one by one, until our whole application is running under Python 3.
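The ordering described above can be sketched as a simple plan generator. Since the rollout plan is not finalized, treat this purely as an illustration: the pool and server names are hypothetical, and the exposure estimate assumes requests are balanced evenly across the servers in a pool.

```python
def rollout_order(pools):
    """Plan the rollout: one server per pool first, then the rest one by one.

    `pools` maps a pool name to its list of server names (hypothetical names).
    Returns a list of (phase, pool, server) tuples in deployment order.
    """
    steps = []
    # Phase 1: a single server in each pool, monitored before continuing.
    for pool, servers in sorted(pools.items()):
        if servers:
            steps.append(("first", pool, servers[0]))
    # Phase 2: every remaining server, one at a time.
    for pool, servers in sorted(pools.items()):
        for server in servers[1:]:
            steps.append(("remaining", pool, server))
    return steps


def exposed_fraction(pool_size, upgraded=1):
    """Estimate the share of a pool's requests hitting upgraded servers.

    Assumes even load balancing across the pool.
    """
    return upgraded / pool_size
```

For example, with ten servers in a pool and one of them upgraded, `exposed_fraction(10)` estimates that about a tenth of that pool's requests would reach Python 3.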

The purpose of this phased rollout is to minimize “blast radius,” or the maximum possible scope of impact to our customers. For example, imagine that our Python 3 changes broke a certain endpoint in our application. By rolling out the change to only a single server, we will only experience request failures from any requests being served by that particular server. That minimizes the number of customers impacted and prevents issues from becoming too widespread. This manual approach requires additional effort, but helps us achieve our goals while adhering to our philosophical approach of zero downtime and minimal business and customer impact.

Where Are We Now?

As of this writing (April 2019) we are most of the way through the first two milestones. So far our approach has been successful. We’ve been able to move steadily and deliberately through our dependencies and our codebase with minimal disruption to our developers and very few hiccups in the process. We hope to begin our work to get unit tests passing in Python 3 soon, and we’re eager to begin our rollout! We’ll revisit this process and post an update as we get closer to being fully on Python 3.