Ratcheting Progress: How Lyft Migrated 150+ Services from Python 2 to 3

TR Jordan

These are the highlights from an episode of Tern Stories. You can watch the full conversation with Anthony Sottile on YouTube, Spotify, Apple, or wherever you get your podcasts.

What does it take to migrate 150 services to Python 3? At Lyft, the answer wasn’t just automation. It was structured progress at scale—an approach built to handle messiness, move quickly, and never lose ground.

Progress Should Be a Ratchet

Anthony Sottile, who helped lead Lyft’s migration off Python 2, described a process grounded in momentum. Every service moved through the same core loop:

  1. Make the code parse in Python 3
  2. Make the code lint
  3. Make the tests pass in both Python 2 and 3
  4. Deploy

At each step, once a milestone was hit, a new guardrail was added: a CI check that enforced the newly achieved state. If the code parsed under Python 3, a syntax check was added to CI. If linting passed, the linters were enforced. Once the tests passed in both versions, CI was configured to run the suite under both Python 2 and Python 3.

This meant services could live in an in-between state: fully functional in Python 2, and incrementally compatible with Python 3. That intermediate compatibility was critical. It allowed progress to be shipped safely and continuously, rather than waiting for a single, high-risk switchover.
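
The ratchet can be modeled as a stage that only ever moves forward, with each stage unlocking a required CI check. This is an illustrative sketch, not Lyft's implementation: the `Stage` enum, the check names in `CHECKS_FOR_STAGE`, and the `ratchet` and `enforced_checks` helpers are all hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Milestones from the core loop, in order; a higher value is further along."""
    NONE = 0
    PARSES_PY3 = 1
    LINTS = 2
    TESTS_PY2_AND_PY3 = 3
    DEPLOYED = 4


# Hypothetical mapping from each milestone to the CI check it unlocks.
CHECKS_FOR_STAGE = {
    Stage.PARSES_PY3: "py3-syntax",
    Stage.LINTS: "lint",
    Stage.TESTS_PY2_AND_PY3: "tests-py2-py3",
}


def ratchet(recorded: Stage, observed: Stage) -> Stage:
    """Advance to the newly observed stage, but never move backward."""
    return max(recorded, observed)


def enforced_checks(stage: Stage) -> list[str]:
    """Every check unlocked at or below the current stage becomes required in CI."""
    return [check for unlocked_at, check in CHECKS_FOR_STAGE.items() if unlocked_at <= stage]
```

For example, `enforced_checks(Stage.LINTS)` returns `["py3-syntax", "lint"]`. Because `ratchet` takes the maximum, a later regression never lowers the recorded stage; it surfaces as a failing required check instead.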

As Anthony put it: “We baked it into the CI setup for that service. Your syntax is passing? Great. We have a syntax checker now in your CI.”
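
A Python 3 syntax check of this kind can be quite small. This is a minimal sketch, assuming the check only needs to confirm that every `.py` file parses under Python 3; the function name `py3_syntax_errors` is hypothetical, and the only parsing machinery it relies on is the standard library's `ast.parse`.

```python
import ast
import pathlib


def py3_syntax_errors(root: str) -> list[str]:
    """Return 'path:line: message' for every .py file under root that Python 3 cannot parse."""
    errors = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        # Read bytes so the parser honors any PEP 263 coding declaration in the file.
        source = path.read_bytes()
        try:
            ast.parse(source, filename=str(path))
        except SyntaxError as exc:  # also covers IndentationError and TabError
            errors.append(f"{path}:{exc.lineno}: {exc.msg}")
    return errors
```

A CI step could call it on the repository root and fail the build whenever the returned list is non-empty. Python 2-only syntax such as `print 'hi'` or `except ValueError, e:` raises `SyntaxError` under Python 3 and would be reported.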

This ratchet-based approach made every win permanent: progress couldn’t slide backward, and you could ship at every stage. Most of the time, that was enough. According to Anthony, 80–85% of services were fully migrated through automation alone, with no manual intervention required.

Developers Speak PRs

The migration wasn’t run out of spreadsheets. Refactorator, Lyft’s internal migration engine, did use Google Sheets as its database, but what mattered most was that the unit of work was a pull request.

Refactorator worked like this:

  1. Cloned every repo
  2. Applied automated changes
  3. Opened a pull request
  4. Recorded the result in a spreadsheet
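
Refactorator is internal to Lyft, so the following is only a rough in-memory sketch of that four-step loop under stated assumptions: `Repo`, `Tracker`, `apply_fix`, and `run_migration` are hypothetical names, cloning and PR creation are stubbed out as comments, and the example change (rewriting `.iteritems()` to `.items()`) is a naive regex chosen purely for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical in-memory stand-in for a cloned repository: path -> file contents."""
    name: str
    files: dict[str, str]


@dataclass
class Tracker:
    """Hypothetical stand-in for the spreadsheet: one row per repo."""
    rows: list[dict] = field(default_factory=list)


# Naive textual rewrite of the Python 2-only dict.iteritems() to dict.items().
ITERITEMS = re.compile(r"\.iteritems\(\)")


def apply_fix(source: str) -> str:
    return ITERITEMS.sub(".items()", source)


def run_migration(repos: list[Repo], tracker: Tracker) -> None:
    for repo in repos:
        # 1. Clone the repo (stubbed: Repo already holds its files in memory).
        changed = []
        # 2. Apply the automated change to every file.
        for path, source in list(repo.files.items()):
            fixed = apply_fix(source)
            if fixed != source:
                repo.files[path] = fixed
                changed.append(path)
        # 3. Open a pull request (stubbed: a real tool would call the code host's API).
        status = "pr_opened" if changed else "no_changes"
        # 4. Record the result, one row per repo.
        tracker.rows.append({"repo": repo.name, "status": status, "files": sorted(changed)})
```

A production codemod would work on tokens or the syntax tree rather than raw text, since a regex cannot tell a real method call from a string literal or comment that happens to contain the same characters.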

If a PR passed CI and no one objected after a week, it was self-approved and merged. Engineers didn’t need to learn a new tool or system. If you understood GitHub, you understood the migration.
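
The merge rule is simple enough to state as a predicate. In this sketch the `PullRequest` fields, including how an objection is detected, are assumptions; the source says only that a PR with passing CI and no objection after a week was self-approved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# "No one objected after a week."
REVIEW_WINDOW = timedelta(days=7)


@dataclass
class PullRequest:
    opened_at: datetime
    ci_passed: bool
    has_objection: bool  # hypothetical: e.g. a blocking review or comment


def should_auto_merge(pr: PullRequest, now: datetime) -> bool:
    """Self-approve only if CI is green, nobody objected, and the window has elapsed."""
    return pr.ci_passed and not pr.has_objection and now - pr.opened_at >= REVIEW_WINDOW
```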

This minimized coordination overhead. There was no need to teach teams how to participate. They didn’t need a ticket or a meeting. Just a PR.

Build Infrastructure That Lets You Ship

Lyft didn’t build complex traffic splitting or dual-version runtimes to compare Python 2 and 3—they didn’t let fear dictate complexity.

Why? Because their production systems gave them confidence.

Every deploy at Lyft went through a canary process—a small percentage of traffic hit the new code first. If something went wrong, it affected only a limited slice of requests. Most mid-tier services were expected to fail occasionally—and designed to handle it. Callers would automatically retry requests, skip degraded dependencies, or fall back to cached data.
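
On the caller side, this kind of resilience often looks like retries with backoff plus a cached fallback. The helper below, `call_with_fallback`, is a hypothetical illustration of that pattern, not Lyft's client library: it retries a few times, refreshes the cache on success, and serves the last cached value if every attempt fails.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(
    fetch: Callable[[], T],
    cache: dict,
    key: str,
    attempts: int = 3,
    backoff_s: float = 0.05,
) -> T:
    """Retry a flaky dependency; if every attempt fails, serve the last cached value."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            value = fetch()
        except Exception as exc:  # sketch only: real code should catch transient errors only
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(backoff_s * 2 ** attempt)  # exponential backoff between tries
            continue
        cache[key] = value  # refresh the cache on success
        return value
    if key in cache:
        return cache[key]  # degraded but usable: stale data instead of an error
    raise last_error
```

Catching bare `Exception` keeps the sketch short; a real client would retry only on errors it knows to be transient, such as timeouts or connection failures.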

A Python 3 rollout was no different from any other feature rollout. If broken code reached production, the canary and retry mechanisms contained the damage.

Even the worst-case deploy—one service required 12 rollbacks—fit within the system’s design. Rollbacks were fast. Tiering policies made clear which failures impacted users and which could be safely ignored. The result: deployment errors stayed contained, and rollout remained a routine part of shipping.
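
A canary gate can be reduced to comparing the canary's error rate with the baseline's, using a tolerance that depends on the service's tier. The tier numbers and thresholds in `MAX_ERROR_RATE_INCREASE` below are invented for illustration; the source does not describe Lyft's actual policy or metrics, and `Metrics` and `canary_verdict` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical tolerance for extra errors in the canary versus the baseline, per tier.
MAX_ERROR_RATE_INCREASE = {
    1: 0.001,  # user-facing: almost no regression tolerated
    2: 0.01,
    3: 0.05,   # internal: callers are expected to retry or fall back
}


@dataclass
class Metrics:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(tier: int, baseline: Metrics, canary: Metrics) -> str:
    """Return 'rollback' if the canary's error rate exceeds the baseline's by more than the tier allows."""
    allowed = MAX_ERROR_RATE_INCREASE[tier]
    if canary.error_rate - baseline.error_rate > allowed:
        return "rollback"
    return "promote"
```

Keying the tolerance on tier is one way to treat a blip in an internal service as routine while still rolling back quickly on anything user-facing.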

In theory, they could have deployed Python 2 and 3 side by side to compare resource usage and behavior. But the existing tooling—canary deploys, automated rollbacks, retry mechanisms, and tiered service boundaries—already provided enough signal. They chose not to add complexity where their systems already gave them answers.

Migrations Are a Systems Problem

The story of Lyft’s Python 3 migration isn’t just a success story about automation. It’s a case study in what infrastructure and culture make possible.

  • Guardrails turned incremental progress into irreversible progress.
  • GitHub PRs made the migration legible to everyone.
  • Canaries and retries let the team ship fearlessly.

The lesson? Migration doesn’t need to be heroic. With the right tools and automation strategy, you can move safely and continuously—even across hundreds of services.

If you want to learn more from Anthony Sottile, you can find him on YouTube and Twitch, where he shares deep dives into Python internals, tooling, and developer workflows.

