
Why your CTO hates migrations

TR Jordan
Ryan and I met at Slack. I was working on infrastructure. He was working in product on a team that eventually would be called the Slack Common Objects Team (SCOT).

All we did was migrations.

Slack has an amazing product culture. They’re focused on how humans actually use Slack, while balancing it with the needs of enterprise admins and the wide-open possibilities of platform automation, collaboration between companies, and thoughtful, long-form communication.

But all we did was migrations.

Migrations didn’t feel like product progress. The most delightful thing (mostly) was to wake up one morning and realize some team had rebuilt Reminders, for example. Typing /remind me to do my performance review next week to Slackbot was neat, but it was life-changing when a whole UI showed up to organize them. The faster it could happen, the better.

Migrations took months or years. That was good. They were scary. We could break Slack, or undo some new product experience that was still in flight. Over dozens of these migrations, I realized there were 5 things that really bothered leadership about migrations.

  • They’re unpredictable. Projects that we thought would take a quarter took 2 years.
  • They’re risky. They touched most of the code, so anything could break.
  • They’re expensive. We mitigated that risk by asking dozens of teams to participate.
  • They’re low-upside. The best case, mostly, is that everything is the same.
  • Value is delayed. Sometimes, you can’t ship until 100% of the work is done. Non-negotiable.

The more we shipped, the more projects behaved like migrations.

  • OSS upgrades: Every time a JS library or some piece of infrastructure introduced breaking changes, we’d need to re-evaluate where it was used.
  • Architectural changes: Changing the consistency semantics of memcache required all new clients to adopt the newest library, which couldn’t always be API-compatible.
  • Type safety or linters: Evolving coding standards would mean marking large swaths of the codebase with TODOs if we didn’t want to swallow the work immediately.
  • Refactors: Larger repos (like the monolith) benefited from well-designed internal structure, but getting there needed to respect work in progress.

What we tried first

We did a lot of these migrations in the old-fashioned way: brute force.

First, we found all the work. This normally looked like a spreadsheet with 700 things to change. We’d find all the owners, randomly assigning them if necessary. Somebody would write a migration guide.

Then, we scheduled meetings. I’d grab my favorite TPM and we’d turn those owners into team leads and managers, then those people into meetings. We’d patiently explain why we were doing this. We’d build a list of reasons it was good for them. If things were dire, we’d talk to leadership and threaten (er, assure) teams that this was higher priority than their product work.

Finally, we’d wait. We’d wait in office hours, to see if anybody had questions. We’d wait in meetings with 10 teams, to hear what kind of progress we’d made. We’d wait for the next quarter to see how many teams actually included the migration in their OKRs. And after we’d waited for 2, or 6, or 15 quarters, we’d have 80% of the work done.

When I asked people in my shoes at other companies, they told me they did the exact same thing. We’d kvetch over coffee, but at least we felt ok about trying hard.

What we discovered

While it felt like a social problem, the underlying problem is a technical problem. Our migration tooling was weak. The team that wanted the migration done didn’t have the capacity to do it, so they (we) would make it everybody else’s problem.

What if we could simply do the work? It shouldn’t be everybody’s problem to handle a migration. If we showed up with the changes done, the tests passing, and the PRs ready to merge, we could be done in less time than it took to schedule the first roadmap discussion. Other teams would want to be aware, of course, but what better way to get up to speed than to read relevant, working, well-tested code?

Some companies have done amazing work with this approach. Slack swapped frontend testing libraries, Stripe converted to TypeScript, Google migrated from int32s to int64s, and more.

The catalyst behind those stories is frequently an interest in AI, but the real insight is that mimicking how humans work unlocks automation. Migrations are more than simply changing code; they start with exploring whether a project is even worthwhile, and they end with educating your entire organization on how they should interact with the new system. That doesn’t go away, no matter how fast you and Copilot can edit source files.

Faster migrations mean running the same playbook, but faster. There are four steps in virtually every migration:

  • Find everywhere that needs to change (but automate it)
  • Make the changes (but automate it)
  • Test the changes (but automate it)
  • Ship the changes (with the teams that care)

This is what we all do today, but manually. The more information we can discover at every step, the better the automation works. That’s really what worked in those companies: they built powerful workflows, running and re-running the changes until they knew their changes were safe to ship.
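
To make that loop concrete, here is a minimal sketch of the find, change, and test steps, assuming a toy "repo" held in memory as a dict of file paths to contents. The names (find_targets, run_migration, Result) and the old_api/new_api rename are hypothetical, invented for illustration; real tooling would work on actual files, use a proper parser instead of a regex, and run the real test suite where this uses a simple check function.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Result:
    path: str
    passed: bool  # did the rewritten file pass the check?


def find_targets(files: Dict[str, str], pattern: str) -> List[str]:
    """Step 1: find everywhere that needs to change."""
    rx = re.compile(pattern)
    return sorted(path for path, text in files.items() if rx.search(text))


def run_migration(
    files: Dict[str, str],
    pattern: str,
    replacement: str,
    check: Callable[[str], bool],
) -> Tuple[Dict[str, str], List[Result]]:
    """Steps 2 and 3: make each change, test it, keep only what passes."""
    migrated = dict(files)  # never mutate the caller's copy
    results: List[Result] = []
    for path in find_targets(files, pattern):
        candidate = re.sub(pattern, replacement, files[path])
        passed = check(candidate)  # stand-in for running the tests
        if passed:
            migrated[path] = candidate
        results.append(Result(path, passed))
    return migrated, results


# Hypothetical example: rename old_api(...) calls to new_api(...).
repo = {
    "a.py": "old_api(1)\n",
    "b.py": "print('hi')\n",
    "c.py": "old_api(x, legacy=True)\n",
}
migrated, results = run_migration(
    repo,
    r"\bold_api\(",
    "new_api(",
    check=lambda text: "legacy=" not in text,  # pretend legacy= breaks tests
)
for r in results:
    print(r.path, "ready to ship" if r.passed else "needs a human")
```

Files that fail the check are left untouched and reported, so the loop can be re-run after the rewrite rule is improved. Step 4, shipping with the teams that care, is deliberately left out of the sketch: that part stays a conversation.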

At Tern, we want to make it work for you. We’re building this future for everybody, and we want to use this blog as a space to tell the stories of the hardest, strangest, and most interesting migrations. Subscribe to stay in the loop, or get in touch if you want to tell your own story: hello@tern.sh.
