The iOS Developer Who Picked Nomad Over Kubernetes for Yext's 2,000 Services

These are the highlights from an episode of Tern Stories. You can watch the full conversation with Tom Elliott on YouTube, Spotify, Apple, or wherever you get your podcasts.

Tom Elliott joined Yext in 2015 as an iOS developer. Within a year, both his mobile projects were canceled.

But Tom’s career crisis was nothing compared to what was happening with Yext’s infrastructure. The company was hitting the limits of homegrown tools built by ex-Google engineers, who had recreated Google’s Borg in a system of their own called Khan. It worked when Yext was small. Now they were running thousands of microservices. Tom calculated they had a higher microservice-to-employee ratio than Uber.

The real breaking point? Server upgrades. Every time the infrastructure team needed to upgrade a machine, they had to manually edit code to move services off, do the upgrade, then edit code again to move everything back.

“Everything was manually configured as to where things lay,” Tom recalls. Khan had been built by “one or two people” as a coalition of the willing. Now Yext had grown past 20 teams, and adding the features they desperately needed—like automatic workload migration—would require massive investment in a homegrown system.

In 2017, the CTO made a decision: they would migrate to HashiCorp’s Nomad. Tom, who had pivoted to backend development and built a reputation for creating tools developers actually used, suddenly found himself on the task force responsible for migrating 2,000 services.

The Top-Down Migration That Almost Broke the Team

The CTO’s choice of Nomad was controversial. Yext had already tried Kubernetes twice—once in their data center where nodes went down and services wouldn’t move correctly, and again in cloud regions where they tried to reinvent everything at once. Both attempts failed.

His reasoning for Nomad was simple: Containerizing 2,000 services would take years. Nomad could run bare processes. They could migrate without containerizing.

But the migration itself was brutal. The CTO mandated a task force approach with detailed upfront planning. They created a rigid schedule stretching months into the future, with teams assigned specific weeks to loan engineers. There was no slack built in—if one team fell behind, the entire schedule would cascade.

With only 5% spare capacity in the data center, they had to migrate machine by machine. But the real constraint wasn’t the servers—it was the approach. Everything was planned upfront, assuming perfect execution.

“If we fell behind, there was no way to recover,” Tom explains. In the first few weeks, they hit only 70% of their goals. Missing packages like ImageMagick, discovered only during production cutovers, would halt everything while they begged the infrastructure team for emergency installs.

The schedule constantly slipped. Teams fought over resources. Engineers burned out. They finished a month late with what Tom describes as “ebbing and flowing” morale.

“It was very much the pace and the nature of the work that made it drain people,” he says.

The Dry-Run Revolution

A VP noticed Tom’s pattern of building tools developers actually used. “It seems like you like building tools,” he said. “Want to run that group? We’ll give you three teams.”

Two years later, CentOS went end-of-life. Yext faced an even bigger migration: containerizing everything or failing security audits. This time, Tom was running the show.

Instead of another death march, Tom built tools that changed everything. The key innovation? Dry-run scripts that could test every service before migration.

“You could run the script in dry-run mode for every single job that a team had,” Tom explains. “It would say: there’s a 99% chance these 90 jobs will just work. These 10 have a problem.”

No more discovering missing ImageMagick during production cutovers. No more centralized task force. No more fighting over schedules.
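To make the idea concrete, here is a minimal sketch of what a dry-run check of this kind might look like. It is not Yext's actual tooling: the `Job` and `DryRunReport` types, the job names, and the package lists are all hypothetical, and it only covers the missing-package case (the ImageMagick problem), not whatever other checks the real scripts performed.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical description of one service: its name and the system
    # packages it expects to find on the machine or image it runs on.
    name: str
    required_packages: set[str] = field(default_factory=set)


@dataclass
class DryRunReport:
    ready: list[str]                # jobs with nothing missing
    blocked: dict[str, list[str]]   # job name -> missing packages


def dry_run(jobs: list[Job], image_packages: set[str]) -> DryRunReport:
    """Check every job against the target image without migrating anything."""
    ready: list[str] = []
    blocked: dict[str, list[str]] = {}
    for job in jobs:
        missing = sorted(job.required_packages - image_packages)
        if missing:
            blocked[job.name] = missing
        else:
            ready.append(job.name)
    return DryRunReport(ready=ready, blocked=blocked)


if __name__ == "__main__":
    # Illustrative data only; these names are made up for the example.
    image = {"openjdk", "curl"}
    jobs = [
        Job("listings-api", {"openjdk"}),
        Job("photo-resizer", {"openjdk", "imagemagick"}),
    ]
    report = dry_run(jobs, image)
    print(f"{len(report.ready)} job(s) look ready: {report.ready}")
    for name, missing in report.blocked.items():
        print(f"{name} is blocked, missing: {', '.join(missing)}")
```

The point of the design is that the check is read-only: a team can run it against all of its jobs at any time, see which ones need attention, and fix those before a production cutover rather than during one.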

Tom created a public leaderboard showing each team’s migration progress. It became a race. Teams scheduled their own work between projects. Nobody wanted to be last.
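A leaderboard like this needs very little machinery. The sketch below is an assumption about how one could be computed, not a description of the one Tom built: the `leaderboard` function, the input shape (migrated and total job counts per team), and the team names are all hypothetical.

```python
def leaderboard(progress: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Rank teams by percentage of jobs migrated, highest first.

    `progress` maps a team name to (migrated_jobs, total_jobs).
    Ties are broken alphabetically by team name.
    """
    rows = []
    for team, (migrated, total) in progress.items():
        # A team with no jobs has nothing left to migrate.
        pct = 100.0 * migrated / total if total else 100.0
        rows.append((team, pct))
    return sorted(rows, key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    # Made-up numbers for illustration.
    standings = leaderboard({"search": (45, 90), "listings": (72, 80)})
    for rank, (team, pct) in enumerate(standings, start=1):
        print(f"{rank}. {team}: {pct:.0f}% migrated")
```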

The containerization migration, bigger and more complex than the Nomad one, finished a month early.

Stop Planning, Start Tooling

Looking back, Tom’s advice surprises those expecting a planning sermon: “Your plan will change. Be okay with that.”

The Nomad migration had perfect plans, dedicated resources, and executive mandates. It nearly broke the team. The containerization migration had better tools, shorter feedback loops, and distributed ownership. It finished early with teams competing to migrate faster.

“I would much prefer people did a ton of prototyping with dry runs,” Tom says. “Gather information about stuff and have that at their disposal. Then have a rough timeline and use that information to figure out the best way forward in the moment.”

The iOS developer who started by building a bash script to solve his own problem had discovered something fundamental: in large-scale migrations, tooling beats planning every time.

You can find Tom on Bluesky as @telliott.me and as the founder of Ocuroot, building tools to manage CI/CD in complex environments.
