A compiler that never says No

There’s too much code.

I don’t think that’s an opinion anymore. Even if you’re not trying to read it all, or you’ve opted out of AI for some stuff, it’s simply true. Getting lost in multi-million-line codebases isn’t just a big-company perk in 2026.

The standard answer is: “that’s fine, we have AI. I’ll have my agent explain it to me, on demand!”

Which invariably returns:

AuthMiddleware is the load-bearing abstraction here — it intercepts the request lifecycle and delegates to TokenValidator, which is doing the heavy lifting for session resolution. Note that there are two validators; the second one appears to be a belt-and-suspenders fallback, though it may be vestigial.
The retry logic in refreshSession() is non-trivial but principled — it implements exponential backoff with jitter, which is best-practice and worth preserving. I’d flag this as intentional rather than accidental complexity.
One thing to watch: legacyAuthShim.ts is still wired into the hot path. It’s likely safe to leave as-is, but it does introduce coupling between the edge layer and the core domain model that’s slightly orthogonal to the broader architecture.

Would you like me to add a .gitignore entry for .env.local?

Reading this stuff is agonizing. A G O N I Z I N G. Reading the [AI-generated] code underneath isn’t better. We do it, though, for enough salary, equity, or a balanced combination thereof.

The problem is that we’re delegating the decision-making to the AI and never looking at what it does.

A sufficiently sycophantic compiler

My compiler has a strong, externally defined, self-consistent view of the world. It will refuse when I’m asking for undefined behavior based on an incomplete spec. My agent doesn’t.

This is a big selling point of AI! If something is doable with a computer, it’ll figure it out. 30 minutes and 200M tokens later, it’ll take any half-baked idea and declare “the feature is fully implemented.” Agents that don’t behave like this don’t get used, because they’re irritating and/or useless.

The cost is that when there’s no structure, the agent doesn’t invent any. As Armin Ronacher observes:

Present-day models tend to produce code that is too defensive, too complex, too local in its reasoning. They avoid strong invariants. They add fallbacks instead of making bad states impossible. They duplicate code, invent bad abstractions, and paper over unclear design with more machinery.

Worse though: I so far see very little progress of this improving.

Of course it’s not improving. Setting boundaries might get in the way of the user’s desires. The good training data is where it fixed the bug, not where it argued with the user to add strong invariants to the whole system.
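To make Ronacher’s distinction concrete, here’s a minimal Python sketch built around a hypothetical session-TTL setting (session_ttl_defensive and SessionTTL are illustrative names, not from any real codebase). The first function is the agent-shaped version: every bad input gets a fallback, and each fallback is a decision nobody asked for. The second makes the bad state impossible to construct.

```python
from dataclasses import dataclass


def session_ttl_defensive(config: dict) -> int:
    """Agent-shaped (hypothetical): accept anything, paper over bad input with fallbacks."""
    ttl = config.get("session_ttl_seconds")
    if ttl is None:
        ttl = 3600  # hidden decision #1: a missing value means one hour
    try:
        ttl = int(ttl)
    except (TypeError, ValueError):
        ttl = 3600  # hidden decision #2: unparseable input also means one hour
    if ttl <= 0:
        ttl = 3600  # hidden decision #3: so does a non-positive value
    return ttl


@dataclass(frozen=True)
class SessionTTL:
    """Invariant-shaped (hypothetical): a non-positive or non-integer TTL cannot exist."""
    seconds: int

    def __post_init__(self) -> None:
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(self.seconds, bool) or not isinstance(self.seconds, int) or self.seconds <= 0:
            raise ValueError(f"session TTL must be a positive int, got {self.seconds!r}")


# Usage:
#   session_ttl_defensive({"session_ttl_seconds": "abc"}) returns 3600
#   SessionTTL(0) raises ValueError
```

The defensive version quietly decides three times that 3600 is the right answer. The invariant version decides once, in one place, and fails loudly when the input is wrong, which is exactly the kind of refusal an agent is rewarded for avoiding.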

Summaries are a trap

Interrogating this mess is where you get those braindead AI summaries. If you can strip away the invented jargon, there are almost always juicy tidbits in there. Those insights are a false comfort, though, because trying to understand code that doesn’t have structure is endless. There are decisions made in the code that are simply underspecified.

If anybody complains about this happening, the answer is invariably: prompt harder. More context. Deeper organizational knowledge. Better skills. It’s not your agent’s fault for misunderstanding; it’s your fault for not communicating comprehensively enough.

But again: underspecifying the work is the point of AI development. It’s good! We have a name for the style of development where teams lock in all their decisions up front, and it’s not popular. On a personal level, simply writing the code is an excellent way to be extremely specific about behaviors.

This is inherent. Generation will always succeed, and it will be full of hidden decisions. For any change, you’ll need to verify the behavior.

Verification is the same trap, later

When writing software, the details matter.

This is also not a controversial statement, but when you combine models that sometimes make assumptions with summaries of what they’ve done, you end up needing to say “verification” a lot. “How do we think about verification?” explains Fiona Fung at Anthropic, who’s responsible for both the Claude Code and Cowork teams. She spends 90 minutes describing their expansive test suites using a tone that says “coding is solved” while admitting that Anthropic is ramping up pair programming during lunch to make sure people quit stepping on each other’s toes.

Meanwhile, the rest of us are seeing the consequences of using AI on software that’s historically expected to have more than a single nine of availability. Moving 4x faster means 4x the bugs. Users hate that. Nobody gives you credit for all the new features this week, but they notice every blip in availability, so there’s a renewed call for “verification,” which mostly translates into throwing loops around agents until they run out of ideas trying to write unit tests.

The shared problem here is: you can’t black-box test your way to good software. Without some mental model of what details matter, you’ll end up testing every input with every integer but forgetting that some people have Android. The right way to do it looks more like TDD:

Understand the problem space
Limit the complexity by defining invariants
Write tests
Make them pass

Step 3 is where the summaries creep back in. The agent won’t do step 2, because that would be self-limiting. Without step 2, step 3 is infinite. When faced with too much whitespace, agents have exactly 1 trick: summarization. They make a judgement call on the most important cases, and they skip everything else.

“Let me write tests for the most important behaviors.” And so it does, with 0 context about what’s important.
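Here’s what the four steps can look like when step 2 actually happens. This is a minimal sketch assuming a toy session lifecycle; SessionState, Event, TRANSITIONS, step and check_invariants are hypothetical names. Once the invariant is written down as an explicit transition table, the state space is finite, so the tests don’t have to guess which cases are “most important.” They cover every (state, event) pair.

```python
from enum import Enum
from itertools import product


class SessionState(Enum):
    ANONYMOUS = "anonymous"
    ACTIVE = "active"
    EXPIRED = "expired"


class Event(Enum):
    LOGIN = "login"
    TIMEOUT = "timeout"
    LOGOUT = "logout"


# Step 2: the invariant, written down as an explicit transition table.
# Any (state, event) pair not listed leaves the state unchanged.
TRANSITIONS = {
    (SessionState.ANONYMOUS, Event.LOGIN): SessionState.ACTIVE,
    (SessionState.ACTIVE, Event.TIMEOUT): SessionState.EXPIRED,
    (SessionState.ACTIVE, Event.LOGOUT): SessionState.ANONYMOUS,
    (SessionState.EXPIRED, Event.LOGIN): SessionState.ACTIVE,
    (SessionState.EXPIRED, Event.LOGOUT): SessionState.ANONYMOUS,
}


def step(state: SessionState, event: Event) -> SessionState:
    return TRANSITIONS.get((state, event), state)


# Steps 3 and 4: the space is finite, so check every pair instead of sampling.
def check_invariants() -> int:
    checked = 0
    for state, event in product(SessionState, Event):
        nxt = step(state, event)
        # Invariant A: only LOGIN can turn a non-active session into an active one.
        if nxt is SessionState.ACTIVE and state is not SessionState.ACTIVE:
            assert event is Event.LOGIN, (state, event, nxt)
        # Invariant B: LOGOUT always ends in ANONYMOUS, whatever the starting state.
        if event is Event.LOGOUT:
            assert nxt is SessionState.ANONYMOUS, (state, event, nxt)
        checked += 1
    return checked  # 3 states x 3 events = 9 pairs
```

If an agent “helpfully” adds a row like (EXPIRED, TIMEOUT) mapping to ACTIVE, the loop fails on that exact pair instead of hoping a hand-picked test happens to hit it. The tests are only finite because the invariant came first.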

PRs stopped being about decisions

The point of the code is to capture what you intended to do. It’s where you write down what is possible, and what’s impossible, with specificity. It’s where you commit to your decisions.

Those decisions aren’t clearly visible anymore.

“Too much code” is hard in the agent era not just because of the sheer volume, but because the decisions enshrined are a mix of deliberate and incidental, ranging from critical to irrelevant. That legacy codebase might be scary because it was built by people who retired in 2012, but the auth microservice that launched last week is full of decisions that no human ever saw. I know which one scares me more.

When you hand the agent your decisions, it writes them into code, as faithfully as it knows how, and it surrounds them with supporting decisions that made sense to the agent in the moment. But if your new widget is more quickly built by reverting an undocumented fix from The Big 2023 Incident, it’ll do it. It wasn’t there for the incident, and you (sensibly) didn’t bother to tell it that fix was important.

The most important place to understand intent is when making changes, and the heuristic is easy: minimize the number of surprising decisions that matter. That’s the standard to hold any PR to right now: does this contain decisions that would surprise the author? Their tech lead? The security, platform, productivity, or design system team?

You need the decisions:

Fully enumerated, without summarization,
Tied to the author: human prompt or agent assumption, and
Linked to code, the reality of what ships.

It’s the only way you spend your brainpower on decisions instead of semantically empty diffs or paragraphs of AI. Review the decisions that are now yours, even if the agent uncovered the constraint and solved the problem fully autonomously. Know what you’re shipping.
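Those three requirements map naturally onto a small record type. The sketch below is illustrative only, not the author’s product or any existing tool: Decision, Source, CodeSpan and review_queue are hypothetical names, and the file paths in the usage example are made up. Each decision is one entry rather than a summary, records who made it, and points at the lines that ship it.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    HUMAN_PROMPT = "human prompt"
    AGENT_ASSUMPTION = "agent assumption"


@dataclass(frozen=True)
class CodeSpan:
    path: str
    start_line: int
    end_line: int


@dataclass(frozen=True)
class Decision:
    summary: str                 # one decision, stated plainly
    source: Source               # who made it: the human's prompt or the agent
    spans: tuple[CodeSpan, ...]  # where it lives in the code that ships


def review_queue(decisions: list[Decision]) -> list[Decision]:
    """Reject decisions not linked to code; list agent assumptions first."""
    unlinked = [d.summary for d in decisions if not d.spans]
    if unlinked:
        raise ValueError(f"decisions not linked to code: {unlinked}")
    # sorted() is stable and False sorts before True, so agent assumptions
    # come first and the original order is kept within each group.
    return sorted(decisions, key=lambda d: d.source is not Source.AGENT_ASSUMPTION)


if __name__ == "__main__":
    # Hypothetical decisions and paths, for illustration only.
    pr = [
        Decision("Session TTL is one hour", Source.HUMAN_PROMPT,
                 (CodeSpan("src/auth/session.ts", 10, 12),)),
        Decision("Retry refreshSession() three times with jitter", Source.AGENT_ASSUMPTION,
                 (CodeSpan("src/auth/refresh.ts", 40, 58),)),
    ]
    for d in review_queue(pr):
        print(f"{d.source.value}: {d.summary}")
```

review_queue refuses any decision that isn’t linked to code, and it surfaces agent assumptions first on the theory that those are the ones most likely to surprise a tech lead. It never merges or drops entries, so the list stays fully enumerated.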

That’s what we’re building.

Then, maybe, it won’t feel so bad that there’s so much code.
