Remedies for the Slop

The team in my last post escaped its production failure by turning the agents off and reading the code line by line. More than ten engineers had spent the better part of a week generating fixes. The slow method was the one that finally moved.

I use these tools every day and I’m not giving up the speed. I do want rules for the moment when speed starts producing work nobody owns. The rules change with the size of the room.

If my name is on it

One person, accountable for the thing they make, able to explain every part of it.

Start with the one that sounds obvious and almost nobody honours: own what you ship. “The agent wrote it” isn’t a defense. If your name is on the PR, you’re accountable for every line as if you’d typed it, because the model won’t be the one paged at 3am. Owning it means you can explain it to someone who asks a hard question. If you can’t, you’re not done, green checks or not.

That changes how you work with the agent. Reviewing a finished two-thousand-line diff doesn’t work: you skim, it looks right, you approve, and you’ve understood none of it. So stay in the loop while the work happens instead of after it. Pair with the agent. Say what you want, watch what it does, stop it when it drifts, and keep the change small enough that you can still hold the whole thing in your head. A hundred lines you understand beats two thousand you don’t.

And when something breaks, fight the reflex to throw another agent at it. That’s how the hole gets deeper. Read the code yourself. It’s slow, and the engineers still willing to do it are becoming the most valuable people on their teams.

The slow way: something made by hand, by someone who understands every blow.

Somebody has to look for the lie

One person being disciplined doesn’t survive contact with a team that isn’t. Thinking has to be built into how the team works, not left to individual virtue.

The thing that helps most is red teaming. Pick someone whose job, for that session, is to assume the system is already broken and find out how. It’s not a code review. It’s one standing question:

If this fails in production six months from now, what’s the most likely cause, and would we even notice?

An agent will never ask you that. It will generate the system, the tests, and a confident paragraph on why it’s fine. Whether it’s actually fine is a human call, and red teaming is where you make it.

Review against intent, not just correctness. Code that’s internally consistent and does the wrong thing sails through every automated check. Someone has to hold the real goal in their head and compare the work against it, not just confirm the tests are green.

Keep the units small and the trunk clean. Small PRs aren’t a style preference anymore; they’re what keeps a human able to review at all. Keep the main branch releasable, so a break stops the line and gets fixed before it compounds. And make room for review that isn’t tied to any single change, because most of the failures that take a system down live between PRs: the duplicated abstraction, the slow drift from what you meant to build. Line-by-line review can’t catch those.

At company scale, volume wins

At company scale, no single team is the problem. What bites you is that you can’t see across all of them at once, and the volume is now enormous.

Traceability comes first. Every change should trace back to the intent behind it, so that when something breaks, or an auditor asks, or a customer does, you can reconstruct why a thing exists and who decided it should. With humans writing every line, this was implicit. With agents producing most of it, you make it explicit or you lose the thread.
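One way to make that explicit is to refuse any commit that doesn't say why it exists and who decided. What follows is a minimal sketch, not a prescription. The trailer names `Intent` and `Decided-by` are hypothetical, and a real setup would more likely hang off whatever ticketing or review system you already run.

```python
import re

# Hypothetical trailer names, chosen for illustration only.
REQUIRED_TRAILERS = ("Intent", "Decided-by")

# A trailer is a line of the form "Name: non-empty value".
TRAILER_PATTERN = re.compile(r"^([A-Za-z-]+):\s*(\S.*)$")


def missing_trailers(commit_message: str) -> list[str]:
    """Return the required trailers that a commit message lacks.

    A trailer with an empty value counts as missing.
    """
    found = set()
    for line in commit_message.splitlines():
        match = TRAILER_PATTERN.match(line)
        if match:
            found.add(match.group(1))
    return [name for name in REQUIRED_TRAILERS if name not in found]


if __name__ == "__main__":
    message = (
        "Retry payment webhook on timeout\n"
        "\n"
        "Intent: stop duplicate charges when the webhook times out\n"
        "Decided-by: payments on-call\n"
    )
    print(missing_trailers(message))      # nothing missing
    print(missing_trailers("Fix stuff"))  # both trailers missing
```

Run from a commit hook or a CI step, a non-empty result would be the signal to reject the change until someone writes the intent down.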

Build in layers, because no single check is enough when the output is this fluent and this voluminous. Evals that block a merge on regression. Observability required before anything ships, not bolted on after. Progressive delivery, so a change meets real traffic in shadow, then a slice of users, before it meets everyone. Each layer is cheap compared to the failure it catches.
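The first of those layers, evals that block a merge on regression, can be sketched in a few lines. This assumes each eval produces a single score where higher is better, and that you keep the scores from the main branch as a baseline. The function names and the `tolerance` default are mine, not any particular tool's.

```python
def regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.01,
) -> list[str]:
    """Return the evals where the candidate is worse than the baseline.

    An eval that is missing from the candidate counts as a regression,
    so deleting a failing eval cannot turn the gate green.
    """
    failed = []
    for name, base_score in baseline.items():
        new_score = candidate.get(name)
        if new_score is None or new_score < base_score - tolerance:
            failed.append(name)
    return failed


def should_block_merge(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.01,
) -> bool:
    """Block the merge if any eval regressed beyond the tolerance."""
    return bool(regressions(baseline, candidate, tolerance))


if __name__ == "__main__":
    main_scores = {"checkout": 0.95, "search": 0.90}
    branch_scores = {"checkout": 0.95, "search": 0.80}
    print(regressions(main_scores, branch_scores))         # ['search']
    print(should_block_merge(main_scores, branch_scores))  # True
```

Treating a missing eval as a failure is the design choice that matters here: the cheapest way to pass a gate is to remove what it measures, and an agent optimising for green checks will find that route.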

Then the slower structural work. Concentrate the hardest thinking on your most senior people instead of spreading it thin across everyone with an agent and a keyboard; the tool rewards judgment, and judgment is unevenly distributed. Rebuild the path for juniors, who used to learn the craft by writing production code and can’t anymore, around the parts that still need a human: review, planning, design, and the discipline of understanding a system before changing it. And measure the right thing. If your dashboard celebrates output volume, you’re measuring the disease. Count the defects that escape to production, the time it takes to actually understand an incident, the share of shipped code someone can explain. Those only move the right way when thinking is happening.
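Two of those counts are easy to compute once someone records them. Here is a rough sketch, assuming you log for each shipped change how many production defects traced back to it and whether its owner could walk a reviewer through it. The `Change` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One change, as recorded by a hypothetical review log."""
    shipped: bool
    escaped_defects: int  # production defects traced back to this change
    explained: bool       # the owner could explain it when asked


def health(changes: list[Change]) -> dict[str, float]:
    """Summarise shipped changes by escaped defects and explainability."""
    shipped = [c for c in changes if c.shipped]
    if not shipped:
        return {"escaped_defects_per_change": 0.0, "explained_share": 0.0}
    total_defects = sum(c.escaped_defects for c in shipped)
    explained = sum(1 for c in shipped if c.explained)
    return {
        "escaped_defects_per_change": total_defects / len(shipped),
        "explained_share": explained / len(shipped),
    }


if __name__ == "__main__":
    log = [
        Change(shipped=True, escaped_defects=1, explained=True),
        Change(shipped=True, escaped_defects=0, explained=True),
        Change(shipped=True, escaped_defects=1, explained=False),
        Change(shipped=True, escaped_defects=0, explained=True),
        Change(shipped=False, escaped_defects=0, explained=False),
    ]
    print(health(log))
    # {'escaped_defects_per_change': 0.5, 'explained_share': 0.75}
```

Neither number mentions lines written or PRs merged, which is the point.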

The rule I keep is embarrassingly old-fashioned: if my name is on the change, I need to be able to explain it when the page arrives at 3am. The agent won’t be awake, embarrassed, or accountable. I will.
