Johnny Butler

April 18, 2026

Strong governance is what gets agents disciplined enough to auto-merge and deploy

This is the part I care about most now.

Agents writing code is no longer the interesting part. That part is already here. They can move fast, write code, fix bugs, add tests, work through a codebase, and get a surprising amount done without much friction. We know that now. What matters is whether any of that can be trusted enough to carry real work all the way through.

That is the line I have been pushing toward. Not "can an agent produce code?" Can an agent work inside a system that is disciplined enough to let good work keep moving? That is a much more serious question.

A lot of the conversation still sits on model capability. Better reasoning. Better tools. More context. More parallel agents. I use all of that. I benefit from it. It matters. But it is not the main thing that gets you to auto-merge and deploy with a straight face.

What gets you there is governance. Not governance as ceremony. Not governance as management theatre. Governance as operating discipline.

I mean a workflow that keeps the agent inside real boundaries, forces the work to stay small enough to understand, and makes proof part of the job instead of something you hope appears at the end. That is what changed my view.

I do not think agentic development is some completely different thing from software engineering. I think it is software engineering with more of the discipline made explicit. The brief is clearer. The checks are closer. The boundaries matter more. The stop conditions matter more. The proof has to travel with the work. That is why it starts to feel different once it is working properly.

Good engineers already work like this. When a good engineer is moving quickly on something that matters, they do not usually try to solve the whole thing in one dramatic pass. They narrow it, break the problem down, work in slices, keep the scope under control, check what they are doing as they go, and use that feedback to decide the next move.

That is not bureaucracy. That is how you keep speed from turning into slop.

Testing is part of that, but not just because it catches regressions. It also improves the shape of the work. If something is awkward to test, that usually tells you something useful. The design is too tangled. The responsibilities are mixed together. The surface is too broad. The boundaries are not clean enough. When verification is real, the software usually gets cleaner too.

Agents need exactly the same pressure.

If you hand an agent a large task and basically say "go deal with it", that is not a serious workflow. Sometimes you will get something impressive back. Sometimes it will even work. But it will also drift. It will do too much. It will solve the wrong version of the problem. It will make a wider change than you wanted. It will keep going past the point where a good engineer would pause. It will sound more certain than it should.

That is not because agents are worthless. It is because the shape of the work is bad. Humans are not great in that shape either. The difference is that agents can do bad workflow much faster than people can. So a process that is a bit loose with a human becomes a real risk with an agent.

That is why I keep coming back to governance. If you want agentic development to hold up under real pressure, the workflow has to enforce the discipline. It cannot just gesture at it.

The slice has to be clear. The intended outcome has to be clear. The boundaries have to be named. The acceptance criteria have to be visible. The likely tradeoffs have to be visible. The verification path has to be part of the slice. The stop conditions have to be real.

Most importantly, the agent cannot just be asked to build. It has to be asked to prove.

That is the standard.
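To make that standard concrete, here is a minimal sketch of what a "slice" with proof requirements might look like. Every name and field here is illustrative, not part of any real tool: the point is only that the acceptance criteria, boundaries, and stop conditions are explicit, and that the work cannot be called done until evidence is attached.

```python
from dataclasses import dataclass, field

@dataclass
class Slice:
    """A hypothetical unit of agent work: small, bounded, and provable."""
    intent: str                       # the intended outcome, stated up front
    boundaries: list[str]             # files or modules the agent may touch
    acceptance: list[str]             # visible acceptance criteria
    stop_conditions: list[str]        # conditions that force the agent to halt
    evidence: list[str] = field(default_factory=list)  # proof travels with the work

    def ready_to_merge(self) -> bool:
        # "Build" is not enough; the slice must carry proof for every criterion.
        return bool(self.acceptance) and len(self.evidence) >= len(self.acceptance)

work = Slice(
    intent="Fix off-by-one in pagination",
    boundaries=["api/pagination.py", "tests/test_pagination.py"],
    acceptance=["existing tests pass", "new regression test covers the bug"],
    stop_conditions=["any change outside boundaries", "test suite cannot run"],
)
print(work.ready_to_merge())  # no evidence yet, so this prints False
work.evidence += ["CI run green", "regression test added and passing"]
print(work.ready_to_merge())  # proof now matches the criteria: True
```

The useful property is that "done" becomes a computable question rather than a feeling.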

Once you start working that way, a few things become obvious quite quickly. Agents are not random. They respond strongly to structure. Loose workflow makes them look flaky. Tighter workflow makes them look much more consistent.

That has been one of the clearest lessons for me after running a large number of jobs this way. I did not get closer to the result I wanted by simply trusting the models more. I got closer by trusting the workflow more.

By building a system that keeps forcing the same loop I would want from a strong engineer: reduce the problem, keep the change bounded, verify the slice, use the result to shape the next move, surface tradeoffs honestly, and stop when confidence runs out.
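That loop can be sketched in a deliberately simplified form. The function and parameter names below are placeholders, not a real framework; `verify` stands in for whatever checks the workflow runs against a slice.

```python
def run_slice_loop(slices, verify, confidence_floor=0.7):
    """Hypothetical control loop: keep each change bounded, verify it,
    surface tradeoffs, and stop honestly when confidence runs out.

    `slices` is an ordered list of small bounded pieces of work;
    `verify` returns (passed, confidence, tradeoffs) for one slice.
    """
    completed = []
    for piece in slices:
        passed, confidence, tradeoffs = verify(piece)  # verify the slice itself
        if tradeoffs:
            print(f"tradeoffs for {piece}: {tradeoffs}")  # surface them honestly
        if not passed or confidence < confidence_floor:
            # Stop when confidence runs out instead of pushing forward.
            return completed, f"stopped at {piece}"
        completed.append(piece)  # use the result to shape the next move
    return completed, "done"

def fake_verify(piece):
    # Stand-in verifier: pretend slice "b" fails its checks.
    return (piece != "b", 0.9, [])

done, status = run_slice_loop(["a", "b", "c"], fake_verify)
print(done, status)  # ['a'] stopped at b
```

Nothing in the loop is clever. The value is that stopping is a first-class outcome, not a failure mode.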

Once that loop is real, the output feels very different. It stops feeling like a clever demo. It starts feeling like a production system.

This is also why I think people react too quickly to the words auto-merge and deploy. Those words sound reckless if what sits underneath them is weak. And to be fair, in a lot of cases it is weak. If the scope is vague, the checks are shallow, the evidence is thin, and the workflow does not mean much, then yes, auto-merge is reckless.

But that is not because auto-merge itself is inherently unserious. It is because the system has not earned it.

What matters is whether green means anything.

Was the task small enough? Were the boundaries clear enough? Were the checks worth trusting? Was the evidence attached to the work? Were the risks surfaced honestly? Did the agent stay inside a known operating structure?

If the answer to those questions is weak, then the workflow is weak. If the answer is strong, then auto-merge and deploy stop sounding like bravado and start sounding like the natural consequence of a disciplined system.

That is how I think about it now. Merge and deploy are not the place where governance disappears. They are the place where governance pays off.

The other thing I have become more convinced of is that verification does more than catch mistakes. It changes the kind of solutions you end up with. Once a change has to be provable, you naturally start preferring smaller slices, clearer interfaces, fewer hidden assumptions, and less unnecessary cleverness.

That is good for software generally. It is especially good for agentic software delivery, because agents do better when the work itself is shaped to be inspectable.

So I do not think of governance as a layer of restrictions laid on top of the work. I think of it as one of the forces that improves the work.

That still is not the whole story, though. Getting to a trustworthy change is one problem. Operating software safely in production is another. That distinction matters more to me now than it did earlier on.

Development governance gets a change to the point where it deserves to move. Production governance decides how that change is released, deployed, smoke tested, observed, rolled back, and kept under control once it is live. Those are connected, but they are not the same.
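One way to picture that second layer is as a gated pipeline where every stage can veto the next, and any failure triggers rollback instead of continuing. The stage names below are generic, not a specific platform's vocabulary, and the callbacks are placeholders.

```python
# Hypothetical post-merge pipeline: release, deploy, smoke test, observe.
STAGES = ["release", "deploy", "smoke_test", "observe"]

def operate(change, run_stage, rollback):
    """Run `change` through each stage; roll back on the first failure.

    `run_stage(change, stage)` returns True on success;
    `rollback(change, failed_at=...)` restores the previous state.
    """
    for stage in STAGES:
        if not run_stage(change, stage):
            rollback(change, failed_at=stage)  # keep the change under control
            return "rolled back"
    return "live"

healthy = operate("v2", lambda change, stage: True,
                  lambda change, failed_at: None)
broken = operate("v3", lambda change, stage: stage != "smoke_test",
                 lambda change, failed_at: print(f"rolling back at {failed_at}"))
print(healthy, broken)  # live rolled back
```

The point of the sketch is the asymmetry: development governance decides whether a change deserves to enter this pipeline at all; production governance decides what happens at every stage after it does.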

As agents get faster, that second layer matters more. The bottleneck stops being "can we generate code quickly?" and becomes "can we operate this safely once the code arrives quickly?"

That is the harder part. It is also where a lot of the value will sit. Not in raw coding speed. In disciplined handoffs. In tighter release controls. In better smoke testing. In clearer rollback logic. In stronger operational boundaries after merge.

I do not think humans disappear from this picture at all. Their job moves up a layer. Humans still decide what matters. Humans still define the brief. Humans still decide what counts as good enough. Humans still define the boundaries, the escalation rules, the acceptable risks, and the level of proof required.

That is still engineering. If anything, it becomes more obviously engineering. Because once code generation gets cheaper, disciplined delivery becomes more valuable.

The real differentiator is no longer "can your system produce code quickly?" It is "can your system produce trustworthy changes repeatedly, with evidence, inside known boundaries, all the way through to production?"

That is the harder destination. It is also the one I think serious teams are heading toward whether they say it that way yet or not.

The teams that win here will not just be the teams with access to strong models. They will be the teams that build strong enough workflows to keep those models disciplined.

Teams that know how to break work down. Teams that know how to encode acceptance criteria properly. Teams that use verification as a design tool, not just a gate. Teams that make evidence travel with the work. Teams that know when to automate, when to stop, and when to escalate. Teams that can carry the handoff from development governance into production governance without losing control.

That is when agentic development stops being interesting in theory and starts being useful in practice. That is what I have been working toward.

And it is why my answer keeps getting simpler.

The big question is not whether agents can code. It is whether the governance is strong enough to keep them disciplined all the way to production.