Johnny Butler

March 5, 2026

From Novice to Senior: Refactoring a Whole Codebase Overnight With a Software “Dark Factory”

I planned to extract the Dark Factory engine from my personal project as a standalone tool for work.

But I wasn’t happy with the code quality, either in the factory itself or in some of the code it had produced. It was functional, but naïve.

So I did something different.
I used Codex plan mode to collate:
  • the software engineering practices I’ve learned over the last 10–20 years
  • the external resources that shaped how I write maintainable code
  • the “playbook” standard I’d want in production

Then I fed that playbook back into the factory.

Codex produced a detailed refactor plan for the entire codebase — but crucially, broken into independent slices.
Each slice ran through the factory like a small production job:
  • implement the refactor
  • verify locally
  • verify in GitHub CI
  • create a PR with a summary, prompt, run log, and evidence
  • merge
  • move on to the next slice
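
To make the per-slice loop above concrete, here is a minimal sketch in Python. It is not the factory's actual code: the stage names, `SliceResult`, `run_slice`, and `run_factory` are all hypothetical, and stopping at the first failed slice is my assumption about how such a loop would behave.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical stage names mirroring the steps listed above;
# they are illustrative, not the factory's real API.
STAGES = ["implement", "verify_local", "verify_ci", "open_pr", "merge"]

# A handler takes a slice id and returns True when its stage succeeded.
Handler = Callable[[str], bool]


@dataclass
class SliceResult:
    slice_id: str
    completed: List[str] = field(default_factory=list)
    failed_at: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.failed_at is None


def run_slice(slice_id: str, handlers: Dict[str, Handler]) -> SliceResult:
    """Run one slice through every stage in order, stopping at the first failure."""
    result = SliceResult(slice_id)
    for stage in STAGES:
        if not handlers[stage](slice_id):
            result.failed_at = stage
            break
        result.completed.append(stage)
    return result


def run_factory(slice_ids: Iterable[str], handlers: Dict[str, Handler]) -> List[SliceResult]:
    """Process slices one at a time; only move on when the previous slice merged."""
    results: List[SliceResult] = []
    for slice_id in slice_ids:
        result = run_slice(slice_id, handlers)
        results.append(result)
        if not result.ok:
            break  # assumption: halt the run instead of building on a failed slice
    return results
```

In a real run each handler would call the coding agent, the local test suite, the CI system, and so on. Here they are plain callables so the control flow can be exercised on its own.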

When I woke up this morning, nearly 200 independent PRs had been completed.

Each one had:
  • a clear summary of what changed and why
  • the prompt/spec it worked from
  • the run log (commands + checks)
  • CI evidence it was green
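
One way to picture that evidence trail is as a small record attached to every PR. The sketch below is illustrative only: `PrEvidence`, its field names, and `render_pr_body` are hypothetical, and the real PR layout may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrEvidence:
    # Field names are hypothetical; they mirror the four items listed above.
    summary: str         # what changed and why
    prompt: str          # the prompt/spec the slice worked from
    run_log: List[str]   # commands and checks that were executed
    ci_green: bool       # whether CI passed for this slice


def render_pr_body(evidence: PrEvidence) -> str:
    """Render the evidence as a plain-text PR description."""
    log_lines = "\n".join(f"$ {line}" for line in evidence.run_log)
    ci_status = "green" if evidence.ci_green else "NOT green"
    sections = [
        "Summary\n" + evidence.summary,
        "Prompt\n" + evidence.prompt,
        "Run log\n" + log_lines,
        "CI\n" + ci_status,
    ]
    return "\n\n".join(sections)
```

Keeping the four parts in one structure means a reviewer can audit any slice without re-running it.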

And the outcome surprised me.
The code quality went from “novice” to “senior”.
Genuinely: I would hire the person who performed this refactor without hesitation.

This only works with solid test coverage.

The factory needs a contract. Without tests (and CI) it can’t validate its work before moving on — and the whole approach breaks down. You can’t safely refactor in 200 slices if you’re relying on manual UAT or gut feel.
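
As a rough illustration of that contract, the gate below refuses to continue unless every check command exits cleanly, and treats "no checks at all" as a failure. `checks_pass` and `next_action` are hypothetical names, and the commands you pass in (your test runner, linter, and so on) are assumptions about your own project.

```python
import subprocess
import sys
from typing import Sequence


def checks_pass(commands: Sequence[Sequence[str]]) -> bool:
    """Return True only if every command exits with status 0."""
    for cmd in commands:
        completed = subprocess.run(cmd, capture_output=True, text=True)
        if completed.returncode != 0:
            return False
    return True


def next_action(commands: Sequence[Sequence[str]]) -> str:
    """Decide whether the factory may move on to the next slice."""
    if not commands:
        # No tests means no contract, so there is nothing to validate against.
        return "halt"
    return "continue" if checks_pass(commands) else "halt"


if __name__ == "__main__":
    # Stand-in check that always succeeds; replace with your real test command.
    demo = [[sys.executable, "-c", "raise SystemExit(0)"]]
    print(next_action(demo))
```

The important design choice is the empty case: a gate that passes when there is nothing to run would let the factory "verify" untested code.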

AI makes refactoring cheaper.
But tests are what make it safe.

I have shared some samples of the PR artefacts (summary + prompt + run log + CI evidence) because that “evidence trail” is the real difference between agentic hype and something you can trust in production.