From Novice to Senior: Refactoring a Whole Codebase Overnight With a Software “Dark Factory”

I planned to extract the Dark Factory engine from my personal project as a standalone tool for work.

But I wasn’t happy with the standard of the code, both in the factory itself and in some of the code it had produced. It was functional, but naïve.

So I did something different.

I used Codex plan mode to collate:

the software engineering practices I’ve learned over the last 10–20 years
the external resources that shaped how I write maintainable code
the “playbook” standard I’d want in production

Then I fed that playbook back into the factory.

Codex produced a detailed refactor plan for the entire codebase and, crucially, broke it into independent slices.

Each slice ran through the factory like a small production job:

implement the refactor
verify locally
verify in GitHub CI
create a PR with a summary, prompt, run log, and evidence
merge
move on to the next slice
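
The loop above can be sketched in a few lines of Python. This is only an illustration of the shape, not the factory's actual code: the names Slice, run_slice and run_factory are hypothetical, each step is passed in as a plain function that returns True or False, and stopping at the first failed slice (rather than skipping it) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Slice:
    """One independent unit of the refactor plan (hypothetical shape)."""
    name: str
    log: List[str] = field(default_factory=list)


# A step takes a slice and reports whether it succeeded.
Step = Callable[[Slice], bool]


def run_slice(s: Slice, steps: List[Tuple[str, Step]]) -> bool:
    """Run the steps in order, recording each result; stop at the first failure."""
    for label, step in steps:
        ok = step(s)
        s.log.append(f"{label}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False
    return True


def run_factory(slices: List[Slice], steps: List[Tuple[str, Step]]) -> List[str]:
    """Process slices one at a time and return the names that were merged.

    Assumption: the run halts on the first failing slice instead of skipping it.
    """
    merged = []
    for s in slices:
        if not run_slice(s, steps):
            break
        merged.append(s.name)
    return merged
```

The point of the design is that "merge" and "move on" are only reachable when every earlier check has passed, and the per-slice log doubles as the run log attached to the PR.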

When I woke up this morning, nearly 200 independent PRs had been completed.

Each one had:

a clear summary of what changed and why
the prompt/spec it worked from
the run log (commands + checks)
CI evidence it was green
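
As a rough sketch, those four artefacts can be modelled as a small record that is rendered into the PR description. The class name PullRequestEvidence, its field names and the plain-text layout are all assumptions made for illustration; the real PR template may look quite different.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PullRequestEvidence:
    """The four artefacts attached to each PR (hypothetical field names)."""
    summary: str         # what changed and why
    prompt: str          # the prompt/spec the slice worked from
    run_log: List[str]   # commands and checks that were run
    ci_green: bool       # whether CI passed


def render_pr_body(e: PullRequestEvidence) -> str:
    """Render the evidence as a plain-text PR description."""
    log_lines = "\n".join(f"  $ {cmd}" for cmd in e.run_log)
    ci_status = "green" if e.ci_green else "NOT green"
    return (
        f"Summary\n{e.summary}\n\n"
        f"Prompt / spec\n{e.prompt}\n\n"
        f"Run log\n{log_lines}\n\n"
        f"CI evidence\n{ci_status}\n"
    )
```

Keeping the evidence in one structure makes it hard to open a PR with a section missing, which is what lets a reviewer audit a slice without rerunning it.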

And the outcome surprised me.

The code quality went from “novice” to “senior”.

Genuinely: I would hire the person who performed this refactor without hesitation.

This only works with solid test coverage.

The factory needs a contract. Without tests (and CI), it can’t validate its work before moving on, and the whole approach breaks down. You can’t safely refactor in 200 slices if you’re relying on manual UAT or gut feel.
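
In practice the contract can be as simple as "the check command exits with status 0". A minimal sketch, assuming the project's tests can be run as a single command; the function name checks_pass is hypothetical and the command itself is whatever your project uses.

```python
import subprocess
from typing import Sequence


def checks_pass(command: Sequence[str]) -> bool:
    """Run a check command and report whether it exited with status 0.

    Output is captured rather than printed so a caller could attach it
    to a run log; this sketch simply discards it.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0
```

A caller would gate each slice on something like checks_pass(["pytest", "-q"]), assuming pytest is the test runner, and refuse to merge or move on when it returns False.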

AI makes refactoring cheaper.

But tests are what make it safe.

I have shared some samples of the PR artefacts (summary + prompt + run log + CI evidence) because that “evidence trail” is the real difference between agentic hype and something you can trust in production.