Johnny Butler

February 14, 2026

PR Review Is Changing in the AI Era

PR review was never really about the code.

Yes, we look at the diff. But the real thing we’re trying to assess is judgment:
  • did the engineer understand the problem?
  • did they make sensible trade-offs?
  • did they spot the risks?
  • did they validate behaviour?
  • did they leave the codebase better than they found it?

The uncomfortable truth is: the final code artifact has never been the deciding factor on its own. Two engineers can ship similar code and one can still be far riskier than the other, based on how they got there.

AI makes that even more true.
Because AI lowers the cost of producing plausible-looking code.
The diff still matters. But it’s becoming less diagnostic of the engineer.

So what changes?
The diff becomes the output. The process becomes the evidence.
In an AI-assisted workflow, the highest signal often isn’t “what code did you end up with?” It’s:
  • what did you ask the model to do?
  • what did you accept vs reject?
  • how did you react when it suggested something wrong?
  • what risks did you surface proactively?
  • what tests did you run, and when?
  • did you iterate until green, or hand-wave around failures?
  • did you consider rollback, observability, performance, security?

That transcript is the new evidence of engineering judgment.

What PR review might start to include

I don’t think teams need to attach full chat logs to every PR. That would be noisy and performative.

But I do think it becomes valuable to capture a lightweight version of “how we got here”, especially for non-trivial changes.

For example:
  1. Intent and constraints: a short paragraph covering what problem we’re solving, what constraints matter, and what we explicitly chose not to do.
  2. Key trade-offs: what options were considered, and why was this approach chosen?
  3. Risk surface: what could go wrong? What did we do to mitigate it? What’s the rollback plan?
  4. Validation: which tests prove this works? What did we run locally? What did we verify in staging? What metrics/logs would catch regressions?
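One lightweight way to make this routine is a pull request description template. Here’s a sketch, assuming a GitHub-style workflow; the section names are illustrative, not a standard:

```markdown
## Intent and constraints
What problem are we solving? What constraints matter? What did we
explicitly choose not to do?

## Key trade-offs
What options were considered, and why was this approach chosen?

## Risk surface
What could go wrong? How is it mitigated? What's the rollback plan?

## Validation
Which tests prove this works? What was run locally vs. in staging?
What metrics/logs would catch a regression?
```

On GitHub, placing a file like this at `.github/PULL_REQUEST_TEMPLATE.md` pre-fills it into every new PR, so authors only have to fill in the answers.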

None of that is new. Great engineers have always done it. The difference is: AI makes it easier to generate the code, so the documentation of judgment becomes more important.

What reviewers should look for now

If you want a simple checklist for AI-era PR review, it might look like:
  • does the PR clearly state intent and scope?
  • are trade-offs explicit?
  • are failure modes considered?
  • is there evidence of verification (tests, screenshots, staging notes)?
  • did the author reduce complexity, or just add more code?
  • are the risky parts isolated and easy to revert?
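The first few checks are mechanical enough to nudge in CI. A minimal sketch in Python, assuming PR descriptions use the four sections discussed above (the section names and the `missing_sections` helper are hypothetical, not an existing tool):

```python
import re

# Illustrative section headings a team might require in PR descriptions.
REQUIRED_SECTIONS = ["Intent", "Trade-offs", "Risks", "Validation"]

def missing_sections(pr_body: str) -> list[str]:
    """Return the required markdown headings absent from a PR description."""
    return [
        s for s in REQUIRED_SECTIONS
        if not re.search(rf"^#+\s*{re.escape(s)}", pr_body,
                         re.MULTILINE | re.IGNORECASE)
    ]

description = """\
## Intent
Cache user lookups to cut p95 latency.

## Trade-offs
Chose an in-process LRU over Redis to avoid a new dependency.

## Risks
Stale reads for up to 60s; mitigated by a short TTL.

## Validation
Unit tests plus a staging load test at 2x peak traffic.
"""

print(missing_sections(description))  # → []
```

A check like this can’t judge whether the trade-offs are good, of course; it only ensures the evidence of judgment is present for a human reviewer to assess.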

If the only evidence is “the diff looks fine”, you’re relying on the least reliable signal.
If the evidence is “here’s what we were trying to achieve, here are the trade-offs, here’s how we verified it, and here are the risks”, you can move fast without gambling.

AI doesn’t remove the need for judgment. It makes judgment more valuable.
In an AI world, the diff is the output.
The transcript — the prompts, decisions, trade-offs, and validation steps — is the evidence.