When AI generates code, it can look convincing even when it’s wrong.
So I don’t assess the implementation first. I run the specs first.
In traditional development, you wouldn’t submit (or even entertain) a change that fails CI. Why would you treat AI-generated code differently?
If tests are failing, I ask the model to fix them until everything is green. And interestingly, in the process of fixing specs, it often produces a smaller, more elegant solution than the first draft.
If you’re in a big monolith and running the full suite is heavy, get the AI to help you run the right slice locally: it can point you to the affected specs, or give you the exact command to run just the relevant files.
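As a concrete illustration of that "right slice" idea, here is a minimal sketch of the kind of command the AI might hand you. It assumes RSpec, a conventional Rails-style layout where `app/models/user.rb` maps to `spec/models/user_spec.rb`, and a main branch named `main` — adjust all three to your repo.

```shell
# Hypothetical sketch: run only the specs for files changed on this branch.
# Assumes RSpec, app/lib -> spec file mapping, and a base branch named main.
git diff --name-only main... \
  | grep -E '^(app|lib)/.*\.rb$' \
  | sed -E 's#^(app|lib)/#spec/#; s/\.rb$/_spec.rb/' \
  | xargs ls 2>/dev/null \
  | xargs -r bundle exec rspec
```

The `ls` step quietly drops changed files that have no matching spec, and `xargs -r` keeps RSpec from running the full suite when nothing matches.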
Key rule: tests are the contract. No green, no opinion.