Specs are useful. They are not governance.
Spec Kit exists because vibe coding made an old software problem suddenly more dangerous. Vague intent has always produced bad software. The difference now is speed and plausibility. A loose prompt no longer produces only a rough prototype. It can produce a feature-shaped object with files, tests, interfaces, and just enough polish to make the wrong thing look like work nearly finished.
So GitHub's Spec Kit is a sensible response. It is open source. It is structured. It pushes teams toward a constitution, a spec, a plan, tasks, and then implementation. That is obviously better than throwing a paragraph at an agent and hoping it has product judgment, architectural taste, and enough local context to do the right thing.
But the licence cost was never the expensive part. The expensive part is what happens when a team starts believing that stronger upfront documents solve the delivery-trust problem. They do not. They move some risk earlier. They clarify intent. They may improve the agent's starting point. But they do not prove that the work stayed aligned with the spec, the codebase, the risks, the tests, and the facts discovered while building.
The Hidden Regression
The current industry response to vibe coding is drifting toward more work upfront. Write the constitution. Write the spec. Write the plan. Break down the tasks. Then ask the agent to execute. There is value in that sequence. Good teams already know that intent matters. A stronger spec can prevent obvious misunderstandings. A plan can force useful decisions into the open. A task list can stop the agent from thrashing.
The trouble starts when the document becomes the source of trust. Then the workflow begins to smell familiar. Not quite old waterfall in a suit, and it is not always used that way. Sensible teams can use Spec Kit iteratively, and Spec Kit itself talks about iterative enhancement and brownfield workflows. The narrower point is this: when the main answer to AI risk is "make the upfront document more complete", we should notice the direction of travel.
Agile won because plans go stale once real work starts. That was true before AI, and AI makes it more true, not less. Real software is shaped during implementation. The existing codebase pushes back. Tests reveal hidden contracts. Dependencies behave differently. User workflows have awkward edges. Reviewers notice when technically correct implementation cuts across product promises.
The source of truth should not be a document written before those unknowns appear. That does not mean the document is useless. It means the document is not sovereign.
The Contradiction In Spec-Driven AI Work
Spec-driven workflows often talk about adaptability. Good. They need to. But the centre of gravity is still upstream. The workflow asks the team to put a lot of trust into the quality of the spec, plan, and generated task list before implementation has exposed what is actually true.
That can work for greenfield work. It can work for bounded prototypes. It can work when the cost of being wrong is low. But most useful software is brownfield. It lives inside an existing codebase, architecture, tests, conventions, and business promises. In that world, the agent does not only fail because the spec is unclear. It fails because it cannot reliably hold everything at once.
The spec says one thing. The codebase has a second history. The architecture has local patterns. The tests encode a third version of reality. The risks are uneven. The review standard depends on the size and surface area of the change. Then implementation starts, new facts appear, and the original plan is no longer quite right. That is not a moral failing. That is software.
The workflow either notices, adapts, and records why the path changed, or it becomes theatre. Polite theatre, with better Markdown, but theatre all the same.
Specs Are Intent, Not Proof
Software Dark Factory is not anti-spec. That would be a ridiculous position. Specs are useful. They express intent. They give the agent a target. They give the human reviewer something to compare against. They reduce ambiguity before the expensive part starts.
But specs are not governance. Governance is the evidence layer around the delivery slice. It is the record of what the agent was asked to do, what risks were identified, what acceptance criteria were chosen, what playbooks applied, what verification ran, what changed, what limits remain, and what a reviewer should inspect.
In SDF, the spec is not treated as proof that the delivery is safe. It is treated as intent. The proof lives in the governed delivery record. That record should include the things that actually make AI-assisted delivery reviewable:
- small slices
- clear intent
- preflight
- acceptance criteria
- risk and confidence notes
- playbooks applied
- verification evidence
- model and AI usage awareness
- known limits
- reviewer-ready evidence
- explicit human gates such as `automatic_execution_permitted: false`
That line matters. SDF is an assisted, operator-reviewed, evidence-backed delivery loop. It makes AI-assisted work reviewable, auditable, and adaptive. It is not a claim that agents should mutate repos, approve their own work, merge, deploy, or guarantee correctness.
The commercial value is not "the robot did everything". That is the cheap fantasy. The value is that the work arrives with a delivery trail strong enough for a serious operator to inspect without spending half a day reconstructing what happened.
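As a concrete sketch, the delivery record described above could travel with the slice as a single file. This is an illustration only: the field names and values here are hypothetical, not an SDF schema, except for the human gate `automatic_execution_permitted: false` named earlier.

```yaml
# Hypothetical governed delivery record for one slice.
# Field names are illustrative; SDF does not prescribe this schema.
slice: trim-export-filename-whitespace
intent: >
  Strip leading and trailing whitespace from user-supplied export
  filenames before validation, per the spec for the export feature.
risk: low                      # one validation path, no schema change
acceptance_criteria:
  - existing export tests still pass
  - new test covers whitespace-only input
playbooks_applied:
  - input-validation
  - small-diff
verification:
  - unit tests passed
  - lint clean
known_limits:
  - does not normalise Unicode whitespace characters
model_usage: single agent pass, diff reviewed and edited by operator
automatic_execution_permitted: false   # human gate: operator review before merge
```

The point of a record like this is not the format. It is that a reviewer can read it before opening the diff and know what was asked, what was checked, and what still needs human eyes.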
Open Source Is Not Operationally Free
Spec Kit is open source and free upfront. That is useful. It is also not the cost that matters most. Open source does not make a workflow operationally free.
The hidden cost shows up in the operating model. If the workflow creates planning drag, stale specs, duplicate work, review confusion, or false confidence, the licence being free does not save you. You pay in coordination. You pay in rework. You pay in slow review. You pay when the agent executes a document without really understanding the codebase it is touching.
The worst version of a spec-driven AI workflow is not that it fails loudly. Loud failure is annoying but honest. The worst version is that it produces an implementation that appears disciplined because the paperwork is complete. The spec exists. The plan exists. The tasks exist. The agent followed the steps. The diff is large. The team has a feeling of process maturity, but not enough evidence about whether the slice stayed aligned with reality.
That is false confidence, and false confidence is expensive. It is worse than no process because it borrows the costume of discipline while dodging the part that matters: proof.
The Delivery Loop Has To Be Safer
SDF takes the opposite bet. Do enough upfront to clarify intent, then keep the slice small enough that reality can teach you something quickly. Use fast feedback loops because they are not optional. They are how the right solution emerges. Let implementation reveal the unknowns. Capture what changed. Run the verification appropriate to the risk. Make the reviewer smarter before they open the diff.
That is more agile, not less. AI-assisted delivery needs agility more than traditional delivery because agents can produce more surface area faster. The cost of a stale plan is higher when the execution engine is fast. The cost of weak review is higher when the diff looks plausible. The cost of missing context is higher when the agent fills gaps with patterns that do not belong in your codebase.
The answer is not to abandon specs. The answer is to stop pretending the spec is the trust boundary. The trust boundary is the governed loop around the slice.
What SDF Is Really Saying
Spec Kit makes sense as a response to vibe coding. I would rather see teams write clearer intent than spray vague prompts into production code. GitHub is right that teams need more structure. My disagreement is about where the real trust should live.
Spec-driven development tries to make the upfront document stronger. Software Dark Factory tries to make the delivery loop safer. That means the loop has to include planning, but it cannot end there. It has to include execution, feedback, verification, evidence, review, and explicit limits. It has to admit that the plan will be wrong in places because useful implementation changes what you know.
Real software is not specified into existence. It is discovered, shaped, tested, reviewed, and corrected. AI does not remove the need for that loop. It makes the loop more important.