Henricus Louwhoff

October 14, 2025

My AI-Supported Development Workflow

Inspired by a16z's exploration of the AI software development stack, I've built an approach that uses multiple AI models, each for what it does best. This is how I work through projects from start to finish.

The Planning Phase
I begin every project by brainstorming with GPT-5 High, which excels at big-picture thinking and structured planning. I ask it to create a comprehensive implementation plan with a detailed todo list, complete with checkboxes, saved as a markdown file in the project root. This plan is purely for tracking the implementation process: what needs to be built and in what order.
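
For illustration, a plan file might start like this (the project, the tasks, and the filename are hypothetical; the exact format is whatever GPT-5 High and I settle on):

```markdown
# Implementation Plan: Order Export Service

## Todo

- [x] Set up the project skeleton
- [ ] Write specs/order-export.md (behaviour, edge cases, architecture)
- [ ] Implement the CSV export module
- [ ] Add automated tests for the export module
- [ ] Wire the export into the main workflow
```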

An important part of the plan is the creation of specification files. These aren't the same as the plan itself: the plan records that specification files need to be created, whilst the files themselves document the actual behaviour, architecture, and requirements of the system.

I iterate with GPT-5 High until I'm satisfied with the scope and structure. By iteration, I mean I read through the plan, make corrections where needed, and ask questions to see if it can be improved. This review process ensures I've thought through edge cases and dependencies before writing a single line of code.

Implementation with Claude
Once the plan is solid, I switch to Claude Sonnet 4.5 for implementation. The todo list is ordered by impact and effort, so I tackle work in the most logical sequence. I work through it item by item; each task is a small piece of work that can be completed and tested in isolation.

For each task, Claude delivers three things:

  1. The specification file(s) documenting what the feature does, its expected behaviour, edge cases, and architectural decisions
  2. The implementation code that realises the specification
  3. Automated tests that verify the implementation matches the specification

As the code evolves, so does its documentation. The specification files aren't afterthoughts but deliverables created alongside the code. Claude only marks a task as complete when all three elements are in place and the tests pass. This keeps me focused and gives me confidence that each completed feature is properly documented and working.
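
To make that concrete, a specification file for a single feature might be sketched like this (the feature and the section layout are illustrative, not a fixed template):

```markdown
# Specification: CSV Order Export

## Behaviour
Exports completed orders as a CSV file, one row per order line.

## Edge cases
- An order with no line items produces a header-only file.
- Product names containing commas or quotes are escaped per RFC 4180.

## Architectural decisions
Exports run as a background job so large files don't block requests.
```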

The Verification Loop
After each task, I turn to GPT-5 Codex Medium for verification. It acts as a fresh pair of eyes, reviewing all three deliverables:

  • Does the specification file clearly document the feature's behaviour and decisions?
  • Does the implementation actually fulfil what the specification describes?
  • Is the implementation complete with all the necessary components in place?
  • Do the tests adequately cover the specified behaviour?
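
In practice, the verification request is little more than those four questions turned into a prompt. Roughly (the wording and the file name are illustrative):

```markdown
Review the deliverables for the task "Implement the CSV export module":

1. Does specs/order-export.md clearly document behaviour and decisions?
2. Does the implementation fulfil what the specification describes?
3. Is anything missing or not wired up?
4. Do the tests cover the specified behaviour, including edge cases?

Report every gap as concrete, actionable feedback.
```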

If Codex spots issues (whether in documentation clarity, implementation completeness, correctness, or test coverage), I feed that feedback back to Claude Sonnet 4.5 for refinement. This might mean clarifying the specification, completing missing parts, fixing the implementation, or improving the tests.

This creates a feedback loop where the specification, implementation, and tests all stay aligned and complete before moving forwards.

A practical example: I once had Claude Sonnet 4.5 mark a task as complete with confidence. The code was written, tests passed, and the specification was thorough. But when GPT-5 Codex Medium reviewed it, it quickly spotted that whilst everything was technically done, the new code wasn't wired up anywhere and wasn't actually in use. Codex then generated a prompt explaining how to wire the newly generated code into the main workflow, and I fed that prompt back to Claude. Without this verification step, I would have moved on thinking the feature was complete when it wasn't actually functional.

Final Quality Check
When Claude has worked through the entire implementation plan, I do one final pass with GPT-5 Codex Medium. This comprehensive review ensures all requirements are properly met, specification files accurately document the system, test coverage is adequate, and everything aligns.
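
This final pass is essentially the per-task review scaled up to the whole project; the prompt looks roughly like this (the plan filename and wording are illustrative):

```markdown
The implementation plan (PLAN.md) is fully checked off. Review the project:

- Does every specification file accurately describe the system as built?
- Is every plan item genuinely complete, integrated, and in use?
- Is test coverage adequate across features, not just within each task?

Flag anything inconsistent, undocumented, or untested.
```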

At this point, I have working code, specification files that explain what was built and why, tested implementations that match those specifications, and multiple layers of verification. The plan in the root has served its purpose as an implementation tracker, whilst the specification files provide lasting documentation of the system.

Model Selection
I choose models based on what they're good at. GPT-5 High excels at large, complex tasks such as initial planning and brainstorming. It can reason through architecture, dependencies, and project structure at a high level, making it the right choice for comprehensive implementation plans.

Claude Sonnet 4.5 is the best at implementing code. It translates specifications into clean, working implementations, generates thorough tests, and maintains consistency across all deliverables.

GPT-5 Codex Medium is better suited to fast verification of smaller tasks. GPT-5 Codex High takes longer because it reasons more deeply about problems, but that extended reasoning isn't needed when reviewing a single task. A faster model works better in the tight feedback loop of iterative development.

Optimising the Workflow
The tools work better when configured for this specific workflow. I maintain configuration files that fine-tune each model's behaviour:

  • CLAUDE.md for Claude Sonnet 4.5 - defines how it should approach implementation, testing, and documentation
  • AGENTS.md for GPT-5 Codex - configures verification patterns and review focus areas
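
To give a flavour of what these contain, a CLAUDE.md for this workflow might include directives along these lines (an abbreviated, illustrative sketch, not the actual file):

```markdown
# CLAUDE.md (excerpt)

- Work through the implementation plan one task at a time.
- Every task has three deliverables: the specification file,
  the implementation, and automated tests.
- Mark a task complete only when all three exist and the tests pass.
- Update specification files whenever the code they describe changes.
```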

These configuration files ensure each model understands its role in the workflow and follows consistent patterns across projects.

Why This Works
Each AI does what it's good at:

  • GPT-5 High: Strategic planning and implementation roadmaps
  • Claude Sonnet 4.5: Creating specification files, detailed implementation, test generation, and marking off completed tasks
  • GPT-5 Codex Medium: Verification across specifications, code, and tests

The result is cleaner code, comprehensive test coverage, proper documentation through specification files, and a collaborative development process. By treating AI models as specialist team members rather than all-purpose tools, I get quality results whilst keeping development efficient.

The separation of concerns matters. The plan tracks implementation progress, the specification files document what was built, the tests verify correctness, and multiple verification layers catch issues before they become problems. It scales from small features to complex projects.

This workflow was inspired by a16z's article on the trillion-dollar AI software development stack, which provides an excellent overview of the emerging AI coding ecosystem.


About Henricus Louwhoff

Henricus is a backend software developer with a passion for complex systems and elegant code architecture. Specialising in Elixir and Rust, he thrives on solving challenging technical problems and untangling intricate backend logic, though he openly admits frontend development isn't his forte. When he's not deep in code, Henricus enjoys analogue and digital photography and considers being a dad his greatest achievement and defining characteristic.