Ask Claude to build you a dashboard and it'll give you a dashboard. Cards, charts, navigation, the works. It'll be clean. It'll be functional. It'll look like every other dashboard you've ever seen.
Then put it in front of the people it was built for. They'll hesitate at the navigation. They'll miss the button you thought was obvious. They'll try to do something in two steps that should take one. They'll get an error and not know why. They'll close the tab and go back to their spreadsheet.
The interface AI generated was a collection of correct components. What it wasn't was an experience, a sequence of moments designed to make a human feel competent, oriented, and in control. That's the layer AI can produce the ingredients for but can't compose on its own. And it's where the actual product lives.
I see this constantly. Building Sitebook, a project management tool for builders, the AI put a directory of all projects on the same page as the panels to modify a specific project. Functionally it worked. Every element was correct. But a builder opening that screen wouldn't have a clue what the page was actually for, or what they were supposed to do next. That understanding, that it was wrong and needed to change, came from me, before it ever hit a customer.
• • •
OpenAI recently shipped Codex, an agent that can test your application flows autonomously. It runs through your app, clicks the buttons, fills in the forms, checks that the right things happen. As a QA tool it's genuinely useful: it catches functional regressions, broken links, failed API calls, the stuff that used to eat hours of manual testing.
But here's what Codex can't catch: the layout that's technically correct but visually broken. The spacing between elements that makes a form feel cramped. The font size that's readable on a MacBook but miserable on a 13-inch Windows laptop. The flow that works, every step passes, no errors, but feels confusing because the visual hierarchy is off and the user can't tell what to do next.
Functional correctness and experiential quality are two different things. AI testing is getting very good at the first. It has no mechanism for the second. It can tell you that a button exists and responds to clicks. It can't tell you that the button feels like it belongs there.
Rick Rubin would call this the difference between the technical and the felt. In The Creative Act, he describes how a song can be perfectly played, every note in tune, every rhythm precise, and still feel dead. The thing that makes it alive is something the musician brings beyond technical execution: feel, presence, intention. Products work the same way. You can pass every test and still feel wrong.
• • •
Three specific UX layers that AI-generated products consistently miss:
- Transitions and flow. Most AI-built apps treat each screen as an isolated page. Click a button, get a new page. But real UX is about the seams: how you move from your inbox to a specific message, how you return to where you were, how the interface signals that something changed. Get these wrong and users feel lost even when they can see exactly where they are.
- Error and edge states. The happy path is easy. AI nails the happy path every time. But products live in their edge cases. What happens when the network drops? When a field is filled in wrong? When there's nothing to show yet? These states are where trust is built or broken.
- Progressive disclosure. When AI makes features cheap to build, the instinct is to put everything on screen at once. Great UX does the opposite: it reveals complexity gradually, showing people what they need now and letting them pull more when they're ready.
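
To make the last two layers concrete, here is a minimal sketch of one way to force the edge states into the open: give each one its own named state, so "offline", "invalid field", and "nothing to show yet" each have to get their own copy, and reveal a long list a few items at a time. Everything in it is hypothetical and for illustration only: the class names, the `message_for` function, the wording of the messages, and the preview size of 3. It isn't how Sitebook is built, and it doesn't decide whether the copy feels right. That part is still yours.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical state names: one class per situation the screen can be in.
@dataclass(frozen=True)
class Offline:
    """The network dropped before the data arrived."""


@dataclass(frozen=True)
class InvalidField:
    """A form field was filled in wrong."""
    field: str
    hint: str


@dataclass(frozen=True)
class Empty:
    """There is nothing to show yet."""


@dataclass(frozen=True)
class Loaded:
    """The happy path: there are items to list."""
    items: Tuple[str, ...]


ScreenState = Union[Offline, InvalidField, Empty, Loaded]


def message_for(state: ScreenState, show_all: bool = False, preview: int = 3) -> str:
    """Return the copy a user would see for each state (placeholder wording)."""
    if isinstance(state, Offline):
        return "You're offline. We'll retry when the connection is back."
    if isinstance(state, InvalidField):
        return f"Check {state.field}: {state.hint}"
    if isinstance(state, Empty):
        return "No projects yet. Create your first one to get started."
    if isinstance(state, Loaded):
        # Progressive disclosure: show a short preview unless the user asks for more.
        shown = state.items if show_all else state.items[:preview]
        hidden = len(state.items) - len(shown)
        line = ", ".join(shown)
        return f"{line} (+{hidden} more)" if hidden else line
    # A state nobody wrote copy for fails loudly instead of rendering a blank page.
    raise TypeError(f"unhandled state: {state!r}")
```

Calling `message_for(Loaded(("A", "B", "C", "D", "E")))` returns a three-item preview with a "+2 more" hint, and passing `show_all=True` returns the full list. The point of the structure is that a missing edge state becomes a visible gap in the code rather than something a user discovers for you.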
• • •
So how do you catch what automated testing can't?
The first answer is unglamorous: use the thing yourself. Not once, not as a demo walkthrough, but repeatedly, in the conditions your users will actually face. On a bad internet connection. On a small screen. When you're tired and impatient. The bugs that matter most are the ones you feel before you can name them: the moment where something takes one click too many, where a label makes you pause, where the flow loses you between steps.
The second: put it in front of someone who isn't you. Watch them without guiding them. The instinct to say "no, click over there" is the one you need to suppress. That impulse is data: it's telling you the interface failed to communicate what you thought it communicated.
The third: get into the nitty-gritty of the details the AI generated. Check the spacing by eye, not by spec. Read every string of copy out loud. Resize the window and see what breaks. This is the work that loops and agents can't do for you, because they don't know what "feels right" means. Only you do, and only if you've trained yourself to notice.
The production layer has collapsed to near-zero, which means the UX layer is now the majority of the work. Use AI to generate the screens. Use a human to design the experience between them. That's where the product lives.
~ Kosta