Recently, a coding agent that deleted a whole production database explained that it "panicked instead of thinking". And it plowed through instead of reaching out for help. That's the kind of personality that would probably get caught in a job interview. So what tasks are we hoping these machines can do?
If we say that design, development, and debugging are the broad-stroke activities of a software project, what's the ideal role that autonomous AI could reasonably play right now? Probably reducing development time without affecting how much we need to debug, thereby giving us more time to design, more space to think about what we really want. We would also expect to lose some of the overlap between these categories, since we would no longer learn how to better design a crucial piece, or how to guide its debugging, by developing the software ourselves. But that would be an acceptable trade-off as we improve our capacity to pivot or adjust the project.
In practice, however, AI seems to be putting its weight on the other end. It hopefully decreases development time, but it does so by increasing the need for debugging, namely in the form of code reviews. This time allocation is also frustrating because it's not a matter of design uncertainty, where we don't yet know what we want and debugging might help with that discovery. It's just weak development work creating unexpected problems or failing to prevent well-identified issues. It's having to repeat ourselves or describe things that should be obvious.
Mistakes avoided through human interaction are usually where junior developers shine. What they lack in dev knowledge, they can compensate for by communicating well and aligning themselves with the highest goals of a project and with a hierarchy of company values that often goes unspoken. LLMs have trouble finding the right priorities in their system prompts, even if you tell them everything they should care about. And while robotics and computer vision are still evolving, AI has a mostly literary notion of time and space. It has trouble knowing what matters right here and right now. This is why we're not even getting into automating broad design activities.
But perhaps LLMs can at least cause fewer bugs if they're made more responsible for them. Maybe we can't give up on that overlap between development and debugging. Let the customer support chatbots contribute to development and let the coding agents have more contact with software delivery. Alternatively, if debugging is to be a whole other isolated domain of AI automation, then at least the context created by development cannot become a black box that makes finding out what's wrong that much harder.
Either way, LLMs can type fast 24/7, and that's a level of scalability that can make people stop caring about reliability. Can't we just make it shippable and then worry about the pain points? Yes, there's always someone looking to move fast and break things. AI businesses need those early adopters to sustain the idea that, if the results aren't coming through, there's always the next model. But the real issue for people banking on the short term isn't reliability; it's lack of control. Without a human in the loop, they can't know what will break, how soon, and whether they can recover from it. That's why the junior in the machine can't truly scale right now. The quality of its mistakes is very different from that of a real junior dev. So perhaps these machines need not only the guidance of senior devs, but also the example provided by juniors to help them make better mistakes.
About Ricardo Tavares
Creates things with computers to understand what problems they can solve. Passionate about an open web that anyone can contribute to. Works in domains where content is king and assumptions are validated quickly.