Philippe Laval

February 20, 2025

Taming the AI Dragon: Power and the Illusion of Precision

A few days ago, tech analyst Benedict Evans published “The Deep Research Problem,” reviewing OpenAI’s “Deep Research.” On paper, this new tool seems like a dream come true for anyone who spends much of their time gathering, compiling, and analyzing data. Yet Evans highlights several pitfalls that reveal the true state of Generative AI today.

An Innovation Held Back by Data Errors 

In his post, Benedict Evans explains how Deep Research looks “tailor-made” for analysts: you specify a topic, and the AI promptly collates the relevant data, arranges it, and produces a report. But soon enough, one cannot help but notice a few reliability issues:

  • Deep Research serves up smartphone market share figures for Japan that do not match those from Statcounter, Kantar Worldpanel, or even a local regulatory body. 
  • As a result, to ensure validity, the user must verify each number—defeating the entire purpose of time-saving automation. 

Evans reminds us that Large Language Models (LLMs) are not traditional databases and shouldn’t be tested as if every query must yield a perfectly accurate and verifiable result. There’s a contradiction at work: we want a deterministic answer (precise figures), but we’re relying on a probabilistic mechanism (the LLM), which isn’t built for guaranteed correctness in every scenario. 


Not Yet Meeting Its Own Ambitions

OpenAI’s own showcase, designed to prove Deep Research’s value, falls into the very trap it was meant to address. OpenAI’s Deep Research produces a neat-looking table yet stumbles on critical points. Evans concludes that unless these tools reach an error rate of zero (whose feasibility is hotly debated), we cannot fully trust them to perform data-driven research or strategic analysis autonomously.

He contrasts LLMs’ ability to “guess what you mean” (where they shine) with their struggles to “reliably retrieve precise facts” (a task akin to database queries). Evans notes that OpenAI promises an AI that can do both, but the current reality is less than ideal.

Where Does GenAI Fit Right Now? 

From Evans’s perspective, these data-related mistakes show that AI isn’t ready to serve as the central, fully reliable engine of a research tool. He concludes that, as of today, GenAI foundation models:

  1. Lack a genuine “moat” other than their deep pockets. 
  2. Haven’t found a universal, foolproof product for everyday users. 
  3. Remain most relevant in three primary areas:  
    • Software development, where models genuinely assist with code generation.
    • Marketing, where “mostly correct” text is often sufficient for ad copy or concept creation.
    • Larger applications, where models are "hidden" behind an API so that business logic can validate or filter their responses. In this approach, AI is one component of a system that maintains overall reliability.

Conclusion: A Dragon Still Needing Its Rider 

Just like the dragons from House of the Dragon—capable of immense power but potentially perilous when left unchecked—GenAI models show formidable potential alongside significant pitfalls. In Benedict Evans’s view, today’s AI works beautifully to generate code, produce a first marketing draft, or act as an embedded component in a broader software ecosystem. It is not yet suited to independently helm data research or reliable fact-checking tasks. 

This is very similar to how we use our own platform, Ninja, at Jolt Capital. Ninja, like GenAI, is a powerful tool designed to streamline the investment research process. However, much like the AI tools discussed here, it isn’t meant to replace human expertise. Instead, it serves as a critical component in a larger system that helps Jolt’s teams identify investment opportunities faster, analyze data, and focus on the strategic aspects of decision-making. While Ninja is effective in processing vast amounts of data and providing insights quickly, it’s the human judgment that ultimately ensures the final decision is both reliable and informed. 

Rather than expecting Ninja (or GenAI) to be a perfect standalone solution, Jolt Capital has integrated it as a key tool that supports and enhances the expertise of its teams. In the same way, GenAI is most useful when it’s incorporated as part of a more comprehensive system that allows humans to step in, validate, and refine the outputs. For now, whether using Ninja or GenAI, these tools are catalysts, not replacements, helping us move faster and smarter while still requiring human oversight.

- PhL

About Philippe Laval

As CTO and Managing Partner at Jolt Capital, I lead a team of talented developers and engineers in building Ninja, our AI-powered platform that dynamically tracks and analyzes a database of 5 million companies—enhancing dealflow, due diligence, and competitive intelligence.

A serial entrepreneur with a passion for big data search and AI-driven analytics, I previously founded and led:
  • Evercontact, an automated service that keeps address books and CRMs up to date (exited in 2016).
  • Sinequa, a Gartner and Forrester leader in enterprise search (exited in 2024).

I split my time between San Francisco and Paris—mostly Paris now—and in August, you’ll likely find me kite-surfing in Houat. I’m also the author of Winter is NOT Coming, a book on Game of Thrones and management lessons.