Jimmy Cerone

September 5, 2025

Link of the Day: Building AI Products In The Probabilistic Era

Building AI Products In The Probabilistic Era

I've been spending a lot of time recently thinking about how AI systems fail and which use cases are best suited to these randomized prediction machines. In my own personal projects, I find myself reaching for AI when I want to extend, not replace, my capabilities. The prime example is Deeper Research, a simplified version of Deep Research that runs locally and, rather than summarizing, returns diverse search results for broad queries.

Even with all my thinking and tinkering, I've struggled to frame the killer use case for AI. I catch fleeting glimpses of it; I can see the edges of its strengths and weaknesses without being able to put my finger on the whole picture. This article, from someone out there building on the front lines, nails it. There are two key ideas here:

- Old software is deterministic; AI is probabilistic
- Old software is engineering; AI is science

It's ontologically different. We're moving away from deterministic mechanicism, a world of perfect information and perfect knowledge, and walking into one made of emergent unknown behaviors, where instead of planning and engineering we observe and hypothesize.

The key to incredible products in the age of AI is finding out which parts of your product should be deterministic and which should be probabilistic^1.

While that's a powerful insight, I've been on a bit of an SRE / monitoring kick of late, so I found the implications for how we measure and change our systems the most interesting part. We can no longer rely on the tried-and-true RED and USE methods for understanding system health; those only work for deterministic systems. Instead, we need to take a page from the book of scientific research. As best I can tell, we need to run experiments and then conduct literature reviews to gather insights. The author laid out the "why" for this new approach here:
 
Knowing that users acquired through TikTok are more likely to build games, which are more expensive to generate on a per-token basis and therefore impact the margin calculus, is incredibly valuable across the entire company: from engineers making sure that games are efficiently generated, to marketers shifting their top-of-the-funnel strategy to a more sustainable channel, to the finance team appropriately segmenting their CAC and LTV analysis. A 20% shift from game-building users to professional web apps might mean the difference between sustainable unit economics and bleeding money on every free user — yet this insight only emerges from analyzing the actual content of AI interactions, not traditional funnel metrics.

The part I am thinking about most is the author's approach to gathering this type of data:

The easiest way of approaching it is by segmenting user inputs. You use smaller models to classify user requests to larger models, which allows you to segment your data in “regions of usage”. It’s a crude way of clustering user journeys. For Replit’s coding agent, this could be coding use cases: “what’s the likelihood of getting a positive message from the user after 3 chat interactions, for all users that submitted a prompt about React web apps?” To push things further, you can use the same approach to define milestones to achieve across different paths, which might mean classifying model internal states.
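To make the segmentation idea concrete, here is a minimal sketch of what "regions of usage" analysis might look like. This is my own illustration, not the article's or Replit's actual implementation: `classify_prompt` is a hypothetical stand-in for the smaller classifier model (a keyword matcher here, where in practice you would call a cheap LLM), and the segment labels are invented.

```python
from collections import defaultdict

def classify_prompt(prompt: str) -> str:
    """Stand-in for a small classifier model that labels each user
    prompt with a 'region of usage'. A real system would call a cheap
    LLM here; these keyword rules and labels are hypothetical."""
    text = prompt.lower()
    if "react" in text or "web app" in text:
        return "react-web-app"
    if "game" in text:
        return "game"
    return "other"

def positive_rate_by_segment(interactions):
    """Given (prompt, got_positive_message_after_3_chats) pairs,
    compute the share of positive outcomes within each segment."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for prompt, positive in interactions:
        segment = classify_prompt(prompt)
        totals[segment] += 1
        if positive:
            positives[segment] += 1
    return {seg: positives[seg] / totals[seg] for seg in totals}

interactions = [
    ("Build me a React web app for tracking habits", True),
    ("Help me debug my React component", False),
    ("Make a platformer game", True),
]
print(positive_rate_by_segment(interactions))
```

The payoff is that questions like "what's the likelihood of a positive message for users who prompted about React web apps?" become a dictionary lookup, and the same grouping can be reused to track milestones along different user paths.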

In some ways, this feels like the most interesting area in tech right now. I've yet to hear anyone else talk about it, and even Replit's approach feels naive and unpolished. There is a real opportunity to step in and build the "SRE for AI" role and its best practices, because as far as I can tell, they do not yet exist. The best we have so far is: "use an LLM to group outputs together and make more guesses."

1 - Credit to my coworker Ravi Starzl for the framing of "what should be probabilistic vs. deterministic"