
Chamath Palihapitiya is a controversial figure, but his recent breakdown of how large language models work, and where they fall short, is worth checking out. He makes a few great points:
First, access to proprietary and constantly evolving data is important. While he doesn't mention the OpenAI lawsuits, he implies that owning your own training data is an advantage.
Second, exponential growth in the amount of training data yields only linear performance gains, so the context and freshness of the data matter too. Data from Common Crawl might be useful for training but can quickly grow stale.
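To see what "exponential data, linear gains" looks like, here's a minimal sketch of a power-law scaling curve, loosely in the spirit of published LLM scaling-law results. The constants `A` and `ALPHA` are invented purely for illustration, not taken from any real model:

```python
import math

A = 10.0      # assumed scale constant (illustrative only)
ALPHA = 0.1   # assumed scaling exponent (illustrative only)

def loss(tokens: float) -> float:
    """Toy model: test loss falls as a power law in training tokens."""
    return A * tokens ** -ALPHA

# Grow the dataset exponentially (10x per step). The loss drops by the
# same constant factor each step, so the gain per order of magnitude of
# data is fixed -- linear improvement, not an exponential payoff.
for tokens in (1e9, 1e10, 1e11, 1e12, 1e13):
    l = loss(tokens)
    print(f"{tokens:.0e} tokens -> loss {l:.3f} "
          f"(-log10 loss = {-math.log10(l):.2f})")
```

Each 10x jump in tokens nudges `-log10 loss` up by the same fixed amount, which is the point: you pay exponentially more in data for a steady, linear return.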
All this seems like a sales pitch for xAI's Grok, but it certainly piqued my interest. $16/mo for early access might be enough for me to pay up. Still, if X's AI isn't exponentially better than the free alternatives, I doubt I'd stick with it.