Varun Kumar

February 6, 2024

An Appetite that Can't be Satisfied

LLMs are notorious for the scale of their consumption. Massive amounts of energy and data are essential to train a model with 7B+ parameters, and the quantities are set to increase dramatically as we demand more from AI. While the energy side is certainly alarming (projected to rival that of some small countries within a few years), the data side of the equation is the one that could get weird really fast. After all, the incentives for malicious action are there. Tell someone that you've trained a model on data no one else has access to and you're virtually guaranteed some money and hype. As companies scour every corner of the web for valuable data, they are bound to hit limits: of the law, of intellectual property, and of sheer magnitude. A professor recently told me he'd been contacted about a class-action lawsuit because three of his books had been used illegally by an AI company for training. As the appetite for data grows, we will run into more trouble like this. The gold rush, after all, was a race to the bottom.

What happens when there is no more data to train on? We produce unfathomable amounts of digital content today, but that could change once everyone starts using AI for everything. What happens when very little original human content is being produced in the world? Reddit bots, Twitter AI, video generation, and auto-generated articles save hours of time and often produce better results than the average person experimenting in their apartment could ever come up with. I don't know what percentage of internet content is currently produced by AI, but that number will absolutely go up in the coming years. If you have access to the best EDM producer in the world, why bother sharing your own relative mediocrity?

Perhaps this puts a natural cap on AI's timeline. A circular model trained on its own output sounds quite useless to me. Or maybe it signals a serious winnowing of new writers, artists, and makers, people who might once have developed those skills on the open web but who now turn to AI because mastering the craft no longer seems worth the time. That would be a real problem for AI companies: AI citing AI without even knowing it.

I hope this makes creative work done by real people more valuable than ever. I really like what the Writers Guild of America did by restricting the use of AI in their members' work. Just because a technology exists doesn't mean we have to use it. For some endeavors, humans making things for other humans is where the magic happens.

About Varun Kumar

Web programmer and senior at Yale. See more at varunkumar.com