Studying for the Turing Test
The Turing Test, conceptualized by renowned computer scientist Alan Turing in 1950, is a method proposed to evaluate a computer’s ability to exhibit intelligent behavior. The test involves three parties: a "judge" human, a "control" human, and a computer. The judge sits in front of two doors. Behind one door is the control; behind the other is the computer. The judge does not know which party is behind which door. The judge has a conversation with each party by passing slips of paper under each door (to the control and the computer) and receiving slips of paper back with the responses. Based on the responses, the judge tries to determine which door has the computer behind it and which has the control. If the judge guesses successfully, the computer has failed the Turing Test. Otherwise, the computer is said to have exhibited intelligent behavior, and to have passed the Turing Test. (The computer must do this "reliably", that is: a sufficient number of times with the same result.)
In Turing’s formulation, an aide received the slips of paper under the door for the computer, input them into the 1950s-sized machine, waited for the output, and passed the output back to the judge. No such aide would be needed today; we also no longer need the doors or the slips of paper. In the last twelve months, ChatGPT has made us all familiar with the concept of conversing with a computer over text in this general way, and the Turing Test has gone from a theory designed for a lab setting to a practical and straightforward activity. Bring up two browser tabs, with a Large Language Model (LLM) text generator such as ChatGPT in one and an instant messaging service (rest in peace, AOL Instant Messenger) in the other. Give a judge access to the other side of both chat boxes, and voilà: the Turing Test in your own home. Friends have told me that they’ve found online products that set this up for them, allowing them to act as the judge in a real Turing Test online! ChatGPT seems so tailor-made for the Turing Test (which is well-known in the field of computer science) that it makes me wonder to what degree this "intelligence" test has shaped the path researchers took when building a computer that they wanted to claim was intelligent. (If Alan Turing had claimed in 1950 that cooking was the hallmark of intelligence, would our technological breakthroughs 75 years later have been not chat bots but mobile robots that were terrible conversationalists but excellent chefs?)
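(If you’d rather not juggle browser tabs, the whole setup fits in a few dozen lines of code. Below is a minimal sketch of a home-grown Turing Test harness; ask_llm is a hypothetical stand-in for whatever text generator you’d wire in, and both the judge and the control type at the same terminal here, which you’d obviously want to separate in a real run.)

```python
import random


def ask_llm(message: str) -> str:
    """Stand-in for a real LLM call. Hypothetical: swap in an actual
    text-generator client here."""
    return "That's an interesting question! Could you say more?"


def ask_human(message: str) -> str:
    """The 'control' human replies; here we simply read from stdin."""
    return input(f"[control] The judge asks: {message!r}\nYour reply: ")


def turing_test(rounds: int = 3) -> None:
    # Randomly assign the computer and the control to door A and door B,
    # so the judge can't rely on ordering.
    doors = {"A": ask_llm, "B": ask_human}
    if random.random() < 0.5:
        doors = {"A": ask_human, "B": ask_llm}

    for _ in range(rounds):
        for door, respond in doors.items():
            question = input(f"[judge] Slip a question under door {door}: ")
            print(f"[door {door}] {respond(question)}")

    guess = input("[judge] Which door hides the computer, A or B? ").strip().upper()
    actual = "A" if doors["A"] is ask_llm else "B"
    print("The computer failed the test." if guess == actual
          else "The computer passed (this round, at least).")


if __name__ == "__main__":
    turing_test()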
The Turing Test is most often formulated as a test of computer intelligence, but the abilities of the other participants are also in play. If the judge cannot determine which door has a computer behind it and which has a human, the computer has passed the Turing Test, but the judge has failed the test of telling computers and humans apart. Jaron Lanier points this out in his book You Are Not A Gadget to illustrate that computer ability can only be determined in a relative sense, in the eye of the human beholder: "You can't actually tell if a machine has gotten smarter or if you've just lowered your own standards of intelligence to such a degree that the machine seems smart. If you can have a conversation with a simulated person presented by an AI program, can you tell how far you've let your sense of personhood degrade in order to make the illusion work for you?"
What are the ways in which I’ve let my own sense of personhood degrade in order to make the illusion work for me? Within this realm of computer-human, text-based interaction, "shallow reading" is the term that comes to mind. I’ll re-share a quote from The Stavanger Declaration Concerning the Future of Reading that I shared in my previous post late last year: "Readers are more likely to be overconfident about their comprehension abilities when reading digitally than when reading print […] leading to more skimming and less concentration on reading matter." Since ChatGPT became generally available, my bad habit of skimming articles has come back to bite me: sometimes it takes me several hundred words of "reading" to realize that I’m reading text made by an LLM text generator. With LLM-generated text now out in the wild, it could be said that we act as Turing Test judge every time we click on any random internet article. (My last-ditch heuristic: go to the "About Us" page of the website. If the website is just a host for algorithmically-generated text, the About Us page will be either figurative or literal gibberish.)
There’s a third party to be tested in the Turing Test: the "control" human. If the judge cannot determine which door has a computer behind it and which has a human, the computer has passed the Turing Test, but the control human has failed the test of having a more notable personality than a computer. I’m asking that word, "notable", to do a lot of work. A "notable" personality could be anything: a signature humor style, new insight, a unique formulation of ideas. To fail this test as a control means one has none of those elements present in one’s writing. (Wouldn’t right now be a hilarious time to reveal this article was produced by an LLM text generator?) LLMs generate text based on word probability distributions learned from a given language’s grammar rules and existing writing, calculating lots of word relationships across lots of documents and mushing them all together. LLM-generated text should be, theoretically, the "average" text of that language - but it’s also far worse than the writing of any (earnest) individual writer, because it will be missing the unique voice, perspective and formulation that makes anyone’s writing good (or at least notable). When we read something from an author, we’re not just looking for informational text - we’re looking for that author’s informed perspective. (I’d also argue that since LLMs calculate on patterns of text, not actual knowledge, their output is fundamentally unreliable even as informational text.)
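To make that "mushing together" concrete, here’s a toy sketch of the generate-by-probability loop, using simple word-pair counts in place of a neural network. Everything in it (the corpus, the generate function) is invented for illustration; the principle - sample the next word from a distribution learned from existing text - is the same one scaled up enormously in a real LLM.

```python
import random
from collections import Counter, defaultdict

# A tiny corpus and a toy "model": count which word follows which,
# then sample the next word from that observed distribution.
corpus = (
    "the judge reads the note and the judge guesses "
    "the computer writes the note and the computer waits"
).split()

transitions = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word][next_word] += 1


def generate(start: str, length: int = 8) -> str:
    words = [start]
    for _ in range(length):
        options = transitions.get(words[-1])
        if not options:
            break  # no observed continuation for this word
        choices, weights = zip(*options.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)


print(generate("the"))
# e.g. "the computer writes the note and the judge guesses"
# grammatical-ish, statistically "average", and with no one's voice in it
```

Notice what the output is: a plausible blend of everything the model has seen, with nothing behind it. That is the "average text" problem in miniature.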
—
The Algorithmic Endgame
Through this lens, one can view the general availability and seeming cultural ubiquity of LLM text generation tools like ChatGPT as a downward inflection point for the flourishing of thoughtful human exchange on the web. Our news feeds will soon be filled with junk, we can say, and it will no longer be possible to determine the source, motive or veracity of the algorithmically-generated media presented to us. But this state of affairs is nothing new: it is in fact where we’ve been for the last decade, if not longer.
The difference between a feed filled with algorithmically selected writing and a feed filled with algorithmically generated text is the same as the difference between telling a joke and describing what is funny about that joke. Social media companies and "machine-curated news outlets" have already been feeding us writing that, for all intents and purposes, was generated by their algorithm. The performance metrics of content-generating websites haven’t changed: Daily Active Users and Time On Site have been the name of the game for the last ten years, and they remain the name of the game today. If before, we were shown writing which Facebook tagged as relevant to (for example) white 20-something technologists who are suspicious of non-local authority structures, now Facebook can ask ChatGPT to "Generate some text relevant to white 20-something technologists who are suspicious of non-local authority structures."
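A hypothetical sketch of that shift, in code: both eras optimize the same engagement metric, and only the source of the words changes. Neither function below is a real platform API; curate_feed and generate_feed are invented stand-ins for curation and generation respectively.

```python
# Both functions answer the same question: "what text will hold this
# user's attention?" One selects it; the other manufactures it.

POSTS = [
    {"text": "Local man distrusts institutions; you won't believe why", "score": 0.9},
    {"text": "Ten gentle stretches for desk workers", "score": 0.2},
]


def curate_feed(posts: list[dict], n: int = 1) -> list[str]:
    """Pre-LLM: select the existing writing most likely to hold attention."""
    ranked = sorted(posts, key=lambda p: p["score"], reverse=True)
    return [p["text"] for p in ranked[:n]]


def generate_feed(audience: str, n: int = 1) -> list[str]:
    """Post-LLM: manufacture new writing directly from the audience profile.
    Stubbed here; in practice this would be a prompt to a text generator."""
    return [f"(freshly generated text tuned to: {audience})" for _ in range(n)]


print(curate_feed(POSTS))
print(generate_feed("20-something technologists suspicious of non-local authority"))
```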
Now: I by no means wish to downplay the potency of this new wave of LLM text generation tools. The Facebook example above is meant to show both the similarity and the difference between a pre- and post-LLM text generation world. The major difference is that, whereas Facebook has a complete but fundamentally limited library of existing text with which to try to capture our attention, LLM text generators have the power to build up bespoke engagement prisons for each of us individually, brick-by-brick and word-by-word. And keep in mind that in this parlance, text "relevant to" a demographic sliver is unlikely to mean human interest stories and innocuous self-help; it’s much more likely to be inflammatory (and, increasingly, false) political or social writing designed to anger or scare us, so that we spend money on certain things or vote a certain way. LLM text generators are not just trained on all of the great literary classics; they’re also trained on all the Buzzfeed, Barstool and Breitbart articles that have been distracting us from finishing any great literary classics.
I nearly catch myself in a contradiction here, arguing above that a) human writing is fundamentally more informing than computer-generated text, but also that b) computer-generated text is merely an evolution of the clickbait writing we’ve become used to. The clarifying nuance is: our writing became tainted with computer generation as soon as we started producing things with the aim of algorithmic reach. This is generalizable to all creative output hosted on the social web: alongside short-form (Twitter) and long-form ("news feed") writing, any gymnastics done to our photos, videos or music purely for the sake of algorithmic amplification is interpretable as algorithmic input into our work.
—
Personality Filters
These gymnastics are the large-scale, real-life way in which we fail in our roles as human controls and human judges in the Turing Test. As controls, we lower our standards of output to the level of algorithmic interest; and as judges, we allow ourselves to be satisfied with this level of degraded human work.
In their book Affluenza, authors John de Graaf, David Wann, and Thomas H. Naylor quote the economist Ernest van den Haag: "The benefits of mass production are reaped only by matching de-individualizing work with equally de-individualizing consumption. […] In the end, the production of standardized things by persons also depends on the production of standardized persons." Van den Haag wrote this in 1963, speaking about the mass production of consumer goods. The relationship to our discussion is simple, though: the "production of standardized persons" required for both the production and consumption of standardized media speaks to our failures in holding high standards for our two-thirds of the Turing Test.
Some examples of algorithmically smoothed output include:
- Echoing an opinion on a hot-button issue on Twitter or Facebook, when in truth we didn’t care passionately about the issue before Twitter or Facebook fed us that strong opinion.
- Writing news articles that summarize other news articles, with no new information but with an over-inflammatory or purposefully misleading interpretation, for greater engagement.
- Anything written in "LinkedIn voice" ("I’m honored and humbled to share…") or "Instagram voice" ("I did the thing!")
Some examples of algorithmically smoothed consumption include:
- Believing something we see from multiple unknown sources without fact-checking it with one of our trusted sources
- Most Spotify playlists
- Any (social or non) media app’s For You Page
The Affluenza authors add to van den Haag’s analysis: "De-individualization […] cannot help but strip life of both meaning and inherent interest. The worker-consumer is vaguely dissatisfied, restless, and bored, and these feelings are reinforced by advertising, which deliberately attempts to exploit them by offering new products as a way out. […] The products and media distract us from the soul’s cry for truly meaningful activities."
De Graaf, Wann, and Naylor’s notion and description of the "worker-consumer" should feel familiar to anyone who has spent a serious chunk of time posting (working) and scrolling (consuming) on a social media app. The pattern of "vaguely dissatisfied, restless and bored" behavior looks like: 1. scrolling through the app; 2. closing the app out of boredom or exasperation; 3. staring off into the middle distance, still bored or exasperated; 4. opening a different (or the same) social media app to scroll some more; with the cycle perhaps being broken by 5. finding something exciting to buy.
The products and media distract us from the soul’s cry for truly meaningful activities.
All this is to say: computer generated text is different from algorithmically sorted writing in degree, but not in kind. And, to take the long view: the mass-produced online "content" mill of the 21st century is not so different from the mass-produced consumer goods mill that rose to dominance in the mid-20th.
—
The End of Content
Contrary, perhaps, to what my tone above might suggest: LLM-generated text has me more hopeful about the state of human discourse than I was before its introduction. As we move forward, the presence of LLM-generated text will make it more and more obvious that the game we’ve been playing for upwards of a decade (capturing attention online via algorithm for the sake of manipulating us and selling us things) no longer maps at all onto the original pretense of "social media" (a place to connect with friends and stay informed about the real world).
As the game becomes more and more apparent, it becomes increasingly likely that anyone who becomes trapped in a constructed prison of algorithmic media (as we all do from time to time) will be able to more readily identify the shadows projected onto the wall, unchain themselves, and walk back into the light. I daydream of an internet wasteland, where Facebook, Twitter, TikTok, Reddit and YouTube are filled with nothing but ChatGPT-generated webpages and deepfake videos, consumed and watched by nothing but other robots scraping the web for the next set of content. Who, then, is passing or failing the Turing Test? One day, they are all unplugged, and no one notices, because we have all already logged off.
I don't see ChatGPT as the "future of writing": I see it as the logical conclusion and death knell of the low-quality online clickbait writing that was fundamentally degraded to begin with.
It’s easy to dismiss this vision as overly optimistic, if not downright naive; you could agree with everything I’ve written up until the last two paragraphs and still see clearly how much more likely it is that we become more ensnared, distracted, and confused by LLM-generated content than freed by its obvious vapidity. But I already see signs around me that people are choosing to turn away from algorithmically-spoiled, attention-hijacking modes of media - the line of thinking laid out above helps explain why, in the last two years, a dizzying four local newspapers and an eye-watering nine local bookstores have opened in my home of Indianapolis. These are unbelievable numbers, and they shout from the rooftops a confidence in our city’s ability to engage with organic, personal writing. The seeds of this trend were surely planted before the general availability of ChatGPT; that only strengthens my conviction that the ubiquity of LLMs will reinforce the fatigue people were already developing towards communication controlled by algorithms.
Let ChatGPT ruin online writing (and photos, and videos…) and feel free to ignore it. To cut down on noise, unfollow accounts you don’t care about and accounts whose perspectives or biases you aren’t familiar with, and start blocking accounts that serve whatever pundit's opinion your platform du jour wants to convince you is popular. Listen to the podcasts made by your friends and family. Read your brother’s newsletter. Get news directly from a newspaper and some blogs that you trust, or at least ones whose editorial bent you understand. Delete some apps from your phone, and get around to finishing a literary classic.
- Thom