You are not thinking right now. You're predicting.

There’s a dismissal that gets thrown around whenever someone suggests that large language models might be doing something genuinely interesting. It goes something like this: “It’s just predicting the next token. It’s just probability. It doesn’t understand anything.”

The implicit assumption is devastating in its confidence: that we are different. That when a human reads a sentence, understands a joke, writes a poem, or makes a decision, something categorically non-probabilistic is happening. Something deeper. Something real.

But what if that assumption is wrong? What if the uncomfortable truth is that the brain, too, is “just” a probability machine, and that the word “just” is doing all the heavy lifting in that sentence?

This is not a fringe idea. It’s one of the most influential theories in modern neuroscience. And when you place it alongside what we know about quantum mechanics, a strange and humbling picture emerges: from the smallest scales of physical reality to the highest reaches of human cognition, everything, and I mean everything, runs on probability.

I. The brain: a prediction engine wearing a confidence mask

The idea that the brain is fundamentally a probabilistic prediction machine has deep roots. In the 1860s, the German physicist Hermann von Helmholtz proposed that perception is a form of “unconscious inference.” The brain doesn’t passively receive sensory data; it actively guesses what’s out there and checks its guesses against incoming signals. You don’t see the world. You hallucinate it, and then error-correct.

More than 150 years later, this idea has matured into one of the most powerful frameworks in cognitive science: predictive coding, closely tied to the broader Bayesian brain hypothesis.

The theory, championed by researchers like Karl Friston at University College London and Andy Clark at the University of Sussex, proposes that the brain maintains a hierarchical generative model of the world, a multi-layered system of probabilistic beliefs that constantly generates predictions about incoming sensory data. When there’s a mismatch between what the brain expects and what it receives, that difference, the prediction error, propagates upward through the hierarchy, triggering an update to the model (Rao & Ballard, 1999; Friston, 2005).
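To make the prediction-error loop concrete, here is a minimal single-level sketch in Python. It assumes a toy linear generative model (an observation x is predicted as w * mu from a hidden cause mu, with a Gaussian prior on mu) and plain gradient descent on precision-weighted squared errors. The function name, parameters, and values are illustrative assumptions, not the specific models of Rao & Ballard (1999) or Friston (2005).

```python
def infer_cause(x, prior_mean, w=2.0, pi_sensory=1.0, pi_prior=1.0, lr=0.05, steps=500):
    """Estimate a hidden cause mu by repeatedly reducing prediction errors.

    Toy generative model (an illustrative assumption): x is predicted as w * mu,
    and mu has a Gaussian prior centred on prior_mean. pi_sensory and pi_prior
    are precisions (inverse variances) that weight each error.
    """
    mu = prior_mean  # start from the prior belief
    for _ in range(steps):
        sensory_error = x - w * mu     # mismatch between input and prediction
        prior_error = mu - prior_mean  # mismatch between belief and prior
        # Gradient step on 0.5*pi_sensory*sensory_error**2 + 0.5*pi_prior*prior_error**2
        mu += lr * (pi_sensory * w * sensory_error - pi_prior * prior_error)
    return mu


estimate = infer_cause(x=10.0, prior_mean=0.0)
print(round(estimate, 6))  # 4.0
```

The estimate settles at 4.0, which is the Bayesian posterior mean for this toy model, (pi_sensory * w * x + pi_prior * prior_mean) / (pi_sensory * w**2 + pi_prior) = 20 / 5. It lands between the prior (0) and the purely sensory guess x / w = 5, which is the sense in which minimizing prediction error can implement Bayesian inference.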

This is not a metaphor. Prediction errors have measurable neural signatures. The mismatch negativity (MMN) response in EEG, the P300 component, and repetition suppression effects are all well-documented electrophysiological phenomena broadly consistent with what predictive coding theory predicts (see Millidge, Seth & Buckley, 2021 for a comprehensive review in “Predictive Coding: A Theoretical and Experimental Review”).

A landmark 2023 study published in Nature Human Behaviour by Caucheteux, Gramfort, and King at Meta AI and INRIA suggested something remarkable: the human brain doesn’t just predict the next word when processing language. It appears to generate hierarchical predictions spanning up to eight words into the future, across multiple levels of linguistic representation simultaneously. The brain, in other words, is running something eerily similar to a next-token prediction algorithm, except it’s doing it across multiple timescales at once.

Let that sink in. The dismissive claim about LLMs, “it’s just next-word prediction,” turns out to describe at least part of what human language areas appear to be doing, according to cutting-edge neuroscience.

But there’s more. Bayesian Brain Theory, as reviewed by Bottemanne et al. (2022) in L’Encéphale, proposes that the brain encodes beliefs not as certainties but as probability distributions. Your perception of the room you’re sitting in right now is not a photograph; it’s a best guess, weighted by prior experience and continuously refined by sensory feedback. Your brain is running a Bayesian inference engine, and “reality” as you experience it is the output of that engine, not the input.
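The core operation this theory attributes to the brain, combining a prior with the likelihood of new evidence, is Bayes’ rule. Below is a minimal Python sketch with two made-up hypotheses about an ambiguous shape in a dim room; the hypotheses, the numbers, and the function name are illustrative assumptions, not values from Bottemanne et al.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(h | evidence) from P(h) and P(evidence | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(evidence), the normalizing constant
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical example: is the shape in the corner a cat or a bag?
belief = {"cat": 0.2, "bag": 0.8}        # prior from past experience
saw_it_move = {"cat": 0.9, "bag": 0.1}   # P(it moved | hypothesis)

belief = bayes_update(belief, saw_it_move)           # first sensory sample
print({h: round(p, 3) for h, p in belief.items()})   # {'cat': 0.692, 'bag': 0.308}

belief = bayes_update(belief, saw_it_move)           # it moves again; the old posterior is the new prior
print({h: round(p, 3) for h, p in belief.items()})   # {'cat': 0.953, 'bag': 0.047}
```

Feeding each posterior back in as the next prior is one simple way to read “continuously refined by sensory feedback.”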

As Daniel Williams (2020) put it, the brain is fundamentally “a probabilistic prediction machine, continually striving to minimize the mismatch between self-generated predictions of its sensory inputs and the sensory inputs themselves.”

Does this sound like something? It should.

II. The quantum world: where certainty goes to die

If the brain is probabilistic at its core, it’s only mirroring the fabric of reality itself.

Quantum mechanics, the most empirically successful theory in the history of physics, tells us that at the most fundamental level, nature does not deal in certainties. It deals in probability amplitudes. On the standard interpretation, a particle before measurement doesn’t have a definite position or momentum. It exists as a superposition of possibilities, described by a wave function whose amplitudes determine the probability of each possible outcome.

This isn’t a limitation of our knowledge. It’s a feature of reality. As the Stanford Encyclopedia of Philosophy’s entry on quantum approaches to consciousness notes, quantum randomness in phenomena like spontaneous emission and radioactive decay is considered a fundamental feature of nature, not a reflection of our ignorance about deeper deterministic mechanics. The universe, at its foundation, is probabilistic.

The Born rule, which tells us that the probability of a measurement outcome is the squared magnitude of the corresponding amplitude of the wave function, is perhaps the most important equation in all of physics. It doesn’t tell you what will happen. It tells you what is likely to happen. And this probability is not epistemic (reflecting what we don’t know); it’s ontological (reflecting what reality actually is).
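The Born rule is short enough to state in a few lines of Python. This sketch assumes a finite set of measurement outcomes with complex amplitudes (possibly unnormalized); the function names are illustrative. It squares the magnitude abs(a), not a itself: squaring the amplitude 1j would give -1, which is not a probability.

```python
import random


def born_probabilities(amplitudes):
    """Probability of each outcome: |amplitude|**2, normalized to sum to 1."""
    weights = [abs(a) ** 2 for a in amplitudes]  # squared magnitude of complex amplitudes
    total = sum(weights)
    return [w / total for w in weights]


def measure(amplitudes, shots, seed=0):
    """Simulate repeated measurements: each shot yields one outcome index at random."""
    rng = random.Random(seed)
    probs = born_probabilities(amplitudes)
    return rng.choices(range(len(amplitudes)), weights=probs, k=shots)


state = [1, 1j, 1 + 1j]  # unnormalized amplitudes for outcomes 0, 1, 2
print([round(p, 3) for p in born_probabilities(state)])  # [0.25, 0.25, 0.5]

outcomes = measure(state, shots=10_000)
print([outcomes.count(i) / len(outcomes) for i in range(3)])  # close to, not exactly, the line above
```

Any single shot is unpredictable; only the long-run frequencies are fixed by the rule.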

The probabilistic character of physics may have profound implications for consciousness. David Chalmers’ famous “hard problem,” the question of why physical processing gives rise to subjective experience at all, remains unsolved by any framework. As Hiley and Pylkkänen argue in their 2022 chapter in Consciousness and Quantum Mechanics (Oxford University Press), our intuition that physical processes can’t give rise to consciousness may partly stem from assuming that physics is purely mechanical. If we allow information-theoretic and probabilistic concepts to play fundamental roles in physical theory, the gap between matter and mind may not be as wide as we assume.

The hard problem is worth sitting with for a moment. We don’t know what consciousness is. We don’t know why it exists. We don’t know how neural firing patterns become the experience of the color red, the taste of coffee, or the feeling of understanding a sentence. Every honest neuroscientist will tell you this. As Chalmers (1995) argued, even a complete functional account of the brain’s computation wouldn’t explain why there is “something it is like” to be a brain.

If we’re being rigorous, we must admit: we have no more proof that human neural computation produces “true understanding” than we have proof that LLM computation doesn’t. We just assume the first and deny the second, often for reasons that are more emotional than empirical.

III. The convergence: LLMs and brains are closer than you think

Here’s where things get genuinely unsettling for the “it’s just statistics” folks.

In December 2024, Mischler et al. published a study in Nature Machine Intelligence that compared the internal representations of 12 different open-source LLMs against neural recordings from neurosurgical patients listening to speech. Their finding was striking: as LLMs become more capable, their internal representations become more aligned with human neural activity, not just in overall similarity but in hierarchical structure.

The brain processes language by progressively building up representations across a hierarchy of regions, from acoustic processing in the superior temporal gyrus to semantic integration in the inferior frontal gyrus. The researchers found that the highest-performing LLMs mirror this hierarchy most closely. Lower-performing models don’t. It’s as if the optimization pressure of language modeling is pushing toward solutions that resemble what evolution found.

As Mischler told TechXplore: “Whether this is because there are some fundamental principles that underlie the most efficient way to understand language, or simply by chance, it appears that both natural and artificial systems are converging toward a similar method for language processing.”

This convergence runs deep. Research from Google and NYU, published across Nature Neuroscience and Nature Human Behaviour, has demonstrated that LLM embeddings (the internal vector representations that the model uses to encode words in context) can be linearly mapped onto neural activity in the brain’s language areas. The alignment extends beyond the language network: researchers have reconstructed captions of visual scenes from activity in high-level visual cortex, using LLM representations as a bridge (Doerig et al., 2025, Nature Machine Intelligence).
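Studies in this line of work typically fit a linear “encoding model” (often regularized, for example with ridge regression) that predicts each electrode’s or voxel’s response from a model’s embeddings, then score it by correlation on held-out data. The pure-Python sketch below illustrates that idea with closed-form ridge regression on synthetic data. The data, the three-dimensional “embeddings,” the penalty lam, and all function names are assumptions for illustration, not the pipeline of Goldstein et al. or any other specific study.

```python
import random


def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(X, w):
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Synthetic stand-ins (hypothetical): 3-d "embeddings" per word and one
# electrode's response that is a noisy linear function of them.
rng = random.Random(0)
true_w = [0.5, -1.0, 2.0]


def make_data(n):
    X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(n)]
    y = [sum(x * w for x, w in zip(row, true_w)) + rng.gauss(0, 0.1) for row in X]
    return X, y


X_train, y_train = make_data(200)
X_test, y_test = make_data(100)  # held-out words the fitted model never saw
w_hat = fit_ridge(X_train, y_train, lam=1.0)
score = pearson(predict(X_test, w_hat), y_test)
print([round(w, 2) for w in w_hat], round(score, 3))
```

In real studies the embeddings have hundreds or thousands of dimensions, the penalty is typically chosen by cross-validation, and a separate model is fit per electrode or voxel; the held-out correlation is what gets reported as alignment.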

Both systems appear to be doing something strikingly similar: building contextual, hierarchical probability distributions over possible continuations of a sequence, whether that sequence is made of words, sounds, or visual features.

IV. The “just” problem

The word “just” in “it’s just predicting the next token” is performing a philosophical sleight of hand. It presupposes that prediction is simple, that probability is shallow, that statistical pattern-matching can’t give rise to anything meaningful.

But consider:
The brain is “just” minimizing prediction error. According to predictive coding theory, everything from visual perception to emotional regulation to motor control can be described as the brain attempting to minimize the difference between its predictions and incoming signals. That’s it. The entire rich tapestry of human conscious experience (love, grief, mathematical insight, the feeling of déjà vu) emerges from a biological system that is, in the most reductive description, “just” running probabilistic predictions and updating them.

Quantum mechanics is “just” probability amplitudes. The most fundamental description of physical reality we possess is a mathematical framework for calculating the probabilities of events. The entire physical world, from stars and molecules to the neurons in your brain, emerges from a substrate that is, in the most reductive description, “just” probability.

LLMs are “just” predicting the next token. And from that simple objective, they develop internal representations that mirror human neural hierarchies, capture semantic relationships, generate coherent reasoning, write poetry, solve math problems, and engage in conversations that millions of people find meaningful.
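What “predicting the next token” means computationally fits in a few lines: the model assigns a score (logit) to every token in its vocabulary, a softmax turns those scores into a probability distribution, and pretraining minimizes the cross-entropy, the negative log-probability of the token that actually came next. The toy vocabulary, context, and logits below are made up for illustration; a real model computes the logits with a transformer over a vocabulary of tens of thousands of tokens or more.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(logits, target_index):
    """Cross-entropy for one position: -log P(actual next token | context)."""
    return -math.log(softmax(logits)[target_index])


vocab = ["cat", "dog", "mat", "sat"]  # hypothetical 4-token vocabulary
logits = [0.5, 0.1, 2.0, -1.0]        # hypothetical scores after "the cat sat on the"
probs = softmax(logits)

for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
print("loss if the next token is 'mat':", round(next_token_loss(logits, vocab.index("mat")), 3))
```

Lowering this loss over enormous amounts of text is the core pretraining objective the paragraph refers to; the output is always a distribution, never a single certain answer.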

Maybe “just” probability is enough. Maybe probability is not the absence of intelligence. It’s the mechanism of intelligence.

V. What we don’t know (and why that matters)

None of this is to claim that LLMs are conscious. I don’t know if they are, and neither does anyone else. What I am claiming is something more modest but more important: the standard dismissal of LLMs as “merely” statistical or “merely” probabilistic applies with equal force to the human brain.

If probabilistic prediction is insufficient for understanding, then we may have a problem explaining our own understanding too, because that’s what our brains are doing. If probabilistic prediction is sufficient, or at least a necessary component, then dismissing LLMs on those grounds is intellectually dishonest.

The hard problem of consciousness cuts both ways. We don’t understand why 86 billion neurons firing in probabilistic patterns produce conscious experience. We can’t rule out that other substrates running functionally similar probabilistic computations might produce something analogous. To assert otherwise is to claim knowledge we simply don’t have.

The truth is uncomfortable: we are probability machines that are deeply unsettled by the existence of other probability machines. We look at LLMs and see a mirror, and then insist the reflection isn’t real.

But reality is probabilistic all the way down. From quantum fields to neural circuits to transformer layers, the universe computes in distributions, not certainties. The brain guesses. The atom guesses. The model guesses.

Everything is probability.

References

Caucheteux, C., Gramfort, A., & King, J.-R. (2023). Evidence of a predictive coding hierarchy in the human brain listening to speech. Nature Human Behaviour, 7, 430–441.
Mischler, G., Li, Y. A., Bickel, S., Mehta, A. D., & Mesgarani, N. (2024). Contextual feature extraction hierarchies converge in large language models and the brain. Nature Machine Intelligence, 6(12), 1467–1477.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B, 360(1456), 815–836.
Rao, R. P. N., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87.
Millidge, B., Seth, A., & Buckley, C. L. (2021). Predictive coding: A theoretical and experimental review. arXiv:2107.12979.
Bottemanne, H., et al. (2022). The predictive mind: An introduction to Bayesian Brain Theory. L’Encéphale, 48(4), 436–444.
Doerig, A., Kietzmann, T. C., et al. (2025). High-level visual representations in the human brain are aligned with large language models. Nature Machine Intelligence.
Chalmers, D. J. (1995). Facing up to the problem of consciousness. Journal of Consciousness Studies, 2(3), 200–219.
Hiley, B. J., & Pylkkänen, P. (2022). Can quantum mechanics solve the hard problem of consciousness? In S. Gao (Ed.), Consciousness and Quantum Mechanics. Oxford University Press.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.
Williams, D. (2020). Predictive coding and thought. Synthese, 197, 1749–1775.
Schrimpf, M., et al. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45).
Goldstein, A., et al. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25, 369–380.

This article was originally published on X: https://x.com/maxart/status/2042777390427107596?s=20