Potato Codex

April 1, 2026

How AI Actually Learns — Without the Confusion


The most frequent question I get from engineers unfamiliar with machine learning is this: "How can a system learn if a human is the one giving it instructions?" The question is good because it reveals an implicit assumption—that somebody has to explicitly tell the system: "If pattern A appears, do X. If pattern B appears, do Y." That's how we usually write programs.


But machine learning doesn't work that way. And that's what makes it so powerful.


Let me start with a simple analogy. Imagine a small child who's never seen a cat before. I bring them to a house with five cats. They observe the movement, the shape, the sounds. The next day, they visit a neighbor with three different cats. Then they go to a park and see more—different colors, sizes, different fur lengths.


After seeing hundreds of cats with all this variation, the child develops a mental model. It's abstract—they can't describe it in detail—but they know a cat when they see one. Even for cats they've never encountered before.


Machine learning works the same way. But with formal structure.


We give the system DATA. Lots of it. Thousands, tens of thousands, millions of examples. Say, thousands of labeled photos: "This is a cat" or "This is not a cat." The photos vary—black cats, white cats, tabby cats, different breeds, different angles, different lighting.


The system doesn't know what a cat is. It sees pixels—numbers representing brightness and color at each position. But the system has one capability: it can find patterns. Patterns that, when combined a certain way, allow it to predict the correct label.


So how does it find these patterns? That's where neural networks come in.


Think of a neural network as a drastically simplified, mathematical version of how neurons in the brain work. A network has layers—an input layer, hidden layers, and an output layer. Each "neuron" in one layer connects to neurons in the next layer.
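The layer structure can be sketched in a few lines of plain Python. This is a toy illustration only—the layer sizes (4 inputs, 3 hidden neurons, 1 output) are arbitrary choices for the example, not from any real model:

```python
import random

random.seed(0)

def make_layer(n_inputs, n_neurons):
    # One row of weights per neuron; every weight starts out random,
    # exactly as described below for an untrained network.
    return [[random.uniform(-1, 1) for _ in range(n_inputs)]
            for _ in range(n_neurons)]

hidden = make_layer(4, 3)   # 4 input values -> 3 hidden neurons
output = make_layer(3, 1)   # 3 hidden neurons -> 1 output neuron

# Every neuron in one layer connects to every neuron in the next,
# so the connection count is 4*3 + 3*1 = 15.
n_connections = sum(len(row) for row in hidden + output)
print(n_connections)  # 15
```

Each row of numbers is one neuron's incoming connections—the weights the next sections talk about.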


Each connection has a number called a weight. This weight determines how strongly information flows from one neuron to the next. A large weight means a strong, influential connection. A small or negative weight means a weak or inhibitory connection.


When the system first starts, all weights are random. The system hasn't learned anything yet.


Now we give it the first photo. The system processes it through all the layers—the random weights combine—and at the output layer, it makes a prediction. Maybe: "I'm 40 percent confident this is a cat, 60 percent confident it's not."
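Here is a minimal sketch of that forward pass, assuming a sigmoid activation so the output lands between 0 and 1 and can be read as a confidence. All the pixel values and weights below are made up for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(layers, inputs):
    # Pass the inputs through each layer in turn: every neuron takes a
    # weighted sum of the previous layer's activations, then squashes it.
    activations = inputs
    for layer in layers:
        activations = [sigmoid(sum(w * a for w, a in zip(weights, activations)))
                       for weights in layer]
    return activations

# A toy "image" of 4 pixel values and hand-picked random-looking weights.
hidden = [[0.2, -0.5, 0.1, 0.8],
          [-0.3, 0.4, 0.9, -0.1],
          [0.7, 0.0, -0.6, 0.2]]
output = [[0.5, -0.4, 0.3]]

pixels = [0.9, 0.1, 0.4, 0.7]
p_cat = forward([hidden, output], pixels)[0]
print(f"{p_cat:.0%} confident this is a cat")
```

With untrained weights the confidence is essentially arbitrary—somewhere in the middle—which is exactly the situation the training process has to fix.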


But we know the true label from our data: "This is a cat." So the system's prediction is wrong. There's an error.


And here's where learning happens. The system sees this error and adjusts all its weights—slightly—to make the error smaller next time.


This adjustment process is called backpropagation, short for backward propagation of errors. The algorithm works like this: the system calculates the error at the output. Then it propagates that error backward through the hidden layers. As it propagates, it calculates how much each weight contributed to the error. Weights that contributed significantly get adjusted more. Weights that barely contributed barely change.
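The "blame assignment" step can be shown numerically on a tiny 1-2-1 network (one input, two hidden neurons, one output)—a toy sketch with hand-picked weights, where one hidden-to-output connection is deliberately much stronger than the other:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 1.0, 1.0   # one input, true label "cat" = 1
w1 = w2 = 0.5          # input -> hidden weights (equal on purpose)
v1, v2 = 2.0, 0.1      # hidden -> output: v1 is far more influential

# Forward pass.
h1, h2 = sigmoid(w1 * x), sigmoid(w2 * x)
y = sigmoid(v1 * h1 + v2 * h2)

# Step 1: error signal at the output (derivative of squared error).
delta_out = 2 * (y - target) * y * (1 - y)

# Step 2: propagate it backward; each hidden neuron's share of the
# blame is scaled by the weight that carried its signal forward.
delta_h1 = delta_out * v1 * h1 * (1 - h1)
delta_h2 = delta_out * v2 * h2 * (1 - h2)

grad_w1, grad_w2 = delta_h1 * x, delta_h2 * x

# w1 fed the output through the big weight v1, so it gets blamed
# (and therefore adjusted) far more than w2:
assert abs(grad_w1) > abs(grad_w2)
```

The two hidden neurons are identical here—only the strength of their outgoing connections differs—so the unequal gradients come entirely from the backward propagation of the error.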


The system uses a technique called gradient descent to find the direction of adjustment: "Change the weight in the direction that makes the error smaller."
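Gradient descent is easiest to see on a single weight. This is a deliberately minimal sketch—one neuron, one input, squared error, toy numbers—but the update rule is the real thing: step against the gradient, watch the error shrink.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 1.0, 1.0   # one input, true label "cat" = 1
w = -0.5               # a random-ish starting weight
lr = 0.5               # learning rate: the size of each nudge

errors = []
for step in range(100):
    pred = sigmoid(w * x)
    errors.append((pred - target) ** 2)
    # d(error)/dw by the chain rule through the sigmoid:
    grad = 2 * (pred - target) * pred * (1 - pred) * x
    w -= lr * grad     # move the weight against the gradient

# Each step made the error a little smaller.
assert errors[-1] < errors[0]
```

The minus sign in `w -= lr * grad` is the whole idea: the gradient points uphill on the error surface, so the system steps the other way.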


This happens for EVERY example in the training data. Thousands or millions of times. Each time, the weight changes slightly. Each small change makes the system slightly better at predicting the patterns in the data.
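Scaled down to a toy dataset, the loop looks like this. The four "images" are made-up two-pixel values, and a bias weight is added so the single neuron can learn a threshold—both are illustrative assumptions, not part of the original description:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [([0.9, 0.8], 1), ([0.8, 0.9], 1),   # "cat" examples
        ([0.1, 0.2], 0), ([0.2, 0.1], 0)]   # "not cat" examples

w = [0.0, 0.0, 0.0]   # two pixel weights plus one bias weight
lr = 1.0
epoch_errors = []
for epoch in range(200):                 # many passes over the data...
    total = 0.0
    for pixels, label in data:           # ...visiting EVERY example
        x = pixels + [1.0]               # constant input for the bias
        pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total += (pred - label) ** 2
        scale = 2 * (pred - label) * pred * (1 - pred)
        w = [wi - lr * scale * xi for wi, xi in zip(w, x)]
    epoch_errors.append(total)

# Each pass over the data shaved a little more off the total error.
assert epoch_errors[-1] < epoch_errors[0]
```

Nothing clever happens in any single step; the improvement comes entirely from repeating the small adjustment over every example, many times.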


This is the essence of machine learning—not conscious understanding, but iterative optimization. Trial, error, adjust, repeat.


Now comes the natural follow-up question: if the system is just trial and error repeated many times, doesn't that mean it's not really LEARNING—just getting lucky?


The nuanced answer is: both are true, but we need to redefine what "learning" means.


The system doesn't learn in the sense of developing conscious understanding or building a model it can articulate. It can't say: "A cat has these characteristics." It just has a network of weights that successfully capture the correlation between pixels and labels in the training data.


But—and this is important—the process isn't just luck. As long as the training data is representative and high quality, the weights that emerge will generalize to new examples the system has never seen. That's not a theoretical promise; it's what we observe in production systems every day.


Now for the numbers. A modern language model can have billions of weights. GPT-3, for instance, has 175 billion parameters. Adjusting 175 billion parameters across a massive training set requires enormous compute resources—thousands of GPUs or TPUs, running for weeks.


But that's why modern frameworks exist—to optimize the process. They use techniques like batching, gradient accumulation, and mixed-precision training to make learning faster and more efficient.
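Gradient accumulation, one of the techniques named above, is simple to illustrate. Real frameworks do this with tensors across GPUs; the scalar sketch below (a toy linear model with made-up data) just shows the core idea—summing gradients over micro-batches gives the same update as one big batch, with less memory needed at once:

```python
def grad(w, x, y):
    # Gradient of the squared error (w*x - y)**2 with respect to w.
    return 2 * (w * x - y) * x

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
w, lr = 0.0, 0.01

# One big-batch step: average the gradient over all four examples.
big_step = w - lr * sum(grad(w, x, y) for x, y in data) / len(data)

# Same step via two micro-batches of two: accumulate, then apply once.
acc = 0.0
for micro in (data[:2], data[2:]):
    acc += sum(grad(w, x, y) for x, y in micro)
w_accum = w - lr * acc / len(data)

# Both routes land on the same updated weight.
assert abs(big_step - w_accum) < 1e-12
```

Batching and mixed precision follow the same spirit: restructure the arithmetic so the hardware stays busy, without changing what is being optimized.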


And this is why data quality is absolutely critical. A model trained on bad data will learn bad patterns. Garbage in, garbage out isn't just an old computer-science slogan; here it is a fundamental truth.


Here's one important insight worth reflecting on. A neural network that successfully learns doesn't "understand" what it's learned. It doesn't build an inspectable mental model. It just has a very large network of numbers.


This has practical implications. When a model makes a wrong prediction, sometimes we can't say precisely "Why?" We can only observe: "The model failed in this case. Let me add more similar data." The improvement process becomes more empirical than analytical.


But it also has positive implications. The system isn't limited by human intuition or our prior assumptions. It can find patterns humans don't see. It can leverage high-dimensional structure in ways we can't imagine.


That's the fundamental trade-off in machine learning: loss of interpretability, but gain in the capability to capture complex patterns.


----------------
This episode is available on Spotify in Bahasa Indonesia. For courses, ebooks, source code, or other ways to connect, visit → linktr.ee/potatocodex


About Potato Codex

I'm Vicky, a solutions manager. Robotics, AI & EV builder. Researcher and entrepreneur 🇮🇩