Ronit Chidara

February 17, 2026

The Spectrum Between Gut and Data

I've been thinking about how PMs use experiments. And I don't mean the textbook version: define a hypothesis, set your significance threshold, run the test, read the results, act accordingly. I mean how the decision really gets made when the results come back and you're sitting there with a number that either confirms or contradicts what you already believed.

Because here's what I've noticed, partly by watching others and partly by catching myself: the experiment result is rarely the thing that drives the decision. There's always something else in the room. A conviction about the user problem. A sense of where the market is going. A feeling that the test didn't capture the thing that actually matters. And PMs navigate this tension very differently.


The spectrum

On one end, you have PMs who won't ship anything without a stat sig result. They run clean experiments, respect methodology, and let the numbers decide. This sounds like the right approach, and sometimes it is. But taken too far, it becomes a crutch: a way to avoid making a judgement call by outsourcing it to a p-value. You can be so disciplined about evidence that you stop developing the instinct for when the evidence is incomplete, when the test is measuring the wrong thing, or when it's answering a question nobody needed answered.

On the other end, you have PMs who have strong product instincts and treat experiments as one input among many, sometimes not even the most important one. They'll run a test, look at the result, weigh it against everything else they know, and occasionally push through a decision the data didn't support. This sounds reckless, and sometimes it is. But sometimes the experiment genuinely didn't capture what mattered, and the PM's read of the situation was closer to reality than the metric was.

Most PMs live somewhere between these two, and where they land shifts depending on the stakes, how much they trust the test setup, and how strong their conviction is going in. There's no fixed correct position on this spectrum.


Where I sit (and why I'm questioning it)

I think I lean toward the gut end. I tend to have strong opinions about what a product/feature should do, and when an experiment comes back inconclusive or contradicts my instinct, my first reaction is usually to interrogate the experiment, not my instinct. Was the sample right? Was two weeks long enough? Was the metric we picked actually a good proxy for the thing I care about?

Sometimes that interrogation is legitimate. Experiments have real limitations. A test can be insignificant because the effect is small, or because the setup was noisy, or because you were measuring the wrong thing entirely. "Not significant" and "doesn't work" are different statements, and conflating them is a real mistake that data-heavy teams make all the time. But sometimes I'm just looking for reasons to override the data because I don't want to let go of the idea. That's the uncomfortable part. The line between "healthy scepticism of experimental limitations" and "confirmation bias with extra steps" is blurry, and I'm not always sure which side I'm on.
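To make that distinction concrete, here's a minimal sketch in Python (the numbers are invented for illustration, not from any real test): the variant genuinely lifts conversion from 5.0% to 5.5%, but with 2,000 users per arm a standard two-proportion z-test clears p < 0.05 in only a small minority of simulated runs.

    # Underpowered A/B test: a real lift that usually fails to reach significance.
    # All numbers are illustrative assumptions, not from any actual experiment.
    import math
    import random

    random.seed(42)

    def two_proportion_p_value(x_a, n_a, x_b, n_b):
        """Two-sided z-test (normal approximation) for a difference in conversion rates."""
        p_pool = (x_a + x_b) / (n_a + n_b)
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        if se == 0:
            return 1.0
        z = (x_b / n_b - x_a / n_a) / se
        return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

    N = 2_000        # users per arm
    TRIALS = 1_000   # simulated experiments
    detected = 0

    for _ in range(TRIALS):
        x_control = sum(random.random() < 0.050 for _ in range(N))  # control converts at 5.0%
        x_variant = sum(random.random() < 0.055 for _ in range(N))  # variant converts at 5.5% (a real lift)
        if two_proportion_p_value(x_control, N, x_variant, N) < 0.05:
            detected += 1

    print(f"Real 10% relative lift detected in {detected / TRIALS:.0%} of runs")

The lift is real in the simulation; the test simply isn't powered to see it at that traffic level. Which is exactly why "not significant" can't be read as "doesn't work."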


What I actually weigh

When a result contradicts my gut, there's no clean algorithm. It's more like a few things running in parallel:

How trustworthy was the experiment? If the setup was shaky (wrong audience, short duration, noisy metric), my gut gets more weight. If the experiment was well-designed and the signal is clear, I take the result more seriously even when I don't like it.

How strong is my conviction, and can I articulate why? If I have a clear theory about the user problem that the experiment didn't directly test, that's worth holding onto. If my conviction is just "I feel like this should work" without a coherent reason, that's a signal I should probably listen to the data.

What are the stakes? Low-stakes, easily reversible decisions: I'll go with my gut more freely. High-stakes, hard-to-undo decisions: I give the data more weight, or I at least make sure I'm not fooling myself before overriding it.

None of this is a framework. It's just how I've noticed myself making decisions, written down. And writing it down is partly an exercise in figuring out whether this process is actually reasonable or whether I'm rationalising something less disciplined.


The thing I keep coming back to

Experiments exist to reduce decision risk. I believe that. The purpose of running a test is to be a bit less wrong about what to do next. But "reduce risk" and "decide for you" are different jobs. A p-value tells you how surprising the result would be if nothing were really going on. It says nothing about whether the effect is big enough to matter, whether it was the right thing to measure, or whether your broader read of the situation is wrong.
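The flip side is just as easy to show with the same back-of-the-envelope z-test (again, invented numbers): on a high-traffic product, a 1% relative lift on a 5% conversion rate comes out highly significant, and the p-value alone still can't tell you whether a 0.05-point lift is worth shipping or building on.

    # Statistically significant, practically tiny: invented numbers for illustration.
    import math

    n = 4_000_000                    # users per arm on a high-traffic product
    p_a, p_b = 0.0500, 0.0505        # control 5.00%, variant 5.05% (a 1% relative lift)

    p_pool = (p_a + p_b) / 2
    se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

    print(f"z = {z:.2f}, p = {p_value:.4f}")  # roughly p ~ 0.001: very significant, still a 0.05pp lift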

Most PM discourse frames this as settled: be data-driven, trust your experiments, let the numbers guide you. And that's decent advice for someone who's never run an experiment. But for PMs who've been doing this for a while, the harder question is how much weight the experiment should carry against everything else you know. And when you override the data, are you being wisely sceptical, or are you just being stubborn?

I don't have a clean answer. I lean toward trusting gut more than most PMs probably should, and I'm aware that this is the kind of thing that looks like wisdom when it works and like arrogance when it doesn't. The honest version is that I'm still calibrating, both within product management and outside it.

What I do know is that the spectrum exists, and that where you sit on it matters. Thinking about it every once in a while might be worth doing.

About Ronit Chidara

I dig into things that bug me: government data that doesn't add up, policy worth questioning, why people do what they do, how businesses actually work, etc. No theme, no schedule. Just whatever I stumble upon (and can't let go of).