Strong hypotheses
The statistical version of a hypothesis (see hypothesis formulation) is about H0 and H1 - the null you’re trying to reject and the alternative you’re claiming. That’s the framework for analysis. The CRO version is the framework for test design: what makes the proposed test worth running and the result interpretable.
A strong CRO hypothesis has three parts:
- The change - exactly what you’re varying in the variant.
- The mechanism - why you expect the change to affect the metric. The behavioural or UX claim behind it.
- The expected effect - what metric will move, in what direction, by roughly how much.
A weak hypothesis is missing one or more of these. “Change the CTA copy” is a change without a mechanism or effect. “Improve the checkout” isn’t a change at all - it’s a project, with no testable claim.
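One way to make the three-part check mechanical is to store each backlog entry as a small record and ask it which parts are missing. The sketch below is illustrative only: the `Hypothesis` class, its field names and the `missing_parts` helper are assumptions made for this example, not part of any CRO tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    """Hypothetical record for one backlog entry (field names are assumed)."""
    change: Optional[str] = None             # exactly what differs in the variant
    mechanism: Optional[str] = None          # why the change should move the metric
    metric: Optional[str] = None             # which metric is expected to move
    direction: Optional[str] = None          # "increase" or "decrease"
    relative_effect: Optional[float] = None  # rough size, e.g. 0.05 for 5%

    def missing_parts(self) -> List[str]:
        """Return which of the three parts are absent or incomplete."""
        missing = []
        if not self.change:
            missing.append("change")
        if not self.mechanism:
            missing.append("mechanism")
        # The expected effect needs a metric, a direction and a rough size.
        effect_complete = (
            bool(self.metric)
            and self.direction in ("increase", "decrease")
            and self.relative_effect is not None
        )
        if not effect_complete:
            missing.append("expected effect")
        return missing


weak = Hypothesis(change="Change the CTA copy")
print(weak.missing_parts())  # ['mechanism', 'expected effect']
```

An entry that returns an empty list has all three parts. Whether the parts are any good is still a judgement call.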
What a good hypothesis looks like
Replacing the generic CTA copy on the PDP with benefit-led copy will increase add-to-cart rate by 5%+, because cold visitors currently bounce when the value isn’t clear above the fold.
Each part is doing work:
- Change: specific (CTA copy from X to Y, on the PDP)
- Mechanism: specific (cold visitors don’t see the value above the fold)
- Effect: specific (add-to-cart, +5% relative)
The mechanism is what most teams skip. Without it, the test is a guess - if the result is positive you don’t know why, and if it’s negative you don’t know what to try next. The mechanism is what makes the test a learning opportunity rather than a coin flip.
Falsifiability
The hypothesis has to be able to fail. “The new design will perform better” can’t fail in any useful way - it names no metric and no size of effect, so no result confirms or rejects anything specific. The strong version can fail in specific ways, and each failure mode teaches you something:
- Add-to-cart doesn’t move → the value-above-the-fold mechanism wasn’t the bottleneck
- Add-to-cart moves but checkout doesn’t → the friction is somewhere else
- Add-to-cart drops → the new copy is worse than the old
A non-falsifiable hypothesis can’t fail and so can’t teach. Tests on non-falsifiable hypotheses always look like wins or “inconclusive”, which teaches you as little as never having run them.
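The failure modes listed above can be written down before the test starts, so the reading of each outcome is fixed in advance. This is a minimal sketch under stated assumptions: the `interpret` function is hypothetical, and the "up" / "flat" / "down" labels are assumed to come from whatever significance test you already use, which is not shown here.

```python
def interpret(add_to_cart: str, checkout: str) -> str:
    """Map an observed outcome to the reading agreed before the test.

    Each argument is "up", "flat" or "down" (assumed labels, decided
    upstream by the team's own statistical analysis).
    """
    valid = {"up", "flat", "down"}
    if add_to_cart not in valid or checkout not in valid:
        raise ValueError("outcomes must be 'up', 'flat' or 'down'")
    if add_to_cart == "down":
        return "the new copy is worse than the old"
    if add_to_cart == "flat":
        return "value above the fold was not the bottleneck"
    # From here on, add-to-cart went up.
    if checkout == "up":
        return "hypothesis supported"
    return "the friction is somewhere else"


print(interpret("up", "flat"))  # the friction is somewhere else
```

Writing the mapping first is the point: if you can’t fill in a branch, the hypothesis isn’t falsifiable yet.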
How this feeds into prioritisation
Strong hypotheses are the input to ICE / PIE / RICE prioritisation. A backlog of hypotheses that each have the three-part structure can be ranked meaningfully. A backlog of “let’s test this” entries can’t.
The prioritisation pass also acts as a quality filter. Weak hypotheses get sent back for sharpening before they ever reach the test queue. A team that runs prioritisation seriously rarely has bad tests, because the bad ones are caught upstream.
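As a rough illustration of ranking a backlog, here is a sketch of ICE scoring. It assumes the three scores (impact, confidence, ease) are each on a 1-10 scale and are multiplied together; some teams average them instead. The `BacklogEntry` class, the entry names and the scores are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class BacklogEntry:
    """Hypothetical backlog row; scores are team-assigned, 1-10."""
    name: str
    impact: int
    confidence: int
    ease: int

    def ice(self) -> int:
        # Assumption: ICE as a product. Averaging the three is also common.
        return self.impact * self.confidence * self.ease


# Illustrative entries and scores, not real data.
backlog = [
    BacklogEntry("Trust badges next to the buy button", impact=5, confidence=4, ease=6),
    BacklogEntry("Benefit-led CTA copy on PDP", impact=6, confidence=7, ease=9),
]

# Highest score first.
ranked = sorted(backlog, key=lambda entry: entry.ice(), reverse=True)
for entry in ranked:
    print(entry.ice(), entry.name)
```

The scores are only as good as the hypothesis behind them: without a mechanism there is little to base confidence on, and without an expected effect there is little to base impact on.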
Where hypotheses go wrong
- The change without the mechanism. “Move the CTA above the fold.” Tests fine, teaches nothing about why.
- The metric without the effect size. “Lift conversion” with no MDE (minimum detectable effect). Sample-size calculation has nothing to work with.
- Hypothesising after the data. HARKing (hypothesising after the results are known) - looking at the result and writing a hypothesis that matches it after the fact. Common in segment analysis.
- The unfalsifiable hypothesis. “Users will find this more intuitive.” How would you know? The hypothesis needs an observable consequence to be testable.
- Testing taste, not behaviour. “The team prefers this design.” Not a hypothesis. Test it on users if you want to know, but don’t claim it’s a CRO test.
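To see why a metric without an effect size stalls the sample-size calculation, here is a sketch of the standard normal-approximation formula for comparing two proportions. The function name, the 10% baseline in the usage line, and the defaults (two-sided alpha of 0.05, 80% power) are assumptions for illustration. Your testing tool may use a different method.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed per arm for a two-sided two-proportion test.

    baseline     - current conversion rate, e.g. 0.10
    relative_mde - smallest relative lift worth detecting, e.g. 0.05 for +5%
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must be in (0, 1) and the MDE must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Hypothetical baseline of 10% add-to-cart, +5% relative MDE.
print(sample_size_per_arm(0.10, 0.05))
```

The required sample grows roughly with the inverse square of the MDE, so halving the effect you want to detect roughly quadruples the traffic you need. With no MDE, there is no value to put in `relative_mde` and no way to say when the test is done.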