Hypothesis formulation

A test without a hypothesis is just a guess with extra steps. The hypothesis is what you’re claiming and how you’d know if you’re wrong - everything else (sample size, primary metric, duration) flows out of it.

The classical setup has two hypotheses:

Null (H0) - the boring one. There’s no difference between control and treatment. The change you’re testing did nothing.
Alternative (H1) - the interesting one. There is a difference. The change did something.

You don’t “prove” H1. You collect enough evidence to reject H0, which is a different thing. This distinction matters because it’s why a “non-significant” test isn’t a negative result but an inconclusive one. Failing to reject H0 doesn’t mean H0 is true; it means you haven’t seen enough evidence to dismiss it. Easy to forget when stakeholders want a clean yes/no.
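To make the “inconclusive, not negative” point concrete, here is a minimal sketch of a two-sided two-proportion z-test using only the Python standard library. The function names, the 0.05 threshold and the visitor/conversion counts are illustrative assumptions, not figures from a real test. The same observed rates (10% vs 11.5%) are run at two sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment):
    """Two-sided z-test for a difference in conversion rate (pooled standard error)."""
    p_control = conv_control / n_control
    p_treatment = conv_treatment / n_treatment
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / se
    # Two-sided p-value: probability of a |z| at least this large if H0 were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def verdict(p_value, alpha=0.05):
    # "Fail to reject" is deliberately not worded as "H0 is true".
    return "reject H0" if p_value < alpha else "fail to reject H0 (inconclusive)"


# Hypothetical counts: identical observed rates, 10x more traffic in the second run.
z_small, p_small = two_proportion_z_test(100, 1_000, 115, 1_000)
z_large, p_large = two_proportion_z_test(1_000, 10_000, 1_150, 10_000)

print(f"n=1,000 per arm:  z={z_small:.2f}, p={p_small:.3f} -> {verdict(p_small)}")
print(f"n=10,000 per arm: z={z_large:.2f}, p={p_large:.4f} -> {verdict(p_large)}")
```

The smaller run fails to reject H0 and the larger one rejects it, even though the observed lift is identical. The first result was never evidence that the change did nothing.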

Directional vs non-directional

A non-directional (two-tailed) hypothesis just says “different” - the new design will change the conversion rate, either up or down. A directional (one-tailed) hypothesis specifies which way - the new design will increase conversion. Two-tailed is the safe default and what most tools assume. One-tailed gives you a bit more statistical power for the same sample size, at the cost of being blind to the opposite direction. If your one-tailed bet was wrong and the variant actually tanked conversion, the test won’t flag it as significant; it just looks like another non-result.

In practice almost everyone should run two-tailed. Even if you think you know which way the change will go, “it tanked” is a result you want to detect.
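A rough sketch of the trade-off, assuming you already have a z statistic from a test and that the one-tailed hypothesis is “treatment increases conversion”. The helper names and the z values are made up for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def p_two_tailed(z):
    """P-value for H1: treatment differs from control, in either direction."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def p_one_tailed_increase(z):
    """P-value for H1: treatment is higher than control."""
    return 1 - STANDARD_NORMAL.cdf(z)


# A modest positive effect: the one-tailed test clears 0.05, the two-tailed doesn't.
modest_win = 1.8
print(p_one_tailed_increase(modest_win), p_two_tailed(modest_win))

# A large negative effect: the two-tailed test flags it, the one-tailed
# "increase" test reports a p-value close to 1 and stays silent.
big_loss = -3.0
print(p_one_tailed_increase(big_loss), p_two_tailed(big_loss))
```

The first case is the extra power you buy with a one-tailed test. The second is what you pay for it.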

What a good hypothesis actually looks like

This is the statistical framing; the CRO-flavoured version (testability, mechanism, three-part structure) is covered in strong hypotheses. In short, a real hypothesis isn’t “let’s change the button colour”. It has three parts:

The change you’re making
The mechanism you think it’ll trigger (the why - usually a behavioural or UX claim)
The metric you expect to move, and roughly by how much

So: “Replacing the generic CTA copy with benefit-led copy on the PDP will increase add-to-cart rate by 5%+, because users currently bounce when the value isn’t clear above the fold.”

If your hypothesis can’t be falsified by the test, it’s not really a hypothesis, it’s a project. “Improve the checkout” isn’t testable. “Removing the discount code field on step 1 of checkout will increase completion rate by 3%+, because the field draws people off-site to hunt for codes” is.
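One way to enforce the three-part structure is to write the hypothesis down as a record and refuse to launch until every field is filled. This is a sketch only: the Hypothesis class, its field names and the choice to treat “3%+” as a relative lift are assumptions for illustration, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    change: str               # what you're changing
    mechanism: str            # why you think it will work
    metric: str               # the metric you expect to move
    min_relative_lift: float  # e.g. 0.03 for "3%+" (assumed to be relative)

    def missing_parts(self):
        """Return the names of the parts that are still empty."""
        missing = []
        if not self.change.strip():
            missing.append("change")
        if not self.mechanism.strip():
            missing.append("mechanism")
        if not self.metric.strip():
            missing.append("metric")
        if self.min_relative_lift <= 0:
            missing.append("expected effect size")
        return missing

    def statement(self):
        return (
            f"{self.change} will increase {self.metric} "
            f"by {self.min_relative_lift:.0%}+, because {self.mechanism}."
        )


checkout = Hypothesis(
    change="Removing the discount code field on step 1 of checkout",
    mechanism="the field draws people off-site to hunt for codes",
    metric="completion rate",
    min_relative_lift=0.03,
)
vague = Hypothesis(change="", mechanism="", metric="checkout", min_relative_lift=0.0)

print(checkout.statement())
print(vague.missing_parts())
```

“Improve the checkout” fails the check because it names neither a change, a mechanism nor an effect size, so there is nothing for the test to falsify.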

Things people get wrong

Conflating hypothesis with goal. “Make the site convert better” is a goal. The hypothesis is the specific claim about what change moves the metric and why.
Treating a non-significant result as proof the change didn’t work. It might just mean you didn’t have the sample size to detect the effect.
Hypothesising after results are in (HARKing). Look at the data, find the segment where it “worked”, write a hypothesis around it post-hoc. This is how you get fake wins that don’t replicate.
Being vague about the expected effect size. “Lift conversion” gives you no way to do sample size maths. You need a number, even if it’s a guess.
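The last point is easiest to see in the sample size maths itself. Below is a minimal sketch using the standard normal-approximation formula for comparing two proportions (two-tailed, equal-sized arms). The function name, the 10% baseline, the 0.05 significance level and the 80% power are illustrative assumptions, and the lift is treated as relative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm to detect the given relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return ceil(n)


# Hypothetical 10% baseline conversion rate.
for lift in (0.05, 0.10, 0.20):
    print(f"{lift:.0%} relative lift -> {sample_size_per_arm(0.10, lift):,} per arm")
```

Without a value for relative_lift the function has nothing to compute, which is the whole problem with “lift conversion”. Roughly halving the expected effect quadruples the traffic you need, so even a guessed number changes whether the test is feasible.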