Minimum detectable effect

The minimum detectable effect (MDE) is the smallest lift you commit to being able to detect before the test starts. Set it before sample size calculation. Smaller MDE = bigger sample required = longer test. Bigger MDE = smaller sample = faster test, but smaller real effects will go undetected.

MDE is the business decision dressed up as a statistical input. Alpha and power are conventions you can mostly default. MDE is where you actually have to say “this lift would be worth shipping; smaller than this isn’t”.

How it factors in

In a sample size calculation, MDE is one of four inputs (baseline rate, MDE, alpha, power). Lower MDE → bigger sample. The relationship is roughly inverse-squared: halve the MDE and you roughly quadruple the required sample.

In a typical Shopify store doing 30k sessions a month with a 2% conversion baseline:

20% relative MDE → ~5,000 sessions per variant → tractable in 1-2 weeks
10% relative MDE → ~20,000 sessions per variant → 4-6 weeks
5% relative MDE → ~80,000 sessions per variant → not happening

This is why most small-store CRO is about big swings. The maths only supports detection at large effect sizes.

How to set MDE

The right question is: what’s the smallest lift that’s worth shipping?

That’s a business question. Translate the lift into revenue, weigh against the engineering cost of building, weigh against the opportunity cost of not running a different test. The number that comes out is your MDE.

The wrong question is: what lift do I expect from this change? That’s a forecast, not a threshold. Forecasts feel more concrete but they bias you toward setting MDE wherever you hope the effect will land, which produces “significant” results clustered just above the threshold because that’s where the maths is most favourable.

A practical heuristic for Shopify-scale programmes:

High-stakes, durable changes (offer changes, checkout redesigns): MDE = 5-10% relative
Page-level optimisation (PDP elements, landing page copy): MDE = 10-15% relative
Microcopy and button tests: MDE = 20%+ relative, or accept that the test is directional, not statistical

MDE is a threshold, not a prediction

The decision rule that follows from MDE: ship only if the observed effect is statistically significant AND above MDE. Most platforms only check the first half. A significant result below MDE is usually noise dressed as a win - it crossed alpha by chance because the effect was small enough that the sample size you ran could produce it as variance.

This is the “real but trivial effect” failure mode. You shipped a 0.3% lift, watched it not replicate in production, and concluded “the test was wrong”. The test wasn’t wrong, the MDE check was missing.

Where it goes wrong

Setting MDE based on what you hope to find. “I think this’ll lift 5%” → MDE 5% → tests come back significant or non-significant in noise. The MDE should reflect business value, not optimism.
Setting it too low because the team wants “all tests to count”. Produces multi-month tests that block the queue and rarely move anything.
Treating MDE as fixed across surfaces. A 10% MDE on the PDP and a 10% MDE on the order confirmation page are different bets - different traffic, different variance, different downstream impact.
Ignoring it post-test. A significant result below MDE shouldn’t ship. Most CRO programmes ship anyway because nobody checked.
Relative vs absolute confusion. “10%” is ambiguous: is that 10% of the 2% baseline (so 2.2%) or 10 percentage points (so 12%)? Relative is the default but make it explicit before anyone misinterprets.