Variance reduction

In a normal test, the variance in your metric (how much it bounces around naturally) limits how small an effect you can detect at a given sample size. Variance reduction is the family of methods that strips out some of that noise without throwing away data, giving you more power per visitor. Required sample size scales with variance, so if you run the same test with half the variance you’ve effectively doubled your traffic.
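
The link between variance and traffic can be sketched with the standard two-sample sample-size approximation. This is a rough illustration, not a full power calculator: the function name `sessions_per_arm`, the 5% significance level, the 80% power target, and the example numbers are all assumptions made for the sketch.

```python
from statistics import NormalDist


def sessions_per_arm(sigma, mde, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-sided, two-sample z-test.

    sigma: standard deviation of the metric (assumed equal in both arms)
    mde:   smallest absolute difference in means you want to detect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / mde ** 2


# Hypothetical metric with variance 1.0, then the same metric with variance 0.5.
baseline = sessions_per_arm(sigma=1.0, mde=0.05)
half_variance = sessions_per_arm(sigma=0.5 ** 0.5, mde=0.05)

print(round(baseline), round(half_variance))
print(half_variance / baseline)  # ratio of required sample sizes
```

Because the variance enters the formula as a plain multiplier, halving it halves the sessions needed for the same detectable effect.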

CUPED (Controlled-experiment Using Pre-Experiment Data) is the big one. The intuition: a chunk of why a user converts during your test has nothing to do with the variant. They were already a high-converting type of user. If you can measure their pre-test behaviour, you can subtract out the baseline difference and only attribute what’s left to the variant.

Concretely, you regress your outcome metric on a pre-test covariate (often the same metric measured before the test started), compute the residual, and use that as your “adjusted outcome”. The stronger the correlation between the covariate and the outcome, the more variance you remove. Microsoft reported around 50% variance reduction on their experimentation platform, which roughly doubles your effective traffic.
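
A minimal sketch of that adjustment on simulated data follows. It uses the common centred form, `y - theta * (x - mean(x))`, which is the regression residual shifted by a constant so the overall mean of the metric is preserved. The function names `cuped_adjust` and `lift`, the simulated user data, and the true lift of 0.5 are all assumptions for illustration.

```python
import random
from statistics import fmean, variance


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes: y - theta * (x - mean(x)).

    theta is the slope from regressing y on x, estimated on both arms pooled.
    """
    x_bar, y_bar = fmean(x), fmean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Simulated users: pre-test spend predicts in-test spend; the variant adds 0.5.
rng = random.Random(0)
n = 2000
pre = [rng.gauss(10, 3) for _ in range(n)]
variant = [rng.random() < 0.5 for _ in range(n)]
post = [0.8 * p + rng.gauss(0, 2) + (0.5 if v else 0.0) for p, v in zip(pre, variant)]

adjusted = cuped_adjust(post, pre)


def lift(values):
    """Difference in means between variant and control."""
    treat = [val for val, v in zip(values, variant) if v]
    ctrl = [val for val, v in zip(values, variant) if not v]
    return fmean(treat) - fmean(ctrl)


variance_reduction = 1 - variance(adjusted) / variance(post)
print("raw lift:", lift(post), "adjusted lift:", lift(adjusted))
print("variance reduction:", variance_reduction)
```

Both lifts estimate the same quantity; the adjusted one is computed on a metric with much less spread, so its confidence interval is narrower. In this simulation the reduction is large because the pre-test covariate was built to be strongly predictive, which real data may not match.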

CUPED needs pre-test data on the same users, which is fine for logged-in product analytics but harder for anonymous web traffic. If your visitors are mostly one-shot anonymous sessions (typical for Shopify) you can’t easily apply CUPED to most metrics.

Stratification is the other main technique. Instead of randomising all traffic into one big pool, you stratify (split into bands by some characteristic such as device, source, country, or customer tier) and randomise within each stratum. This guarantees the variants are balanced on whatever you stratified by, removing one source of variance from the comparison.

Stratification is less commonly used in CRO platforms but is baked into most academic and pharma trial design. The win is biggest when the stratification variable is strongly predictive of the outcome. Device type and traffic source usually qualify.
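
Randomising within strata can be sketched as below. This is a simplified batch version that assumes the full list of users is known up front; live web traffic arrives one visitor at a time, so a real platform would need a sequential scheme instead. The function name `stratified_assign` and the example users are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_assign(users, stratum_of, seed=0):
    """Shuffle users within each stratum, then alternate A/B assignment.

    Arm sizes within any stratum differ by at most one user.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for user in users:
        by_stratum[stratum_of(user)].append(user)

    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)
        for i, user in enumerate(members):
            assignment[user] = "A" if i % 2 == 0 else "B"
    return assignment


# Hypothetical users as (id, device) pairs, stratified by device.
users = [(i, "mobile" if i % 3 else "desktop") for i in range(100)]
assignment = stratified_assign(users, stratum_of=lambda user: user[1])

# Count how many users of each device landed in each arm.
balance = Counter((device, assignment[(uid, device)]) for uid, device in users)
print(sorted(balance.items()))
```

With simple randomisation the device mix of each arm drifts by chance; here it cannot drift by more than one user per stratum.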

Most Shopify and small SaaS programmes are underpowered for the effect sizes they’re testing. Variance reduction is one of the few ways to claw back power without needing more traffic. If you’re running on a platform that supports CUPED (Statsig, Eppo, some custom internal stacks) it’s basically free statistical power. Turn it on.

Common mistakes:

  • Treating variance reduction as cheating. It isn’t. The maths is sound: you’re not changing the estimand, you’re just measuring it more precisely.
  • Expecting it to save underpowered tests. CUPED helps but it won’t magically turn a 5,000-session test into a usable result if you needed 50,000.
  • Using post-test covariates. CUPED uses pre-test data only. Anything measured during or after the test can itself be affected by the variant, so adjusting on it removes part of the real effect and biases the estimate.
  • Skipping it because the platform doesn’t support it. If you’re at any scale, the lift in test velocity is worth the engineering effort to add support.
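
The post-test covariate mistake from the list above can be illustrated with a small simulation. In this sketch, which rests entirely on made-up data, the variant raises in-test page views and page views drive revenue, so the true revenue lift is 2.0. The names `adjusted_lift`, `pre_views`, `test_views` and `revenue` are hypothetical.

```python
import random
from statistics import fmean, variance


def adjusted_lift(y, x, treated):
    """Difference in means of y after a CUPED-style adjustment on covariate x."""
    x_bar, y_bar = fmean(x), fmean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    adj = [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]
    treat = [a for a, t in zip(adj, treated) if t]
    ctrl = [a for a, t in zip(adj, treated) if not t]
    return fmean(treat) - fmean(ctrl)


rng = random.Random(1)
n = 20000
treated = [i % 2 == 0 for i in range(n)]

# Pre-test page views: measured before assignment, so unaffected by the variant.
pre_views = [rng.gauss(5, 2) for _ in range(n)]
# In-test page views: the variant adds one extra view on average.
test_views = [p + (1.0 if t else 0.0) + rng.gauss(0, 1) for p, t in zip(pre_views, treated)]
# Revenue is driven by in-test views, so the true lift is 2.0 * 1.0 = 2.0.
revenue = [2.0 * v + rng.gauss(0, 3) for v in test_views]

lift_pre = adjusted_lift(revenue, pre_views, treated)    # valid covariate
lift_post = adjusted_lift(revenue, test_views, treated)  # covariate moved by the variant

print("adjusted on pre-test views:", lift_pre)
print("adjusted on in-test views:", lift_post)
```

Adjusting on the pre-test covariate recovers a lift close to 2.0. Adjusting on the in-test covariate subtracts out the very change the variant caused, so the estimate collapses towards zero.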