Ratio metrics

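Most CRO metrics are ratios. Conversion rate = conversions / sessions. CTR = clicks / impressions. AOV = revenue / orders. When you randomise by user, the numerator and the denominator are both random: each user contributes a different number of sessions, impressions, or orders, and the two totals are correlated. The standard variance formula for a simple mean assumes a fixed number of independent observations, so it doesn’t apply.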

What goes wrong if you ignore this: your confidence intervals are too tight, your p-values look more impressive than they should, and your false-positive rate is higher than the nominal alpha. The effect is usually small but it compounds with other issues (peeking, multiple testing) into “we’re getting more ‘significant’ tests than we should”.

The delta method

The delta method is a technique for approximating the variance of a function of random variables. For a ratio X/Y, it gives you a variance estimate that accounts for the fact that both X and Y vary, and that they may be correlated. It’s a first-order Taylor approximation around the means of X and Y, which means it works well when the ratio is reasonably stable and falls apart when the mean of the denominator is close to zero relative to its spread.
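Written out, the first-order approximation looks roughly like this. The notation is introduced here for illustration and is not from any particular platform: X_i and Y_i are the per-user numerator and denominator totals for n independent users, s_X^2 and s_Y^2 are their sample variances, and s_XY is their sample covariance.

```latex
\hat{R} = \frac{\bar{X}}{\bar{Y}}, \qquad
\widehat{\operatorname{Var}}(\hat{R}) \approx
\frac{1}{n\,\bar{Y}^{2}}
\left( s_X^{2} \;-\; 2\,\hat{R}\, s_{XY} \;+\; \hat{R}^{2}\, s_Y^{2} \right)
```

The covariance term is the piece the naive formula has no place for, and the division by the squared denominator mean is why the approximation degrades as that mean approaches zero.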

In practice: most modern testing platforms (Statsig, Eppo, the internal A/B stacks at Meta, Microsoft, Netflix) use the delta method or a bootstrap variant for ratio metrics. Older platforms often don’t, treating every session (or impression, or order) as if it were an independent observation even when several of them come from the same user. You get a “significant” result faster but it’s significant under a wrong model.

When this matters most

The error gets worse when:

The unit of randomisation differs from the unit of analysis. You randomise users but measure sessions. One user has multiple sessions and those sessions aren’t independent.
The metric has a heavy tail. AOV with rare big orders, revenue per user with whales.
The denominator is small. Per-product CTR when products have few impressions.

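For a Shopify store running a session-randomised test on session conversion rate, the naive variance is usually fine because the session is both the unit of randomisation and the unit of analysis. For a user-randomised test on session-level metrics, sessions from the same user are correlated, so you need the delta method (or a bootstrap that resamples users) or you’re systematically overstating significance.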
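To make the user-randomised case concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the function names ratio_se_delta and ratio_se_naive are hypothetical, and the simulated traffic (Poisson session counts, Beta-distributed per-user conversion propensity) is made up rather than taken from any real store. It compares the naive binomial standard error, which treats every session as independent, with a delta-method standard error computed from per-user totals.

```python
import numpy as np

def ratio_se_delta(num, den):
    """Delta-method SE of sum(num) / sum(den), with one row per independent user."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    n = len(num)
    ratio = num.sum() / den.sum()
    var_num = num.var(ddof=1)
    var_den = den.var(ddof=1)
    cov = np.cov(num, den, ddof=1)[0, 1]
    # First-order Taylor approximation of Var(mean(num) / mean(den)).
    var_ratio = (var_num - 2 * ratio * cov + ratio**2 * var_den) / (n * den.mean() ** 2)
    return ratio, np.sqrt(var_ratio)

def ratio_se_naive(num, den):
    """Binomial SE that (wrongly, here) treats every session as an independent trial."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    total = den.sum()
    p = num.sum() / total
    return p, np.sqrt(p * (1 - p) / total)

# Simulated data (assumption): users differ in how likely they are to convert,
# so sessions from the same user are correlated.
rng = np.random.default_rng(0)
n_users = 20_000
sessions = 1 + rng.poisson(2.0, n_users)        # at least one session per user
user_rate = rng.beta(0.5, 9.5, n_users)         # per-user propensity, mean 0.05
conversions = rng.binomial(sessions, user_rate)

ratio_delta, se_delta = ratio_se_delta(conversions, sessions)
ratio_naive, se_naive = ratio_se_naive(conversions, sessions)

print(f"conversion rate: {ratio_delta:.4f}")
print(f"naive SE:        {se_naive:.5f}")
print(f"delta-method SE: {se_delta:.5f}")
print(f"SE ratio:        {se_delta / se_naive:.2f}")
```

Both functions return the same point estimate; only the standard error differs. When every user has exactly one session, the two standard errors nearly coincide, which is the session = unit case described above.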