Confidence intervals

A test gives you back a point estimate (say, a 4% lift) plus a range of values the data is consistent with (say, 1.2% to 6.8%). That range is the confidence interval. The point estimate is your single best guess; the CI is the full range of values the data hasn’t ruled out, with the point estimate sitting inside it.
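To make the two outputs concrete, here is a minimal sketch that assumes the metric is a conversion rate and uses the normal-approximation (Wald) interval for the absolute difference between two arms. The function name `lift_ci` and the counts are hypothetical, and this measures absolute lift in percentage points rather than relative lift.

```python
from statistics import NormalDist

def lift_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Point estimate and Wald CI for the absolute difference in
    conversion rates (B minus A), using the normal approximation."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a  # the point estimate
    # Standard error of the difference of two independent proportions
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    # Two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, (diff - z * se, diff + z * se)

# Hypothetical data: 10,000 users per arm
diff, (lo, hi) = lift_ci(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
print(f"lift = {diff:.2%}, 95% CI = [{lo:.2%}, {hi:.2%}]")
```

The interval is symmetric around the point estimate here because the Wald interval is just estimate ± z × standard error; other interval methods need not be symmetric.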

Width is what tells you how much the data has actually pinned down. A 95% CI of [-0.5%, 8.5%] includes positive lift, but it also includes negative lift, so the test can’t tell you whether the change helped or hurt. Narrow intervals = data is informative. Wide intervals = noisy metric, small sample, or both.
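A quick way to see the sample-size effect is to compute how wide a Wald interval would be at different sample sizes. This sketch assumes, hypothetically, that both arms share a 10% conversion rate; `ci_width` is an illustrative helper, not a library function.

```python
from statistics import NormalDist

def ci_width(p, n_per_arm, confidence=0.95):
    """Width of a Wald CI for the difference of two conversion rates,
    assuming both arms have true rate p and n_per_arm users each."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Variance of the difference is p(1-p)/n for each arm, summed
    se = (2 * p * (1 - p) / n_per_arm) ** 0.5
    return 2 * z * se

for n in (1_000, 4_000, 16_000, 64_000):
    print(f"n per arm = {n:>6,}: width = {ci_width(0.10, n):.2%}")
```

Because the standard error scales with 1/√n, quadrupling the sample only halves the width. A noisier metric (larger variance per user) widens the interval at every sample size.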

The 95% part is the confidence level, and it’s tied to your alpha: confidence level = 1 − α, so a 95% CI corresponds to α = 0.05. A lower alpha (0.01, i.e. a 99% CI) gives you wider intervals because you’re demanding more certainty before excluding a value.
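For a normal-approximation interval the width is proportional to the two-sided critical value z, so you can see the alpha trade-off directly. This is a sketch assuming a z-based interval; `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Two-sided critical value: a (1 - alpha) CI is estimate ± z * SE."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    z = z_critical(alpha)
    print(f"alpha = {alpha:.2f}: {1 - alpha:.0%} CI, z = {z:.3f}, "
          f"width vs 95% CI = {z / z_critical(0.05):.2f}x")
```

With the same data and standard error, a 99% interval (z ≈ 2.576) comes out about 1.31 times as wide as a 95% interval (z ≈ 1.960).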

What “95% confidence” actually means

This is the bit everyone gets wrong. A 95% confidence interval does not mean “there’s a 95% probability the true effect lies in this range”. That’s the natural reading, and it’s wrong.

What it actually means: if you repeated this whole experiment many times, 95% of the intervals you’d construct would contain the true effect. The 95% is a property of the procedure, not of any specific interval. Your specific interval either contains the true effect or it doesn’t; we just don’t know which.
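The “repeat the experiment many times” reading can be checked by simulation. This sketch assumes hypothetical true conversion rates of 10% and 11%, 1,000 users per arm, and a Wald interval; `simulate_coverage` and all parameter values are illustrative.

```python
import random
from statistics import NormalDist

def simulate_coverage(true_p_a=0.10, true_p_b=0.11, n=1_000,
                      runs=1_000, confidence=0.95, seed=0):
    """Re-run a hypothetical A/B test many times and return the fraction
    of Wald CIs that contain the true difference."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    true_diff = true_p_b - true_p_a
    hits = 0
    for _ in range(runs):
        # Simulate n users per arm, each converting independently
        conv_a = sum(rng.random() < true_p_a for _ in range(n))
        conv_b = sum(rng.random() < true_p_b for _ in range(n))
        p_a, p_b = conv_a / n, conv_b / n
        diff = p_b - p_a
        se = (p_a * (1 - p_a) / n + p_b * (1 - p_b) / n) ** 0.5
        # Each run's interval either contains the true effect or it doesn't
        if diff - z * se <= true_diff <= diff + z * se:
            hits += 1
    return hits / runs

coverage = simulate_coverage()
print(f"fraction of intervals containing the true effect: {coverage:.1%}")
```

The printed fraction should land close to 95% but not exactly on it, partly from simulation noise and partly because the Wald interval for proportions is only approximate.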

If you want a “95% probability the effect is in this range” answer, that’s a Bayesian credible interval, not a frequentist confidence interval. They look almost identical with large samples and uninformative priors, but they’re answering different questions.
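To see the “almost identical with large samples” point, this sketch computes both intervals for the same hypothetical data. It assumes independent uniform Beta(1, 1) priors on each arm’s conversion rate and approximates the credible interval by sampling the Beta posteriors; `wald_ci`, `credible_interval`, and the counts are illustrative, not a standard API.

```python
import random
from statistics import NormalDist

def wald_ci(conv_a, n_a, conv_b, n_b, level=0.95):
    """Frequentist Wald CI for p_b - p_a (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p_b - p_a - z * se, p_b - p_a + z * se

def credible_interval(conv_a, n_a, conv_b, n_b, level=0.95,
                      draws=20_000, seed=0):
    """Equal-tailed credible interval for p_b - p_a, assuming
    independent uniform Beta(1, 1) priors on each conversion rate."""
    rng = random.Random(seed)
    # Each arm's posterior is Beta(1 + conversions, 1 + non-conversions);
    # sample both and take the difference.
    diffs = sorted(
        rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        - rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        for _ in range(draws)
    )
    tail = (1 - level) / 2
    lo = diffs[round(tail * (draws - 1))]
    hi = diffs[round((1 - tail) * (draws - 1))]
    return lo, hi

data = dict(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
print("95% confidence interval:", wald_ci(**data))
print("95% credible interval:  ", credible_interval(**data))
```

With 10,000 users per arm the two ranges nearly coincide, yet only the credible interval supports the statement “there’s a 95% probability the difference is in this range”, and only under the assumed prior.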