Sequential testing

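In a classical frequentist test you commit to a sample size up front, run to that sample, then check the result once. Peek before the planned sample is hit, stop as soon as the result looks significant, and your effective alpha ends up well above the 0.05 you nominally set, because every extra look is another chance for noise to cross the threshold.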
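The inflation is easy to see in a simulation. The sketch below runs A/A tests (both arms share the same true conversion rate, so every "significant" result is a false positive) and compares a single look at the end against ten evenly spaced looks with stopping at the first significant one. The function name, the 10% conversion rate, the sample sizes, and the pooled two-proportion z-test are all illustrative assumptions, not something taken from a particular platform.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_sims, n_per_arm, looks, alpha=0.05, rate=0.10, seed=0):
    """Simulate A/A tests; return the share that look 'significant' at any look.

    Hypothetical helper for illustration: both arms convert at `rate`,
    so any rejection of the null is a false positive.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Visitor counts (per arm) at which we peek, evenly spaced.
    checkpoints = {n_per_arm * (i + 1) // looks for i in range(looks)}
    hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        for n in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n in checkpoints:
                pooled = (conv_a + conv_b) / (2 * n)
                se = (2 * pooled * (1 - pooled) / n) ** 0.5
                if se > 0 and abs(conv_a - conv_b) / n / se > z_crit:
                    hits += 1  # stop at the first "significant" look
                    break
    return hits / n_sims


once = false_positive_rate(n_sims=1000, n_per_arm=1000, looks=1)
peeking = false_positive_rate(n_sims=1000, n_per_arm=1000, looks=10)
print(f"one look: {once:.3f}, ten looks: {peeking:.3f}")
```

With one look the rate should land near the nominal 0.05; with ten looks it should come out several times higher, even though nothing about the test itself has changed.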

Sequential testing is a family of methods that mathematically account for peeking. You can check results as data comes in and stop early when you have enough evidence, without inflating the false-positive rate. The trade-off is more complex maths and stopping rules that are stricter than they first appear.

  • SPRT (Sequential Probability Ratio Test) - Wald’s original from the 1940s. Compute a running likelihood ratio, stop when it crosses one of two thresholds. Mostly historical interest, rarely used directly in CRO platforms.
  • Group sequential testing - check at predefined intervals (every week, every 5,000 visitors) with an adjusted significance threshold at each look, so the looks together still spend only the overall alpha. Standard in pharma trials.
  • Always-valid inference - methods that produce p-values and confidence intervals that remain valid no matter how often you peek. Statsig’s sequential testing is in this family.
  • Bayesian sequential - Bayesian methods don’t have the same theoretical peeking penalty as frequentist ones, though early stopping still biases the effect size estimate upward.
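To make the "running likelihood ratio and two thresholds" idea from the SPRT entry concrete, here is a minimal sketch for a single stream of convert/no-convert outcomes. It tests a simple null conversion rate `p0` against a simple alternative `p1`, using Wald's approximate thresholds log((1 - β)/α) and log(β/(1 - α)). The function name and the one-arm setup are assumptions made for illustration; a real A/B comparison needs a two-sample formulation, and this is not how any particular platform implements it.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.20):
    """Wald's SPRT for H0: p = p0 versus H1: p = p1 on 0/1 outcomes.

    Hypothetical helper: returns (decision, number of observations used).
    """
    upper = math.log((1 - beta) / alpha)  # crossing this favours H1
    lower = math.log(beta / (1 - alpha))  # crossing this favours H0
    llr = 0.0  # running log-likelihood ratio
    n = 0
    for converted in observations:
        n += 1
        if converted:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n  # neither threshold crossed yet


print(sprt_bernoulli([1, 1, 1, 1], p0=0.1, p1=0.3))
print(sprt_bernoulli([0] * 20, p0=0.1, p1=0.3))
```

The test stops as soon as the evidence is strong enough in either direction, which is why the number of observations used is part of the result rather than fixed in advance.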

Why most CRO programmes don’t use it formally

The honest reason: most tools that claim to handle peeking are doing Bayesian inference with weakly informative priors and labelling it “sequential”. Which is fine, but it’s not the same as a formal sequential frequentist procedure.

For most teams running tests on Shopify or similar platforms, the practical advice is simpler. Pick your sample size, run to it, look once. If you want to peek legitimately, use a tool that does proper sequential or Bayesian analysis. Don’t just check daily and stop when you like the result.
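For the "pick your sample size" step, a rough per-arm figure can come from the standard normal-approximation formula for comparing two proportions. The sketch below is one way to compute it. The function name, the 5% baseline, and the 20% relative lift are illustrative assumptions, and a dedicated calculator may give slightly different numbers depending on the variance formula it uses.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm for a two-sided two-proportion test.

    Hypothetical helper using the unpooled normal approximation:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


n = sample_size_per_arm(baseline=0.05, relative_lift=0.20)
print(f"visitors per arm: {n}")
```

The required sample grows quickly as the lift you want to detect shrinks, so it is worth running the calculation before launch rather than discovering mid-test that the planned sample is out of reach.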