Experiment velocity

The case for velocity: the value of a CRO programme is dominated by how many tests you run, not by how perfect each one is. A programme that ships 50 tests a year at 80% rigour will compound past a programme that ships 12 tests at 95% rigour within a year or two.

The maths is straightforward. If 1 in 5 tests is a real winner (a reasonable hit rate for a mature programme), and each winning test lifts revenue 2-3%, then 50 tests gives you 10 winners and compounded gains of roughly 22-34% over the year (1.02^10 ≈ 1.22; 1.03^10 ≈ 1.34). 12 tests gives you 2-3 winners and roughly 5-7% compounded gains. The 50-test programme wins by miles.
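The compounding arithmetic can be checked with a short sketch. It assumes every winner ships, that lifts multiply rather than add, and that the hit rate and per-win lift are the figures quoted above; the function name `compounded_gain` is illustrative only.

```python
def compounded_gain(tests_per_year, hit_rate, lift_per_win):
    """Annual compounded lift, assuming each winner multiplies revenue by (1 + lift)."""
    winners = tests_per_year * hit_rate  # expected number of real winners
    return (1 + lift_per_win) ** winners - 1


for tests in (50, 12):
    low = compounded_gain(tests, hit_rate=0.2, lift_per_win=0.02)
    high = compounded_gain(tests, hit_rate=0.2, lift_per_win=0.03)
    print(f"{tests} tests/year: {low:.1%} to {high:.1%} compounded")
```

The gap between the two programmes comes almost entirely from the number of winners, which is why the test count matters more than small improvements in per-test precision.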

Each test has fixed overhead: hypothesis, instrumentation, design, build, QA, run, analysis, write-up. Most of that doesn’t scale down for “smaller” tests. Teams new to experimentation often spend 4-6 weeks per test, which, with tests run one at a time, caps annual velocity at about 10 tests.

The mature-programme overhead is closer to a week per test. The compression comes from:

  • Hypothesis templates and pre-registration formats. Don’t rewrite the analysis plan structure every time.
  • Test tooling that’s actually fast. Building a variant should take hours, not days. Visual editors for simple changes, code-based for complex ones, both well-supported.
  • Centralised analysis with pre-defined dashboards. The analyst’s job is interpretation, not data wrangling.
  • Clear ship / no-ship rules pre-committed. No two-week post-test stakeholder negotiation.
  • Concurrent tests on non-overlapping surfaces. Run 4-5 tests in parallel instead of in series.

The velocity argument isn’t “be sloppy”. It’s “the marginal value of an extra week of rigour on each test is usually less than the marginal value of an extra test in the queue”. Within reasonable limits, faster wins.

The reasonable-limits part matters. Velocity that produces a steady stream of underpowered noise isn’t progress. The minimum viable rigour:

  • Sample size calculated, not guessed.
  • Pre-registration of primary metric and stopping rule.
  • Don’t peek and stop early.
  • Check guardrails before shipping.

Past those, additional rigour usually has lower ROI than another test in flight.
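As an illustration of “calculated, not guessed”, here is a minimal sketch of a per-arm sample size for a conversion-rate test. It assumes a two-sided two-proportion z-test with the usual normal approximation; the function name, the 5% baseline, and the 10% relative lift are hypothetical, and a real programme may use a different test or a sequential design.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Visitors needed per arm to detect a relative lift in a conversion rate."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical inputs: 5% baseline conversion, 10% relative lift to detect.
print(sample_size_per_arm(baseline_rate=0.05, relative_lift=0.10))
```

With inputs like these the answer is in the tens of thousands of visitors per arm, which is the number to compare against actual traffic before committing to a run length and stopping rule.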

Common ways a velocity push goes wrong:

  • Treating velocity as a goal in itself. The goal is learning and shipping wins. Velocity is a means.
  • Shipping every test that crosses alpha to keep the velocity numbers up. When tests are underpowered, as many as half of those can be false positives, and they revert when measured later.
  • Ignoring the holdout check on high-velocity programmes. The faster you ship, the more important it is to verify that the wins are real long-term.
  • Assuming velocity scales linearly with team size. Doubling the team rarely doubles the test count. The bottleneck is usually shared infrastructure (often technical debt), not headcount.
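The false-positive risk of shipping everything that crosses alpha can be estimated with a short sketch. It makes simplifying assumptions: `alpha` is treated as the chance that a variant with no real effect is declared a winner, `hit_rate` is the 1-in-5 figure used earlier, and the power values are illustrative. The function name is hypothetical.

```python
def false_positive_share(hit_rate, power, alpha=0.05):
    """Share of 'significant' results that are not real winners."""
    true_positives = hit_rate * power          # real winners that reach significance
    false_positives = (1 - hit_rate) * alpha   # null variants that reach significance
    return false_positives / (true_positives + false_positives)


# Well-powered tests (80% power) versus badly underpowered tests (20% power).
for power in (0.8, 0.2):
    share = false_positive_share(hit_rate=0.2, power=power)
    print(f"power={power:.0%}: {share:.0%} of significant results are false positives")
```

Under these assumptions the share is about one in five at 80% power and about one in two at 20% power, which is why sample size and the holdout check matter more, not less, as velocity rises.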