Building an experimentation programme
A programme is what you build when you want experimentation to outlast the people who set it up. A few one-off A/B tests aren’t a programme. A pattern of recurring tests, shared learnings, and infrastructure that supports both is.
The components, roughly in order of importance:
Infrastructure
Without the right tooling, every test costs an order of magnitude more than it should. The minimum:
- A testing platform that handles randomisation, assignment consistency, and basic statistical analysis correctly. Many “platforms” don’t.
- Analytics that can join experiment data with downstream metrics (LTV, retention, support tickets).
- A way to surface running tests to the rest of the org so nobody accidentally launches conflicting tests on the same surface.
- Documented pre-registration and analysis templates.
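As a rough illustration of what "assignment consistency" means in the first item above, here is a minimal sketch of deterministic, hash-based assignment. The function name `assign_variant`, its parameters, and the example IDs are hypothetical and not taken from any particular platform. Real platforms add targeting, traffic allocation, and exposure logging on top of this.

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically map a user to a variant for one experiment.

    Hypothetical helper for illustration only: the same (user, experiment)
    pair always yields the same variant, with no stored state.
    """
    # Hash the experiment and user together so that a user's arm in one
    # experiment is not tied to their arm in another.
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a bucket.
    bucket = int.from_bytes(digest[:8], "big")
    return variants[bucket % len(variants)]


# The same user always sees the same arm of a given test.
first = assign_variant("user-123", "checkout-copy-test")
second = assign_variant("user-123", "checkout-copy-test")
```

Because the result depends only on the two IDs, a returning user sees the same variant on every visit, and the split can be recomputed later during analysis to check it against what was logged.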
Process
The recurring rhythms that produce tests:
- Hypothesis backlog. A live document of testable ideas, ranked by expected value.
- Weekly or fortnightly test launches. Predictable cadence beats sporadic bursts.
- Standing analysis review. A regular meeting where finished tests get interpreted and ship / no-ship decisions get made together.
- Win and loss documentation. Both shipped winners and failed tests written up so the learning compounds across the team.
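One way to make "ranked by expected value" in the backlog item above concrete is sketched below. The scoring rule (estimated win probability times estimated value, minus the cost of running the test) is an assumption for illustration, as are the `Hypothesis` class, the `rank_backlog` function, and every number in the example. Teams will have their own estimates and may prefer a different formula.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One backlog entry. All fields are rough team estimates."""
    title: str
    p_win: float         # estimated probability the variant wins (0 to 1)
    value_if_win: float  # estimated value if it wins, in any consistent unit
    cost: float          # estimated cost of running the test, same unit

    @property
    def expected_value(self) -> float:
        # Assumed scoring rule: probability-weighted value minus test cost.
        return self.p_win * self.value_if_win - self.cost


def rank_backlog(backlog):
    """Return hypotheses sorted from highest to lowest expected value."""
    return sorted(backlog, key=lambda h: h.expected_value, reverse=True)


# Made-up numbers, purely illustrative.
backlog = [
    Hypothesis("Change button colour", 0.10, 5_000, 1_000),
    Hypothesis("Show delivery date on product page", 0.30, 60_000, 4_000),
    Hypothesis("Shorten checkout form", 0.25, 40_000, 3_000),
]

ranked = rank_backlog(backlog)
for h in ranked:
    print(f"{h.title}: {h.expected_value:.0f}")
```

The estimates will be imprecise. The point of writing them down is that the ranking becomes explicit and can be challenged in the standing review, rather than being decided by whoever argues loudest.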
Culture
The hardest part. The cultural shifts:
- Failure is data, not embarrassment. Most tests fail. A programme where failed tests get hidden or rationalised will run fewer tests and learn less.
- Stakeholders accept that the test decides. No post-hoc “I think we should ship it anyway”. If that conversation happens, the programme isn’t a programme yet.
- The HiPPO (highest-paid person’s opinion) problem is checked. Senior stakeholders don’t override results because they have a hunch.
- Hypotheses come from everywhere: customer support, the paid media team, product, and design, not just CRO specialists. The best hypotheses often come from people who see customer behaviour daily.
Common stages of programme maturity
A loose progression:
1. One-offs. Occasional A/B tests when someone has a strong opinion to settle. No real programme.
2. Habitual. Tests run regularly but each is bespoke. Inconsistent rigour, slow analysis.
3. Templated. Standard pre-registration, defined metric set, repeatable analysis. Velocity climbs.
4. Compounding. Multiple parallel tests, holdouts for long-run measurement, shared learning across teams. The programme is a source of ongoing competitive advantage.
Most programmes plateau at stage 2 or 3. Getting to stage 4 requires sustained investment in infrastructure and culture, often from leadership.
Things people get wrong
- Starting with infrastructure and never building the cultural muscle. The fanciest platform doesn’t help if nobody trusts the results.
- Starting with culture and never investing in infrastructure. The team is willing but every test takes a month.
- Buying a platform and assuming the programme follows. Platforms enable, they don’t generate.
- Measuring programme health by test count alone. Quality of hypotheses matters too. Fifty button-colour tests teach less than ten well-chosen ones.