Technical debt

Technical debt sounds like an engineering concern. For CRO it’s the silent killer of programmes that should be running well. Bad code makes tests slow to build, breaks variant delivery, corrupts measurement, and pushes the engineering team into perpetual firefighting rather than supporting experimentation.

The debt that matters most for CRO isn’t the kind engineers usually complain about (legacy frameworks, outdated dependencies). It’s the kind that affects testability:

Tangled front-end code where changing one element requires touching three others. Each test variant takes days because the change ripples.
Inconsistent event firing where the same event means different things on different pages. Analysis becomes archaeology.
No abstraction between testing platform and code so swapping platforms means rewriting hundreds of flag-checking calls.
Slow build and deploy cycles where a one-line variant change waits a week for the next release.
Performance debt where the page is slow enough that every variant is fighting the slow baseline rather than competing with the alternative variant.
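The missing-abstraction problem is the easiest of these to picture in code. A minimal sketch, assuming a hypothetical `Experiments` wrapper and an `InMemoryProvider` that stands in for whatever vendor SDK is actually in use (none of these names come from a real platform): application code only ever calls the wrapper, so swapping platforms means writing one new provider rather than hunting down every flag check.

```python
from typing import Protocol


class FlagProvider(Protocol):
    """Anything that can answer 'which variant does this user get?'."""

    def get_variant(self, experiment: str, user_id: str) -> str: ...


class InMemoryProvider:
    """Hypothetical stand-in for a vendor SDK, used so the sketch runs."""

    def __init__(self, assignments: dict[tuple[str, str], str]) -> None:
        self._assignments = assignments

    def get_variant(self, experiment: str, user_id: str) -> str:
        # Users with no recorded assignment see the control experience.
        return self._assignments.get((experiment, user_id), "control")


class Experiments:
    """The only experimentation interface application code talks to."""

    def __init__(self, provider: FlagProvider) -> None:
        self._provider = provider

    def variant(self, experiment: str, user_id: str) -> str:
        try:
            return self._provider.get_variant(experiment, user_id)
        except Exception:
            # Fall back to control so a platform outage never breaks the page.
            return "control"


experiments = Experiments(
    InMemoryProvider({("checkout_cta", "user-1"): "green_button"})
)
print(experiments.variant("checkout_cta", "user-1"))
print(experiments.variant("checkout_cta", "user-2"))
```

Falling back to control on any provider error is a design choice, not a requirement. It keeps the page working, but those fallbacks should be logged so they don't quietly skew the traffic split.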

Why this matters for velocity

Each test has a fixed friction cost: how long it takes from “we have a hypothesis” to “the variant is in production and tracking is firing correctly”. On a clean codebase that cost is hours. On a debt-heavy one it’s days or weeks. Multiply across a year of testing and the velocity difference dominates everything else.

The compounding works both ways. A clean codebase makes tests cheap, which means more tests, which means more wins shipped, which means more code changes. Without discipline, the new code adds back the debt the cleanup removed.

How debt corrupts measurement

Worse than slow tests is wrong tests. Common ways debt produces unreliable results:

Variant-specific bugs. The variant code path has a tracking bug the control doesn’t, so the variant under-counts conversions. The “loss” is measurement, not effect.
Cross-test interactions. Two tests running on overlapping code paths corrupt each other’s metrics because the codebase doesn’t enforce mutual exclusivity.
Performance variance between variants. A variant that adds 200ms of JS loads and renders later than control, so part of any measured difference comes from the added latency rather than from the change being tested.
Inconsistent randomisation due to platform glitches. The same user gets different variants because the assignment code has race conditions.

These all produce results that look statistically clean but aren’t. The test “wins” or “loses” but the underlying signal is corrupted by infrastructure.
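Two of these failure modes, inconsistent randomisation and cross-test interactions, have a well-known structural remedy: derive assignment from a hash of the user ID instead of from mutable state. The sketch below is one common approach, not a description of any particular platform. The function names and the "layer" idea are hypothetical, chosen for illustration.

```python
import hashlib


def bucket(user_id: str, salt: str, buckets: int) -> int:
    """Map a user to a stable bucket in [0, buckets).

    The result depends only on the inputs, so there is no shared state
    to race on and the same user always lands in the same bucket.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign(user_id: str, experiment: str, variants: list[str]) -> str:
    """Pick a variant; salting by experiment name keeps different
    experiments' assignments independent of each other."""
    return variants[bucket(user_id, experiment, len(variants))]


def experiment_for(user_id: str, layer: str, experiments: list[str]) -> str:
    """Within a layer, each user is eligible for exactly one experiment,
    which enforces mutual exclusivity for tests on overlapping code."""
    return experiments[bucket(user_id, f"layer:{layer}", len(experiments))]


variants = ["control", "variant"]
first = assign("user-42", "checkout_cta", variants)
second = assign("user-42", "checkout_cta", variants)
print(first == second)  # True: repeat calls agree

print(experiment_for("user-42", "checkout", ["checkout_cta", "checkout_copy"]))
```

Because assignment is a pure function, it can be recomputed at analysis time and compared against what was logged, which turns randomisation bugs from invisible into detectable.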

What to actually do about it

The honest pattern from teams that get out of this:

Build the debt-paying into the test cycle. Every test ships a small cleanup of the surrounding code. Slow but compounds.
Treat experimentation infrastructure as a first-class engineering investment. Feature flags, event taxonomy, and analysis pipelines need maintenance and attention.
Make the cost of slow tests visible to leadership. “Our test cycle is 6 weeks because of A, B, C. Here’s the revenue impact of cutting it to 1 week.”
Audit measurement quality regularly. A/A tests, where both arms serve the identical experience, catch infrastructure-level bugs that nobody else will notice.
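A measurement audit can start very small. The sketch below assumes a 50/50 split between two identical arms and uses two standard checks: a chi-square test for sample ratio mismatch and a two-proportion z-test on conversion rate. The function names and the counts are made up for illustration.

```python
import math


def srm_p_value(n_a: int, n_b: int) -> float:
    """Chi-square test (1 degree of freedom) that traffic split 50/50.

    A very small p-value suggests assignment or tracking is dropping
    users unevenly between the two arms.
    """
    expected = (n_a + n_b) / 2
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    return math.erfc(math.sqrt(chi2 / 2))


def two_proportion_p_value(conv_a: int, n_a: int,
                           conv_b: int, n_b: int) -> float:
    """Two-sided pooled z-test for a difference in conversion rate."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variance at all, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Illustrative A/A counts: arm B is missing about 10% of its users.
print(f"split check p = {srm_p_value(10_000, 9_000):.2g}")
print(f"conversion check p = "
      f"{two_proportion_p_value(500, 10_000, 400, 9_000):.2g}")
```

In a healthy A/A test roughly 5% of runs will still show p < 0.05 on conversion by chance, so a single "significant" result is not alarming. What deserves investigation is a mismatched traffic split, or A/A tests that come out significant far more often than that.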

Where this gets ignored

The pattern that produces persistent CRO debt:

CRO and engineering are different teams that don’t share the consequences. Engineering optimises for ship speed of normal features. CRO suffers from the resulting debt but can’t fix it because the codebase isn’t theirs.
Leadership only counts test wins, not test-cycle time. The team that ships 10 painful tests a year looks more productive than the one that ships 5 while paying down infrastructure debt.
Refactoring is invisible. Cleaning up event tracking doesn’t show up in any user-facing metric, so it doesn’t get prioritised, even though it’s the bottleneck.

The fix is leadership-level acceptance that infrastructure investment compounds. Most programmes don’t do this and stay stuck.