Sample ratio mismatch

Sample ratio mismatch (SRM) is the gap between the traffic split you configured and the split you actually got. Configure a 50/50 test, see 50.2 / 49.8 after 100,000 users - that’s noise. See 52 / 48 - that’s almost certainly a bug. Run a chi-squared goodness-of-fit test on the observed counts against the expected ones, and if the p-value comes back below a strict threshold (0.001 is a common choice) you have an SRM.
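For a two-arm test the check is small enough to write by hand. What follows is a minimal sketch, not any platform's implementation: the function name `srm_check` and its parameters are my own, and the 0.001 threshold is the rule of thumb from above rather than a standard. It relies on the fact that with two arms the chi-squared statistic has one degree of freedom, so the p-value can be computed with `math.erfc` from the standard library.

```python
import math


def srm_check(control, variant, expected_control_share=0.5, alpha=0.001):
    """Chi-squared goodness-of-fit test for a two-arm split.

    Returns (chi2, p_value, is_srm). Hypothetical helper for illustration;
    it only handles two arms (one degree of freedom).
    """
    total = control + variant
    expected_control = total * expected_control_share
    expected_variant = total - expected_control

    chi2 = (
        (control - expected_control) ** 2 / expected_control
        + (variant - expected_variant) ** 2 / expected_variant
    )
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha


# The two splits from the text, both at 100,000 users.
print(srm_check(50_200, 49_800))  # chi2 = 1.6, p around 0.21: noise
print(srm_check(52_000, 48_000))  # chi2 = 160, p far below 0.001: SRM
```

For more than two arms the statistic is the same sum over all arms, but the p-value needs a chi-squared distribution with (arms - 1) degrees of freedom, at which point a statistics library is the easier route.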

Among the threats to validity that can wreck a test, SRM is the one I check first. It matters more than people give it credit for because it invalidates everything downstream. If randomisation isn’t actually random, your control and variant groups aren’t comparable, and the lift you measured is contaminated by whatever selection happened during assignment. A 5% “win” with an SRM means nothing. You don’t know how much of it is the variant and how much is “the kind of user who happened to land in the variant arm”.

The usual suspects:

  • Caching. A CDN or page cache serves one variant more aggressively because of how the cache key is computed.
  • Bot traffic. Bots follow different paths to humans and end up disproportionately in one bucket.
  • Redirects. The variant uses a redirect that loses a chunk of users on slow connections before assignment completes.
  • Tracking drops. The variant fires an extra event that fails more often, so its sessions are under-counted rather than under-assigned.
  • Identity stitching. A user’s bucket changes between sessions because cookie identity broke and reassignment landed them differently.
  • Flag service issues. The flag service rate-limits or fails one variant more than the other.

Find the cause before you read the results. Once you’ve seen the lift number, the temptation to keep it if it goes your way is corrosive - this is the half of SRM hygiene that takes real discipline.
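One way to start looking for the cause is to repeat the SRM check within segments (browser, device, traffic source) and see where the imbalance concentrates. The sketch below assumes you can export assignments as `(segment, arm)` pairs with arms labelled `"control"` and `"variant"`; that shape, the labels, and the function names are all hypothetical. It deliberately touches only assignment counts, never the outcome metric.

```python
import math
from collections import Counter


def srm_p_value(control, variant, expected_control_share=0.5):
    """Two-arm chi-squared goodness-of-fit p-value (one degree of freedom)."""
    total = control + variant
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = (
        (control - expected_control) ** 2 / expected_control
        + (variant - expected_variant) ** 2 / expected_variant
    )
    return math.erfc(math.sqrt(chi2 / 2))


def srm_by_segment(assignments, expected_control_share=0.5):
    """Map each segment to (control_count, variant_count, p_value).

    `assignments` is an iterable of (segment, arm) pairs; this input
    format is an assumption made for the example.
    """
    counts = Counter(assignments)
    segments = sorted({segment for segment, _ in counts})
    report = {}
    for segment in segments:
        control = counts[(segment, "control")]
        variant = counts[(segment, "variant")]
        report[segment] = (
            control,
            variant,
            srm_p_value(control, variant, expected_control_share),
        )
    return report


# Made-up counts: the imbalance sits entirely in one browser.
rows = (
    [("chrome", "control")] * 5000 + [("chrome", "variant")] * 5010
    + [("safari", "control")] * 5000 + [("safari", "variant")] * 4400
)
for segment, (control, variant, p) in srm_by_segment(rows).items():
    print(segment, control, variant, f"p={p:.3g}")
```

Slicing by many segments means running many tests, so a single small p-value in one slice is a lead to investigate rather than a diagnosis. A segment that carries nearly all of the imbalance, as in the made-up data here, is a much stronger hint.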

Statsig, Eppo, and GrowthBook run SRM checks automatically and surface them in the test report. Optimizely Classic, VWO’s older interfaces, and most home-built platforms don’t. If yours doesn’t, run the chi-squared manually before declaring a winner. It takes two minutes.

SRM doesn’t tell you which variant the bias favoured, only that the assignment is broken. The “fix the bug and re-run” reflex is the right one - investigating which way the bias went and adjusting the result is a path to invented numbers.