Novelty effect and regression to the mean

Two separate phenomena that both mess with early test reads. Different mechanisms, same practical consequence: stop a test too early and the lift estimate won’t hold up.

Novelty effect

When users encounter something new on a site they know well, they engage with it differently. They click the unfamiliar element just to see what it does. They notice the new layout because their eye isn’t trained yet. This bumps engagement metrics on the new variant for the first few days, then fades as users habituate.

Where it shows up:

A new homepage banner gets unusually high CTR for the first week, then settles
A redesigned PDP shows higher add-to-cart early, then drifts back toward control
Repeat-visitor segments show strong lifts that don’t appear among first-time visitors
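
One way to look for these patterns is to break the lift out by week, or by repeat versus first-time visitors. Below is a minimal sketch. It assumes your testing platform lets you export visitor and conversion counts per segment. The row format, the `lift_by_segment` name and the numbers in the example are hypothetical and for illustration only.

```python
from collections import defaultdict

def lift_by_segment(rows):
    """Relative lift of 'treatment' over 'control' for each segment.

    rows: iterable of (segment, variant, visitors, conversions) tuples,
    where variant is "control" or "treatment" (hypothetical export format).
    Segments can be weeks ("week 1") or visitor types ("repeat", "new").
    """
    # totals[segment][variant] = [visitors, conversions]
    totals = defaultdict(lambda: {"control": [0, 0], "treatment": [0, 0]})
    for segment, variant, visitors, conversions in rows:
        totals[segment][variant][0] += visitors
        totals[segment][variant][1] += conversions

    lifts = {}
    for segment, arms in totals.items():
        c_visitors, c_conv = arms["control"]
        t_visitors, t_conv = arms["treatment"]
        control_rate = c_conv / c_visitors
        treatment_rate = t_conv / t_visitors
        lifts[segment] = treatment_rate / control_rate - 1
    return lifts

# Illustrative numbers only, not real data
rows = [
    ("week 1", "control", 10_000, 500), ("week 1", "treatment", 10_000, 600),
    ("week 2", "control", 10_000, 500), ("week 2", "treatment", 10_000, 530),
    ("week 3", "control", 10_000, 500), ("week 3", "treatment", 10_000, 515),
]
for segment, lift in lift_by_segment(rows).items():
    print(f"{segment}: {lift:+.1%}")
```

A lift that shrinks from week to week, or that shows up mostly in repeat visitors, fits the novelty pattern. Small segments are noisy, so read this as a reason to keep the test running, not as proof.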

The fix is patience. Run tests for at least one or two full purchase cycles so the novelty has time to fade. For DTC that’s usually two to four weeks minimum, and longer if your repeat purchase cycle is slow.
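
The rule can be turned into a simple duration check: take the days needed to reach the planned sample size, apply a floor of N purchase cycles, and round up. This is a sketch under stated assumptions. The `min_test_days` name and the default 14-day purchase cycle are hypothetical. `daily_visitors` means total eligible traffic across all arms. Rounding up to whole weeks is an added assumption, meant to cover day-of-week swings in behaviour.

```python
import math

def min_test_days(required_per_arm, daily_visitors, arms=2,
                  purchase_cycle_days=14, cycles=2):
    """Days to run a test (hypothetical helper).

    Takes the days needed to reach the planned sample size, floors it at
    `cycles` full purchase cycles, then rounds up to whole weeks.
    """
    days_for_sample = math.ceil(required_per_arm * arms / daily_visitors)
    days = max(days_for_sample, purchase_cycle_days * cycles)
    return math.ceil(days / 7) * 7  # whole weeks only

print(min_test_days(required_per_arm=20_000, daily_visitors=3_000))  # → 28
```

In this example, the sample size alone would be reached in 14 days, but the two-cycle floor extends the test to 28.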

Regression to the mean

A statistical phenomenon, not a behavioural one. Extreme observations are more likely to be followed by less extreme ones, because the extreme reading included a chunk of randomness that won’t repeat, and a larger sample dilutes it. If you check your test at day 3 and see the variant showing a 15% lift, the day-10 reading will very likely be smaller. Not because the effect “wore off”, but because the day-3 number was inflated by noise.
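
A small simulation shows this effect. The sketch makes three assumptions. Every simulated test has the same modest true lift (2% relative, a hypothetical value). Daily conversions are drawn from a normal approximation to the binomial. Only the tests that looked like 15%+ winners on day 3 are kept. The traffic and baseline rate are also hypothetical.

```python
import random
import statistics

def simulate_rtm(n_tests=2000, base_rate=0.05, true_lift=0.02,
                 daily_visitors_per_arm=500, early_day=3, late_day=10,
                 threshold=0.15, seed=1):
    """Keep tests whose day-3 lift cleared `threshold`; compare their
    average cumulative lift at day 3 with the same tests at day 10."""
    rng = random.Random(seed)
    n = daily_visitors_per_arm

    def daily_conversions(rate):
        # Normal approximation to Binomial(n, rate), clipped at zero
        sd = (n * rate * (1 - rate)) ** 0.5
        return max(0.0, rng.gauss(n * rate, sd))

    early, late = [], []
    for _ in range(n_tests):
        ctrl = [daily_conversions(base_rate) for _ in range(late_day)]
        trt = [daily_conversions(base_rate * (1 + true_lift))
               for _ in range(late_day)]
        # Equal traffic per arm, so the conversion ratio equals the rate ratio
        lift_early = sum(trt[:early_day]) / sum(ctrl[:early_day]) - 1
        if lift_early >= threshold:  # the "variant is crushing it" cases
            early.append(lift_early)
            late.append(sum(trt) / sum(ctrl) - 1)
    return len(early), statistics.mean(early), statistics.mean(late)

selected, early_mean, late_mean = simulate_rtm()
print(f"{selected} tests looked like 15%+ winners on day 3")
print(f"mean lift among them: day 3 = {early_mean:.1%}, day 10 = {late_mean:.1%}")
```

Among the tests that cleared the bar on day 3, the average day-10 lift comes out well below the average day-3 lift. Nothing about user behaviour changed in the model. The day-3 filter simply selected the tests with lucky noise.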

This is why early peeking is dangerous. The cases where you’d be tempted to stop early (“the variant is crushing control!”) are precisely the cases where the early reading is most inflated by noise. Sequential testing methods exist to allow valid peeking, but most testing platforms aren’t running true sequential tests, and a standard fixed-horizon p-value is only valid if you look once, at the planned sample size.
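
Here is a sketch of what peeking does to a fixed-horizon test. It simulates A/A tests, where there is no true difference, and runs a standard two-proportion z-test every day. It then compares that with checking only at the end. The traffic, baseline rate and 14-day horizon are hypothetical. Conversions use a normal approximation to the binomial.

```python
import random
from statistics import NormalDist

def z_test_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def peeking_false_positive_rate(n_tests=1000, days=14, daily_n=500,
                                rate=0.05, alpha=0.05, seed=7):
    """Share of A/A tests declared significant at any daily peek
    versus at the final look only."""
    rng = random.Random(seed)
    sd = (daily_n * rate * (1 - rate)) ** 0.5
    fp_peeking = fp_final = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0.0
        n = 0
        any_peek_significant = False
        p = 1.0
        for _ in range(days):
            n += daily_n
            # Both arms share the same true rate: any "win" is a false positive
            conv_a += max(0.0, rng.gauss(daily_n * rate, sd))
            conv_b += max(0.0, rng.gauss(daily_n * rate, sd))
            p = z_test_p(conv_a, n, conv_b, n)
            if p < alpha:
                any_peek_significant = True
        if any_peek_significant:
            fp_peeking += 1
        if p < alpha:  # p from the final day only
            fp_final += 1
    return fp_peeking / n_tests, fp_final / n_tests

peek_rate, final_rate = peeking_false_positive_rate()
print(f"A/A tests 'significant' at any daily peek: {peek_rate:.1%}")
print(f"A/A tests significant at the final look only: {final_rate:.1%}")
```

The "any daily peek" rate lands well above the nominal 5%, while the single final look stays close to it. Sequential methods are designed to close that gap.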

Why they get confused

Both make early effects look bigger than they really are. The mechanisms differ (novelty is behavioural, regression is statistical), but the prescription is identical: don’t stop early; run to your planned sample size.
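
A planned sample size has to be fixed before launch. Below is a minimal sketch of the standard fixed-horizon calculation for a two-sided, two-proportion test, using the unpooled normal approximation. The 5% baseline and 10% relative minimum detectable effect in the example are hypothetical inputs. Replace them with your own.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(base_rate, relative_mde, alpha=0.05, power=0.8):
    """Visitors needed per arm to detect a relative lift of `relative_mde`
    over `base_rate` with a two-sided test at the given alpha and power."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ≈ 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

n = sample_size_per_arm(base_rate=0.05, relative_mde=0.10)
print(f"{n:,} visitors per arm")
```

Halving the detectable effect roughly quadruples the required sample. Small true lifts therefore need long tests, even before you account for novelty.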

Things people get wrong

Declaring a winner from week-one data on a redesign. Almost always inflated by both effects.
Attributing all early lift to novelty. Often it’s partly real, partly novelty, partly regression. You can’t disentangle them early on, which is itself a reason to wait.
Assuming long-running tests “average out” novelty. A longer test dilutes the novelty window but doesn’t remove it: the cumulative average still includes those first days. For a clean read, measure the steady-state period after the novelty has faded.
Mistaking regression to the mean for confidence interval shrinkage. Related but distinct. Regression is about an extreme early point estimate moving back toward the true value as more data arrives. CI shrinkage is about the uncertainty around the estimate narrowing.
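
A toy model makes the cumulative-versus-steady-state point concrete. The sketch assumes a hypothetical daily lift made of a constant 3% true effect plus a novelty bump that decays exponentially. It also assumes equal traffic and baseline conversion every day, so the cumulative lift is just the mean of the daily lifts. None of the parameters are estimates from real data.

```python
import math

def daily_lift(day, true_lift=0.03, novelty_boost=0.15, decay_days=4.0):
    """Hypothetical model: constant true lift plus an exponentially
    decaying novelty bump (day 0 = launch)."""
    return true_lift + novelty_boost * math.exp(-day / decay_days)

def cumulative_vs_steady_state(days=28, steady_from=14, **model_params):
    """Mean lift over the whole test versus over days steady_from..days-1.
    Assumes equal daily traffic and baseline, so averaging daily lifts is valid."""
    lifts = [daily_lift(d, **model_params) for d in range(days)]
    cumulative = sum(lifts) / len(lifts)
    steady = sum(lifts[steady_from:]) / len(lifts[steady_from:])
    return cumulative, steady

cumulative, steady = cumulative_vs_steady_state()
print(f"cumulative 28-day lift:  {cumulative:.1%}")
print(f"last two weeks' lift:    {steady:.1%}")
```

Even after four weeks, the cumulative figure still carries a visible share of the novelty bump. The last-two-weeks average sits close to the true 3%.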