Primary metric

The primary metric is the one number that decides whether a test wins or loses. Pre-committed before the test starts, watched throughout, and used to call the result. Everything else (secondary metrics, segments, qualitative feedback) is context.

Picking it well matters because of the multiple testing problem. Running a test against five metrics and shipping whichever shows a “significant” win inflates the false-positive rate dramatically. The discipline of committing to one metric pre-test is the cheapest version of multiple-testing control.

What makes a good primary metric

Tied to the hypothesis. If your hypothesis is about cart abandonment, the metric is cart-completion rate. Not session conversion, not revenue, not bounce. The metric matches the claim.
Sensitive to the expected effect. A metric that doesn’t move at the level of intervention you’re testing won’t pick up the signal even if the effect is real. Testing a checkout copy tweak against site-wide revenue is usually wrong - too much noise above the change.
Observable in the test window. Revenue per user requires repeat purchase to be meaningful, which often doesn’t develop inside a 2-3 week test. Use a leading indicator and measure full revenue impact post-launch with a holdout.
Defensible to stakeholders. If you have to spend the post-test meeting explaining why you measured X instead of Y, X probably wasn’t the right metric.

Proxy metrics, north stars, and primary metrics

Three overlapping concepts that get confused:

North star - the company-level metric the team is ultimately trying to move. Long horizon, often lagging.
Primary metric - the test-level metric. Has to be observable within the test, sensitive to the change, and ideally linked causally to the north star.
Proxy metric - a stand-in for something harder to measure. Click-through as a proxy for revenue intent. Useful, but the proxy and the real thing can come apart.

Most CRO tests use a proxy as the primary because the north star isn’t observable in a test window. The bet is that the proxy moves with the north star. That bet should be checked periodically, ideally with a holdout, because proxies drift.

Things people get wrong

Not committing pre-test. The whole point is to prevent metric-shopping after results land. A primary metric chosen after seeing the data isn’t a primary metric, it’s a rationalisation.
Choosing a metric that’s too noisy for the sample. Revenue per session at low sample sizes is mostly variance. Use a higher-frequency leading indicator.
Choosing too many “primary” metrics. If you have three, you have zero. The decision rule needs a single number.
Forgetting guardrails. The primary metric tells you whether the test wins. Guardrails tell you whether the win is real or comes at a hidden cost.