The HiPPO problem

HiPPO stands for “Highest Paid Person’s Opinion”. The pattern is familiar in any test-driven organisation: a senior stakeholder has a strong opinion about a change, the team runs the test, the result contradicts the opinion, and the change ships anyway. The test was theatre.

The HiPPO problem isn’t really about hierarchy. It’s about whether the organisation has agreed that data decides. In orgs where it has, junior people who run good tests can move decisions. In orgs where it hasn’t, the testing programme is a way of generating reports rather than a way of making decisions.

Why it happens

A few interlocking reasons:

Tests are slow, opinions are fast. A senior leader can have a strong view by lunchtime. The test takes three weeks. By the time the data lands, the political commitment to a direction has already formed.
Tests can produce unwelcome answers. A leader who pushed for a redesign and watches it lose has a face-saving incentive to find reasons the test was wrong.
Test results are open to interpretation. “p = 0.06 is close enough to 0.05” gives wiggle room. So does “the segment we care about did win even if the overall result didn’t”. Wiggle room is where HiPPO lives.
There’s no agreed cost to overriding tests. When override is free, it happens whenever the political pressure is high enough.

Structural fixes

The fixes are about removing the moments where override is easy:

Pre-register the decision rule. If the doc says “ship if the primary metric clears 95% confidence and no guardrail breaches”, the post-test debate is over before it starts.
Automate the ship decision. When the rule is in code rather than in a meeting, override requires explicit action. The friction stops most casual overrides.
Make the cost of override visible. “We’re overriding the test result, and here’s the expected revenue cost based on the observed effect size” makes leaders weigh the override explicitly.
Distribute the decision authority. When a single senior stakeholder owns the call, HiPPO is structurally inevitable. When ship calls go through a small committee with agreed rules, the dynamic shifts.
Track override outcomes. Six months later, did the overridden change perform as the leader’s intuition predicted? Over time, the record of how overrides actually performed shifts the dynamic toward trusting the data.
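
The first three fixes can be expressed in code. What follows is a minimal Python sketch under stated assumptions, not a definitive implementation: the names DecisionRule, ExperimentResult, decide and override_cost are hypothetical, “95% confidence” is read as a p-value below 0.05, and the override cost assumes the observed relative lift applies uniformly to a baseline annual revenue figure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DecisionRule:
    """Ship criteria written down before the test starts (hypothetical)."""
    alpha: float = 0.05  # assumption: "95% confidence" means p < 0.05


@dataclass(frozen=True)
class ExperimentResult:
    """Summary of a finished test (hypothetical structure)."""
    p_value: float
    observed_lift: float  # relative lift of variant over control, e.g. -0.02
    guardrail_breaches: Tuple[str, ...] = ()  # names of breached guardrails


def decide(rule: DecisionRule, result: ExperimentResult) -> str:
    """Apply the pre-registered rule mechanically; no post-test judgement."""
    if result.guardrail_breaches:
        return "do_not_ship"
    if result.p_value < rule.alpha and result.observed_lift > 0:
        return "ship"
    return "do_not_ship"


def override_cost(result: ExperimentResult, baseline_annual_revenue: float) -> float:
    """Expected annual revenue lost by shipping a variant with a negative lift.

    Assumes the observed lift holds for the full year, which is a
    simplification; a non-negative lift is treated as zero cost.
    """
    if result.observed_lift >= 0:
        return 0.0
    return -result.observed_lift * baseline_annual_revenue


if __name__ == "__main__":
    rule = DecisionRule()
    near_miss = ExperimentResult(p_value=0.06, observed_lift=0.03)
    print(decide(rule, near_miss))
```

With the rule fixed in advance, a result at p = 0.06 is simply “do_not_ship”; there is no threshold left to argue about, and anyone who wants to ship anyway has to do so as an explicit override with a cost attached.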

Where HiPPO is sometimes right

This is a small but real category. Leaders sometimes have context the test can’t capture: regulatory issues, brand positioning, strategic considerations downstream of the test. Overrides on these grounds are legitimate.

The distinguishing question is whether the override is justified by external context the test couldn’t measure, or by the leader simply disagreeing with the measured result. The first is fine. The second is HiPPO.
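
One way to make that distinction hard to blur is to require every override to be logged with a stated reason. The sketch below is illustrative only: the OverrideReason categories mirror the examples in this section, and the Override record and is_legitimate check are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class OverrideReason(Enum):
    """Why a test result is being overridden (hypothetical categories)."""
    REGULATORY = "regulatory issue"
    BRAND = "brand positioning"
    STRATEGIC = "strategic consideration downstream of the test"
    DISAGREES_WITH_RESULT = "disagrees with the measured result"


# Reasons that rest on context the test could not measure.
EXTERNAL_CONTEXT = frozenset(
    {OverrideReason.REGULATORY, OverrideReason.BRAND, OverrideReason.STRATEGIC}
)


@dataclass(frozen=True)
class Override:
    """A logged override request (hypothetical record)."""
    test_name: str
    requested_by: str
    reason: OverrideReason
    justification: str  # free-text explanation, required


def is_legitimate(override: Override) -> bool:
    """True only for external-context reasons with a written justification."""
    has_text = bool(override.justification.strip())
    return override.reason in EXTERNAL_CONTEXT and has_text
```

Forcing the requester to pick a category does not stop anyone mislabelling a disagreement as “strategic”, but it leaves a record that can be checked against outcomes later.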

Where it goes wrong

No agreed override threshold. Everyone gets to override results on their own pet projects without explaining why.
The CRO (conversion rate optimisation) team softens results to avoid friction. The write-up says “inconclusive” instead of “the variant lost” because the latter is awkward to communicate. The data trail gets corrupted at the analysis step.
Senior leaders sit on the analysis review. Their presence biases the discussion. Tests should be reviewed by the team running them first, conclusions written up, then surfaced to leadership.
Building a programme without leadership buy-in. Without senior commitment to “data decides”, the programme never grows past stage 2 (see building an experimentation programme).