Feature flags

A feature flag is a runtime switch in the code that determines whether a feature is shown to a given user. Instead of hardcoding if (showNewCheckout), the code asks if (flag.isEnabled('new_checkout', user)) and the answer depends on configuration outside the deploy.

That single primitive unlocks most of modern experimentation:

A/B test variant delivery. The flag returns “control” or “variant” based on a deterministic hash of the user’s ID, and the code branches accordingly.
Gradual rollouts. Ship a feature to 5% of users, watch metrics, ramp to 25%, then 100% over a week.
Kill switches. A feature breaking in production can be turned off in seconds without a deploy.
Targeted releases. New features can ship to specific segments (paid plans, EU users, internal accounts) without code changes per cohort.

The combination is what makes experiment velocity possible. Without flags, every test variant requires a deploy, and every rollback requires another one.
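As a rough illustration of how one check can cover the kill switch, targeting, and gradual rollout uses listed above, here is a minimal sketch. The FlagConfig shape, the field names, and the choice of an FNV-1a hash are assumptions made for this example, not any vendor's schema or algorithm.

```typescript
// Hypothetical flag config; field names are illustrative assumptions.
interface FlagConfig {
  enabled: boolean;          // kill switch: false turns the feature off for everyone
  rolloutPercent: number;    // 0-100, share of users who get the feature
  allowSegments?: string[];  // optional targeting, e.g. ["internal"]
}

interface User {
  id: string;
  segment?: string;
}

// FNV-1a 32-bit hash over UTF-16 code units. Deterministic, so the same
// input always yields the same number.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, keeping 32 bits.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash >>> 0;
}

// Map (flag, user) to a stable bucket in [0, 100). Including the flag key
// means a user's bucket differs from flag to flag.
function bucket(flagKey: string, userId: string): number {
  return fnv1a(flagKey + ":" + userId) % 100;
}

function isEnabled(flagKey: string, user: User, flags: Record<string, FlagConfig>): boolean {
  const config = flags[flagKey];
  if (!config || !config.enabled) return false; // unknown flag or kill switch off
  if (config.allowSegments && user.segment !== undefined &&
      config.allowSegments.indexOf(user.segment) !== -1) {
    return true; // targeted segment bypasses the percentage rollout
  }
  return bucket(flagKey, user.id) < config.rolloutPercent;
}

// Example configuration: 5% rollout, internal accounts always included.
const exampleFlags: Record<string, FlagConfig> = {
  new_checkout: { enabled: true, rolloutPercent: 5, allowSegments: ["internal"] },
};
const showNewCheckout = isEnabled("new_checkout", { id: "user-42" }, exampleFlags);
```

Because each user's bucket is fixed, raising rolloutPercent from 5 to 25 only adds users; nobody who already had the feature loses it. Setting enabled to false acts as the kill switch regardless of the other fields.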

Server-side vs client-side flags

The single biggest architectural decision:

Client-side flags. The browser asks the flag service which variant to show. Easier to integrate (just a script tag) but worse for randomisation integrity. The client can see both variants in dev tools, the flag service is a third-party dependency, and the page can flicker between control and variant during load.
Server-side flags. The server resolves the flag and renders the correct variant. Harder to set up, much better for serious experimentation. No flicker, no client manipulation, randomisation is consistent.

For CRO on Shopify or similar managed platforms, you’re usually stuck with client-side because you don’t control the server. For SaaS or anywhere with your own backend, server-side is worth the integration cost. The server-side vs client-side note covers the architectural tradeoffs in more depth.
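To make the server-side case concrete, the sketch below resolves the flag before any HTML is produced, so the response carries only one version of the page. The Resolver type, the renderCheckout function, and the markup are hypothetical names invented for this example; a real setup would plug in whatever flag client the backend uses.

```typescript
type Variant = "control" | "variant";

// Hypothetical resolver: any function that decides a variant for a user.
type Resolver = (flagKey: string, userId: string) => Variant;

// The variant is chosen on the server, so the browser never receives
// the markup for the other arm and there is nothing to flicker.
function renderCheckout(userId: string, resolve: Resolver): string {
  const variant = resolve("new_checkout", userId);
  const body = variant === "variant"
    ? '<section id="checkout">One-page checkout</section>'
    : '<section id="checkout">Classic checkout</section>';
  return '<!doctype html><html><body data-variant="' + variant + '">' + body + "</body></html>";
}

// Usage with a stand-in resolver that always answers "variant".
const alwaysVariant: Resolver = () => "variant";
const checkoutHtml = renderCheckout("user-42", alwaysVariant);
```

The data-variant attribute is one possible way to let analytics code read the assignment without exposing the other arm.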

What the testing platform layers on top

Feature flags are the primitive. A testing platform adds:

Consistent hashing so a user gets the same variant across visits
Sample ratio mismatch detection
Metric and goal tracking tied to the variant assignment
Statistical analysis on the resulting data
Mutually-exclusive experiment management so two tests don’t accidentally overlap on the same surface
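Of the items above, sample ratio mismatch detection is the easiest to sketch: a chi-square goodness-of-fit test compares the observed split against the intended one. The version below handles two arms only, and the 0.001 significance threshold is an assumption chosen for this example rather than something any particular platform prescribes.

```typescript
// Chi-square critical value for 1 degree of freedom at alpha = 0.001.
const CHI2_CRITICAL_DF1_ALPHA_001 = 10.828;

// Chi-square statistic for a two-arm test against the intended split.
function srmChiSquare(controlCount: number, variantCount: number, expectedControlShare: number = 0.5): number {
  const total = controlCount + variantCount;
  if (total === 0) return 0; // no traffic yet, nothing to compare
  const expectedControl = total * expectedControlShare;
  const expectedVariant = total * (1 - expectedControlShare);
  const dControl = controlCount - expectedControl;
  const dVariant = variantCount - expectedVariant;
  return (dControl * dControl) / expectedControl + (dVariant * dVariant) / expectedVariant;
}

// True when the observed split is unlikely under the intended allocation.
function hasSampleRatioMismatch(controlCount: number, variantCount: number, expectedControlShare: number = 0.5): boolean {
  return srmChiSquare(controlCount, variantCount, expectedControlShare) > CHI2_CRITICAL_DF1_ALPHA_001;
}

// 10,000 vs 10,500 on an intended 50/50 split: statistic is about 12.2, so this is flagged.
const suspicious = hasSampleRatioMismatch(10000, 10500);
// 10,000 vs 10,300: statistic is about 4.4, below the threshold.
const acceptable = hasSampleRatioMismatch(10000, 10300);
```

A flagged mismatch says the assignment or logging is broken somewhere; it does not say where, and the experiment's results should not be trusted until the cause is found.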

Building this yourself is doable but expensive. Statsig, Eppo, GrowthBook, LaunchDarkly, and Optimizely all sell this layer. Most teams should buy unless they have unusual scale or unusual requirements.

Where the infrastructure breaks down

Flag sprawl. Hundreds of flags accumulate, most of them defunct, and nobody dares delete them. Make cleanup part of every test wrap-up.
Flicker on client-side flags. The visitor sees control flash before the variant loads. This hurts the experience and skews the results toward visitors who tolerate the loading delay.
Inconsistent assignment. Same user gets different variants across sessions because of cookie clearing or device hopping. Common with client-side platforms, harder to detect than it should be.
Coupling the flag system to specific platforms. When you switch from one testing platform to another, you don’t want to rewrite every flag-checking call. Wrap flag calls in your own abstraction layer so the underlying platform is swappable.
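The abstraction layer from the last point can be quite thin. In the sketch below, application code depends only on a small interface, and each testing platform gets its own adapter behind it. FlagProvider, StaticProvider, and Flags are hypothetical names for this example; StaticProvider stands in for an adapter that would wrap a real vendor SDK. Falling back to "control" when the provider fails is a design assumption, not a requirement.

```typescript
// Our own interface: application code depends on this, not on a vendor SDK.
interface FlagProvider {
  getVariant(flagKey: string, userId: string): string | undefined;
}

// Hypothetical in-memory provider, standing in for a vendor adapter.
class StaticProvider implements FlagProvider {
  private assignments: Record<string, string>;
  constructor(assignments: Record<string, string>) {
    this.assignments = assignments;
  }
  getVariant(flagKey: string, _userId: string): string | undefined {
    return this.assignments[flagKey];
  }
}

// The only flag API the rest of the codebase calls.
class Flags {
  private provider: FlagProvider;
  constructor(provider: FlagProvider) {
    this.provider = provider;
  }
  variant(flagKey: string, userId: string, fallback: string = "control"): string {
    try {
      const value = this.provider.getVariant(flagKey, userId);
      return value !== undefined ? value : fallback;
    } catch (err) {
      return fallback; // assumption: a failing flag service should not break the page
    }
  }
  isEnabled(flagKey: string, userId: string): boolean {
    return this.variant(flagKey, userId) !== "control";
  }
}

// Swapping platforms means writing a new FlagProvider, not touching call sites.
const appFlags = new Flags(new StaticProvider({ new_checkout: "variant" }));
const useNewCheckout = appFlags.isEnabled("new_checkout", "user-42");
```

With this shape, a platform migration is limited to one new adapter class and the line that constructs Flags.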