Cookie-based identity stitching

Most analytics events fire before the user is logged in. Most of the commercial value comes from joining those anonymous events to the eventual customer record. The stitch is the join, and it relies on a stable cookie. When the cookie breaks, your funnel data does too - and you usually find out months later when the conversion-by-source numbers stop adding up.

The mechanism

The pattern most analytics SDKs land on:

1. On first visit, generate a random anonymous_id and store it in a first-party cookie.
2. Every event from that browser carries the anonymous_id.
3. When the user logs in or signs up, fire an identify call that ties the anonymous_id to the now-known user_id.
4. The warehouse joins historical anonymous events to the user record via the shared anonymous_id.

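// First visit: reuse the existing ID if the cookie is already there,
// otherwise mint a new one (getCookie is the same helper used below)
const anonId = getCookie('anon_id') || crypto.randomUUID()
document.cookie = `anon_id=${anonId}; max-age=63072000; path=/; samesite=lax; secure`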

// Every event
track('product_viewed', {
  anonymous_id: getCookie('anon_id'),
  user_id: getCurrentUser()?.id ?? null,
  product_id: 'SKU-123',
})

// On login or signup
identify({
  anonymous_id: getCookie('anon_id'),
  user_id: user.id,
})

The identify call is the load-bearing one. Miss it and the user’s pre-login session is forever stranded as anonymous traffic, no matter how many events you attached the anonymous_id to.
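
To make the join concrete, here is a minimal in-memory sketch of what the warehouse does with the identify call. The stitch function and the event shapes are illustrative assumptions, not any particular SDK's or warehouse's API; in practice this is usually a SQL join on anonymous_id.

```javascript
// Hypothetical in-memory stand-in for the warehouse join.
// events:     [{ anonymous_id, user_id, name }]
// identifies: [{ anonymous_id, user_id }]
function stitch(events, identifies) {
  // Map each anonymous_id to the user_id it was later identified as.
  // The last identify wins; a shared browser would need more care.
  const known = new Map()
  for (const { anonymous_id, user_id } of identifies) {
    known.set(anonymous_id, user_id)
  }
  // Backfill user_id on events that fired before login; leave the rest alone.
  return events.map((e) => ({
    ...e,
    user_id: e.user_id ?? known.get(e.anonymous_id) ?? null,
  }))
}

const events = [
  { anonymous_id: 'a1', user_id: null, name: 'product_viewed' },
  { anonymous_id: 'a1', user_id: 'u42', name: 'order_completed' },
  { anonymous_id: 'b2', user_id: null, name: 'product_viewed' },
]
const stitched = stitch(events, [{ anonymous_id: 'a1', user_id: 'u42' }])
// stitched[0].user_id is 'u42'; stitched[2].user_id stays null (no identify for b2)
```

The third event shows the failure mode: with no identify for b2, nothing in the data can connect it to a user.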

Where it breaks

Cookie clearing. Private mode, tracking-prevention browsers, manual clears. The same human gets a new anonymous_id. Their second visit looks like a first.
Cross-device. Same user, different browser. The anonymous_ids don’t match. Anonymous events on the second device are stranded until the user logs in there too.
Cross-domain. Cookies are host-only by default. If checkout is on pay.example.com and the rest of the site is on www.example.com, set the cookie with a shared domain (Domain=example.com) so both hosts can read it, or you lose the user at the most expensive moment.
Safari ITP. First-party cookies set from client-side JS get capped at 7 days. Set the cookie server-side via the Set-Cookie header and the cap doesn’t apply. This single change is worth more than most CRO programmes realise.
Ad blockers. Some block the analytics endpoint entirely. The cookie still exists, the events never arrive. The user is effectively invisible until they convert through a non-blocked path.

The server-side cookie fix

The Safari ITP cap is the most common one to get wrong, so it’s worth its own snippet.

// On the server, on any response
res.setHeader('Set-Cookie', [
  `anon_id=${anonId}; Max-Age=63072000; Path=/; SameSite=Lax; Secure; HttpOnly`
])

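HttpOnly means client JS can’t read the cookie, so a client-side getCookie('anon_id') call like the one in the first snippet comes back empty. That’s fine as long as events pass through your own server, which can stamp each event with the ID before it leaves the origin. The 7-day Safari cap doesn’t apply to cookies your own server sets this way, so the cookie survives across visits as it should.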
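
A rough sketch of that server-side stamping, assuming events are posted to an endpoint on your own origin. The function names (parseCookies, resolveAnonId, stampEvent) are made up for illustration, and the ID generator is passed in so the example stays self-contained.

```javascript
// Parse a Cookie request header into a plain object.
function parseCookies(header = '') {
  const out = {}
  for (const part of header.split(';')) {
    const eq = part.indexOf('=')
    if (eq === -1) continue
    out[part.slice(0, eq).trim()] = part.slice(eq + 1).trim()
  }
  return out
}

// Reuse the visitor's existing anon_id or mint one, and return the
// Set-Cookie value that refreshes it for another two years.
function resolveAnonId(cookieHeader, mintId) {
  const anonId = parseCookies(cookieHeader).anon_id || mintId()
  const setCookie = `anon_id=${anonId}; Max-Age=63072000; Path=/; SameSite=Lax; Secure; HttpOnly`
  return { anonId, setCookie }
}

// Stamp an incoming event with the ID the browser script could not read.
function stampEvent(event, anonId) {
  return { ...event, anonymous_id: anonId }
}

// Returning visitor: the ID comes from the request, nothing new is minted.
const returning = resolveAnonId('theme=dark; anon_id=abc-123', () => 'unused')
const stamped = stampEvent({ name: 'product_viewed', product_id: 'SKU-123' }, returning.anonId)
// stamped.anonymous_id is 'abc-123'
```

In a Node handler, cookieHeader would be req.headers.cookie, mintId could be randomUUID from node:crypto, and setCookie is the value to pass to res.setHeader('Set-Cookie', ...).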

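If you’re on Shopify or any platform without straightforward server access, this is part of the case for first-party server-side tagging - a subdomain like gtm.yourstore.com running a server container can set the cookie properly even when your storefront can’t. Check how that subdomain is mapped, though: ITP also applies the 7-day cap to cookies set in responses from hosts it detects as CNAME-cloaked third parties.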

The deterministic vs probabilistic line

Cookies are deterministic stitching - the same browser always carries the same anonymous_id. Probabilistic stitching guesses based on IP, user agent, geolocation, behavioural fingerprint. The latter is what ad platforms use to attribute, and it’s far noisier than cookie joins.

For CRO and conversion analytics, stay deterministic. The cost of “we don’t know who this user was before login” is much smaller than the cost of “we think we know, and we’re wrong 30% of the time”.

What to attach beyond the IDs

A few properties travel with identify and pay off later:

First-touch attribution. UTM source, medium, and campaign captured on first visit, persisted in another cookie, sent on identify. Connects acquisition channel to lifetime behaviour.
Anonymous behaviour rollup. Pages visited, products viewed pre-login. Useful for personalisation once you know who the user is.
Device and locale. Stamped on identify so the user record has the full picture, not just whatever was true at signup.
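
As a sketch of the first-touch item, the helper below pulls UTM parameters off the landing URL and only keeps them if nothing was stored on an earlier visit. The function names and the shape of the stored object are assumptions; persisting it (for example in a second cookie) and attaching it to identify is left to your SDK.

```javascript
// Extract UTM parameters from a landing URL; returns null when none are present.
function utmFromUrl(href) {
  const params = new URL(href).searchParams
  const touch = {
    source: params.get('utm_source'),
    medium: params.get('utm_medium'),
    campaign: params.get('utm_campaign'),
  }
  return Object.values(touch).some((v) => v !== null) ? touch : null
}

// First touch wins: keep whatever was stored on an earlier visit.
function firstTouch(stored, href) {
  return stored ?? utmFromUrl(href)
}

const first = firstTouch(
  null,
  'https://www.example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=spring'
)
const later = firstTouch(first, 'https://www.example.com/?utm_source=google&utm_medium=cpc')
// later.source is still 'newsletter'
```

The second call is the important one: a later paid click must not overwrite the channel that actually acquired the user.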

The funnel instrumentation layer should carry all of this without test-by-test code. If you’re rebuilding identity per test, the foundation isn’t right yet.

Consent and the legal layer

Identity stitching is exactly the kind of cross-session linking that data-protection regulators care about. In the EU and UK, you need a legal basis - usually consent - to set the cookie that the stitch depends on. When consent is declined, the cookie isn’t set, the stitch doesn’t happen, and a meaningful slice of your traffic stays anonymous forever. That’s the correct behaviour, not a bug to engineer around.

Treat the consent state as part of the event schema - every event should carry the state at the moment it fired. That way the analyst querying months later can tell what was missing and why, rather than treating the gaps as data quality issues.
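
One way to carry consent on every event, as a minimal sketch. The consent_state property, the CONSENT values, and buildEvent are hypothetical names; use whatever your consent platform reports. Whether an event may be sent at all without consent is a legal decision this sketch doesn't make.

```javascript
// Hypothetical consent states; substitute the values your consent platform reports.
const CONSENT = { GRANTED: 'granted', DENIED: 'denied', UNKNOWN: 'unknown' }

// Build the event payload. The anonymous_id is attached only when consent
// was granted; the consent state itself is always recorded.
function buildEvent(name, props, { consent, anonId }) {
  return {
    name,
    ...props,
    consent_state: consent,
    anonymous_id: consent === CONSENT.GRANTED ? anonId : null,
  }
}

const granted = buildEvent(
  'product_viewed',
  { product_id: 'SKU-123' },
  { consent: CONSENT.GRANTED, anonId: 'abc-123' }
)
const denied = buildEvent(
  'product_viewed',
  { product_id: 'SKU-123' },
  { consent: CONSENT.DENIED, anonId: null }
)
// denied.anonymous_id is null and denied.consent_state is 'denied'
```

With the state on the row, a null anonymous_id can be explained at query time instead of being written off as missing data.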
