Cookie-based identity stitching
Most analytics events fire before the user is logged in. Most of the commercial value comes from joining those anonymous events to the eventual customer record. The stitch is the join, and it relies on a stable cookie. When the cookie breaks, your funnel data does too - and you usually find out months later when the conversion-by-source numbers stop adding up.
The mechanism
The pattern most analytics SDKs land on:
- On first visit, generate a random anonymous_id and store it in a first-party cookie.
- Every event from that browser carries the anonymous_id.
- When the user logs in or signs up, fire an identify call that ties the anonymous_id to the now-known user_id.
- The warehouse joins historical anonymous events to the user record via the shared anonymous_id.

    // First visit
    const anonId = crypto.randomUUID()
    document.cookie = `anon_id=${anonId}; max-age=63072000; path=/; samesite=lax; secure`

    // Every event
    track('product_viewed', {
      anonymous_id: getCookie('anon_id'),
      user_id: getCurrentUser()?.id ?? null,
      product_id: 'SKU-123',
    })

    // On login or signup
    identify({
      anonymous_id: getCookie('anon_id'),
      user_id: user.id,
    })

The identify call is the load-bearing one. Miss it and the user’s pre-login session is forever stranded as anonymous traffic, no matter how many events you attached the anonymous_id to.
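
The warehouse-side join in the last step can be sketched in plain JavaScript. This is only an illustration of the idea: `stitchEvents`, `resolved_user_id`, and the sample rows are hypothetical, and a real pipeline would normally run the equivalent join in SQL.

```javascript
// Hypothetical sketch: resolve anonymous events to users via identify calls.
// Field names mirror the snippets above; stitchEvents itself is an assumption.
function stitchEvents(events, identifies) {
  // Map each anonymous_id to the user_id it was later identified as.
  const userByAnonId = new Map()
  for (const { anonymous_id, user_id } of identifies) {
    userByAnonId.set(anonymous_id, user_id)
  }

  // Keep an explicit user_id when present; otherwise backfill from the map.
  return events.map((event) => ({
    ...event,
    resolved_user_id: event.user_id ?? userByAnonId.get(event.anonymous_id) ?? null,
  }))
}

const events = [
  { name: 'product_viewed', anonymous_id: 'a1', user_id: null },
  { name: 'checkout_started', anonymous_id: 'a1', user_id: 'u42' },
  { name: 'product_viewed', anonymous_id: 'b2', user_id: null },
]
const identifies = [{ anonymous_id: 'a1', user_id: 'u42' }]

const stitched = stitchEvents(events, identifies)
console.log(stitched.map((e) => e.resolved_user_id))
// a1's pre-login view resolves to u42; b2 was never identified and stays null
```

If one anonymous_id is identified as several users (a shared device, say), this sketch keeps the last identify it sees. That is a policy choice, not a given, and worth deciding deliberately.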
Where it breaks
- Cookie clearing. Private mode, tracking-prevention browsers, manual clears. The same human gets a new anonymous_id. Their second visit looks like a first.
- Cross-device. Same user, different browser. The anonymous_ids don’t match. Anonymous events on the second device are stranded until the user logs in there too.
- Cross-domain. By default a cookie is scoped to the host that set it. If checkout is on pay.example.com and the rest of the site is on www.example.com, you need a shared cookie domain (.example.com) or you lose the user at the most expensive moment.
- Safari ITP. First-party cookies set from client-side JS get capped at 7 days. Set the cookie server-side via the Set-Cookie header and the cap doesn’t apply. This single change is worth more than most CRO programmes realise.
- Ad blockers. Some block the analytics endpoint entirely. The cookie still exists, the events never arrive. The user is effectively invisible until they convert through a non-blocked path.
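
For the cross-domain case, the fix comes down to one extra cookie attribute. A minimal sketch, assuming a hypothetical `buildAnonCookie` helper (not part of any SDK) and the example.com hosts used above:

```javascript
// Hypothetical helper: build the cookie string with an explicit Domain
// so www.example.com and pay.example.com read the same anon_id.
function buildAnonCookie(anonId, { domain, maxAgeSeconds = 63072000 } = {}) {
  const parts = [
    `anon_id=${encodeURIComponent(anonId)}`,
    `Max-Age=${maxAgeSeconds}`,
    'Path=/',
    'SameSite=Lax',
    'Secure',
  ]
  // Without Domain the cookie is host-only and stays on the host that set it.
  if (domain) parts.push(`Domain=${domain}`)
  return parts.join('; ')
}

const sharedCookie = buildAnonCookie('123e4567-e89b-12d3-a456-426614174000', {
  domain: 'example.com',
})
console.log(sharedCookie)
```

Browsers following RFC 6265 ignore a leading dot in the Domain value, so `example.com` and `.example.com` behave the same here.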
The server-side cookie fix
This is the most common one to get wrong, so it’s worth its own snippet.

    // On the server, on any response
    res.setHeader('Set-Cookie', [
      `anon_id=${anonId}; Max-Age=63072000; Path=/; SameSite=Lax; Secure; HttpOnly`
    ])

HttpOnly means client JS can’t read the cookie, so a client-side getCookie('anon_id') like the one in the earlier snippet returns nothing. That’s fine - send events to your own origin and let the server stamp each one with the ID from the request cookie before forwarding it. The 7-day Safari cap doesn’t apply to server-set cookies, so the cookie survives across visits as it should.
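
How the server might reuse or mint the ID and stamp it onto an event can be sketched as follows. The names `parseCookies`, `ensureAnonId`, and `stampEvent` are hypothetical, not a library API, and the ID generator is injectable so the default (`crypto.randomUUID`, assumed to be available as a global) can be swapped out.

```javascript
// Hypothetical sketch: reuse the visitor's anon_id or mint one server-side.
// parseCookies, ensureAnonId and stampEvent are illustrative names only.
function parseCookies(cookieHeader = '') {
  const cookies = {}
  for (const pair of cookieHeader.split(';')) {
    const index = pair.indexOf('=')
    if (index === -1) continue
    cookies[pair.slice(0, index).trim()] = decodeURIComponent(pair.slice(index + 1).trim())
  }
  return cookies
}

function ensureAnonId(cookieHeader, generateId = () => globalThis.crypto.randomUUID()) {
  const existing = parseCookies(cookieHeader).anon_id
  const anonId = existing ?? generateId()
  // Returned for the caller to send as a Set-Cookie header;
  // re-sending it on later responses refreshes Max-Age.
  const setCookie = `anon_id=${anonId}; Max-Age=63072000; Path=/; SameSite=Lax; Secure; HttpOnly`
  return { anonId, setCookie, isNew: existing === undefined }
}

// The server stamps the ID onto the event, since client JS cannot read it.
function stampEvent(event, cookieHeader) {
  const { anonId } = ensureAnonId(cookieHeader)
  return { ...event, anonymous_id: anonId }
}

const stamped = stampEvent(
  { name: 'product_viewed', product_id: 'SKU-123' },
  'theme=dark; anon_id=abc-123',
)
console.log(stamped.anonymous_id) // abc-123
```

In a Node handler, the `setCookie` string would go into `res.setHeader('Set-Cookie', ...)` as in the snippet above, and `req.headers.cookie` would supply `cookieHeader`.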
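If you’re on Shopify or any platform without straightforward server access, this is part of the case for first-party server-side tagging - a subdomain like gtm.yourstore.com running a server container can set the cookie properly even when your storefront can’t. One caveat: Safari’s ITP also caps cookies set by a subdomain that resolves to third-party infrastructure (CNAME cloaking), so how that subdomain is hosted matters.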
The deterministic vs probabilistic line
Cookies are deterministic stitching - the same browser is the same anonymous_id. Probabilistic stitching guesses based on IP, user agent, geolocation, behavioural fingerprint. The latter is what ad platforms use to attribute, and it’s noisier than cookie joins by an order of magnitude.
For CRO and conversion analytics, stay deterministic. The cost of “we don’t know who this user was before login” is much smaller than the cost of “we think we know, and we’re wrong 30% of the time”.
What to attach beyond the IDs
A few properties travel with identify and pay off later:
- First-touch attribution. UTM source, medium, and campaign captured on first visit, persisted in another cookie, sent on identify. Connects acquisition channel to lifetime behaviour.
- Anonymous behaviour rollup. Pages visited, products viewed pre-login. Useful for personalisation once you know who the user is.
- Device and locale. Stamped on identify so the user record has the full picture, not just whatever was true at signup.
The funnel instrumentation layer should carry all of this without test-by-test code. If you’re rebuilding identity per test, the foundation isn’t right yet.
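
First-touch capture is the easiest of these to get subtly wrong, because later visits must not overwrite it. A minimal sketch, assuming a hypothetical `captureFirstTouch` helper and a `first_touch` trait name (both illustrative; in the browser the stored value would live in the second cookie mentioned above):

```javascript
// Hypothetical sketch: capture first-touch UTM values and never overwrite them.
const UTM_KEYS = ['utm_source', 'utm_medium', 'utm_campaign']

function captureFirstTouch(landingUrl, existingFirstTouch = null) {
  // First touch is sticky: once recorded, later visits do not replace it.
  if (existingFirstTouch) return existingFirstTouch

  const params = new URL(landingUrl).searchParams
  const firstTouch = {}
  for (const key of UTM_KEYS) {
    const value = params.get(key)
    if (value) firstTouch[key] = value
  }
  // Return null when the landing URL carried no UTM parameters at all.
  return Object.keys(firstTouch).length > 0 ? firstTouch : null
}

const first = captureFirstTouch(
  'https://www.example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=spring',
)
const second = captureFirstTouch('https://www.example.com/?utm_source=google&utm_medium=cpc', first)

// Traits that would travel with identify alongside the IDs.
const identifyPayload = { anonymous_id: 'a1', user_id: 'u42', first_touch: second }
console.log(identifyPayload.first_touch.utm_source) // newsletter
```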
Consent and the legal layer
Identity stitching is exactly the kind of cross-session linking that data-protection regulators care about. In the EU and UK, you need a legal basis - usually consent - to set the cookie that the stitch depends on. When consent is declined, the cookie isn’t set, the stitch doesn’t happen, and a meaningful slice of your traffic stays anonymous forever. That’s the correct behaviour, not a bug to engineer around.
Treat the consent state as part of the event schema - every event should carry the state at the moment it fired. That way the analyst querying months later can tell what was missing and why, rather than treating the gaps as data quality issues.
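
One way to carry consent state in the schema is sketched below. `buildEvent`, the `consent_state` field, and the `analytics: 'granted' | 'denied'` shape are all assumptions made for illustration; whether any event may be sent at all without consent is a legal question this sketch does not settle.

```javascript
// Hypothetical sketch: stamp consent state on every event and only attach
// the anonymous_id when analytics consent was granted.
function buildEvent(name, properties, { consent, anonymousId }) {
  const analyticsGranted = consent.analytics === 'granted'
  return {
    ...properties,
    name,
    // Without consent there is no cookie, so there is no ID to send.
    anonymous_id: analyticsGranted ? anonymousId : null,
    // Copy the state as it was when the event fired.
    consent_state: { ...consent },
  }
}

const granted = buildEvent(
  'product_viewed',
  { product_id: 'SKU-123' },
  { consent: { analytics: 'granted' }, anonymousId: 'a1' },
)
const declined = buildEvent(
  'product_viewed',
  { product_id: 'SKU-123' },
  { consent: { analytics: 'denied' }, anonymousId: 'a1' },
)

console.log(granted.anonymous_id, declined.anonymous_id) // a1 null
```

With the state on the row, an analyst can filter on `consent_state.analytics` to separate expected gaps from genuine data loss.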