Event schema design
What to call the event, what to attach as a property, what to leave to joins, when to version. The decisions one level below the spec doc. Get them wrong and the data still arrives, it’s just hard to use a year later.
The funnel instrumentation note covers the programme-level view of taxonomy. This one is about what goes inside each event.
Naming
Pick a shape and apply it everywhere.
```
added_to_cart    // snake_case
addedToCart      // camelCase
AddedToCart      // PascalCase
cart.item.added  // namespaced
```

The shape matters less than the consistency. Mixed conventions in the same warehouse mean every SELECT becomes a guessing game.
A few principles that age well:
- Past tense for what happened. added_to_cart, not add_to_cart. The event is a record of an action that occurred, not an imperative.
- Object then action. cart_item_added reads alphabetically alongside other cart events, added_cart_item doesn’t. Once you have a hundred events to sort through, this matters.
- Reserve viewed for surfaces, clicked for elements. product_viewed is the PDP. clicked_add_to_cart is the button. Don’t conflate them.
If you’re on GA4, this decision is partly made for you. GA4’s built-in ecommerce reports only populate when you use Google’s exact recommended event names - add_to_cart, begin_checkout, purchase, etc. Use those for the events GA4 cares about, then apply your own convention to everything else. Mixing schemas is annoying but cheaper than reimplementing GA4’s monetisation reports.
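A naming convention only holds if something checks it. The sketch below is one way to lint event names, not a standard tool: the verb list, the result labels and the function name are all hypothetical, and the GA4 allowlist contains only the three names mentioned above.

```javascript
// Hypothetical event-name lint. The verb list and allowlist are illustrative
// and would need extending for a real taxonomy.
const GA4_RECOMMENDED = new Set(['add_to_cart', 'begin_checkout', 'purchase']);
const PAST_TENSE_ACTIONS = new Set([
  'added', 'removed', 'viewed', 'clicked', 'started', 'completed',
]);

// Two or more lowercase words joined by single underscores.
const SNAKE_CASE = /^[a-z][a-z0-9]*(?:_[a-z0-9]+)+$/;

function checkEventName(name) {
  // GA4's recommended names are exempt from the house convention.
  if (GA4_RECOMMENDED.has(name)) return { ok: true, note: 'ga4_recommended_name' };
  if (!SNAKE_CASE.test(name)) return { ok: false, note: 'not_snake_case' };

  const words = name.split('_');
  // Preferred shape: object first, past-tense action last.
  if (PAST_TENSE_ACTIONS.has(words[words.length - 1])) {
    return { ok: true, note: 'object_then_action' };
  }
  // Tolerated shape: past-tense action first (e.g. clicked_add_to_cart).
  if (PAST_TENSE_ACTIONS.has(words[0])) {
    return { ok: true, note: 'action_first' };
  }
  // No recognised past-tense verb at either end: likely an imperative.
  return { ok: false, note: 'no_past_tense_action' };
}
```

Run over the list of event names in CI, a check like this flags addedToCart and cart.item.added as not snake_case and product_view as missing a past-tense action, while letting GA4's names through untouched.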
Properties
The right question on each property: would I rather attach this once at event time, or join it in later?
Attach at event time:
- Anything that can change. Price, position, variant displayed. The state at the moment matters - joining later gives you today’s value, not the value the user saw.
- Anything that distinguishes this event. Which product, which button, which CTA copy. If you’d want to filter or group by it, attach it.
Leave to joins:
- Anything stable per entity. Product category, supplier, country of origin. Look it up from the product table at analysis time.
- Anything derivable. Don’t attach is_mobile if you already have user_agent.
```js
// good
track('product_viewed', {
  product_id: 'SKU-123',
  price_cents: 4999,
  currency: 'GBP',
  list_id: 'pdp_recommendations',
  list_position: 3,
})
```

```js
// bad - too much, half of it joinable, money as a string
track('product_viewed', {
  product_id: 'SKU-123',
  product_name: 'Wool jumper',
  product_category: 'Knitwear',
  product_brand: 'Acme',
  price: '£49.99',
  is_in_stock: true,
  user_email: 'a@b.com',
})
```

Store money as an integer in the minor unit (price_cents: 4999), with the currency as a separate field. Never as a formatted string. Never as a float. Floats and money are an old, expensive lesson, and 0.1 + 0.2 !== 0.3 is one schema decision away from your ecommerce report.
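If prices reach the tracking layer as decimal strings, they can be converted to minor units without ever passing through a float. This is a minimal sketch under that assumption; toMinorUnits is a hypothetical helper, and it expects a plain decimal string such as "49.99" with no currency symbol or thousands separator.

```javascript
// Hypothetical helper: parse a plain decimal string into integer minor units
// using string and integer operations only.
function toMinorUnits(amount, decimals = 2) {
  const match = /^(\d+)(?:\.(\d+))?$/.exec(amount);
  if (!match) throw new Error(`Unparseable amount: ${amount}`);

  // `fraction` defaults to '' when the input has no decimal part.
  const [, whole, fraction = ''] = match;
  if (fraction.length > decimals) {
    throw new Error(`Too many decimal places: ${amount}`);
  }

  // Right-pad the fraction so "49.9" is read as 90 minor units, not 9.
  const minor = Number(fraction.padEnd(decimals, '0') || '0');
  return Number(whole) * 10 ** decimals + minor;
}

// The float problem this avoids:
const floatSumIsExact = 0.1 + 0.2 === 0.3;   // false
const integerSumIsExact = 10 + 20 === 30;    // true
```

toMinorUnits('49.99') returns 4999, ready to send as price_cents next to a separate currency field. Zero-decimal currencies can pass decimals = 0.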
Required properties
Every event should carry a small set of properties regardless of type:
- event_id - unique per event, makes dedup possible
- user_id and anonymous_id - both, for identity stitching
- session_id - for sessionisation
- timestamp - in ISO 8601, UTC
- schema_version - the version of the spec this event was emitted against
The schema version is the one most teams skip. Then they change a property’s meaning, ship it, and three months of dashboards quietly break.
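One way to make the required set hard to forget is to attach it in a single wrapper that every track call goes through. The sketch below assumes a hypothetical buildEvent function and context object; the envelope layout (an event field plus a nested properties object) and the integer SCHEMA_VERSION are illustrative choices, not something the list above prescribes. The ID generator and clock are injectable so the function can be tested deterministically.

```javascript
// Hypothetical spec version; bump it when the structure changes meaningfully.
const SCHEMA_VERSION = 1;

// Wrap event-specific properties in the required envelope.
// `deps.newId` and `deps.now` are optional overrides for tests.
function buildEvent(name, properties, context, deps = {}) {
  // Default ID source assumes a runtime that exposes the Web Crypto global.
  const newId = deps.newId || (() => globalThis.crypto.randomUUID());
  const now = deps.now || (() => new Date());

  return {
    event: name,
    event_id: newId(),                  // unique per event, enables dedup
    user_id: context.userId ?? null,    // null until the user is identified
    anonymous_id: context.anonymousId,  // always present, for identity stitching
    session_id: context.sessionId,
    timestamp: now().toISOString(),     // ISO 8601, always UTC (ends in "Z")
    schema_version: SCHEMA_VERSION,
    properties,
  };
}
```

Keeping the envelope in one place also means a change to the required set is a one-line diff rather than an audit of every call site.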
Items arrays
Ecommerce events carry an items array - the products in the cart, the products in the order, the products in a list. Two failures show up here repeatedly.
First, inconsistent item_id across events. view_item sends the SKU, purchase sends Shopify’s internal numeric product ID. The two events can’t be joined. Pick one identifier and use it from impression to refund.
Second, not clearing the items array between pushes. On client-side tag managers, a stale items array from view_item_list can leak into the next purchase and make it look like the user bought twelve products instead of one. The fix is a dataLayer.push({ ecommerce: null }) between events. The extra push feels redundant, but leaving it out is the single most common ecommerce tracking bug.
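The clearing push is easy to forget when it is written by hand at every call site. A small wrapper makes it automatic. This is a sketch, assuming a dataLayer array of the kind Google Tag Manager reads; pushEcommerceEvent is a hypothetical helper name.

```javascript
// Hypothetical wrapper: always clear the previous ecommerce object before
// pushing the next ecommerce event.
function pushEcommerceEvent(dataLayer, eventName, ecommerce) {
  dataLayer.push({ ecommerce: null });             // reset stale items
  dataLayer.push({ event: eventName, ecommerce }); // then push the real event
}

// Usage with a plain array standing in for window.dataLayer.
const demoLayer = [];
pushEcommerceEvent(demoLayer, 'view_item_list', {
  items: [{ item_id: 'SKU-123' }, { item_id: 'SKU-456' }],
});
pushEcommerceEvent(demoLayer, 'purchase', {
  items: [{ item_id: 'SKU-123' }],
});
// demoLayer now holds four entries: null reset, list view, null reset, purchase.
```

In the browser the first argument would be window.dataLayer. The reset sits between the two events, so the purchase carries only its own single item.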
Versioning
Schemas change. The bad pattern is silent change - same event name, different property semantics. The good pattern:
- Add new properties freely. Old consumers ignore them.
- Don’t change existing property meanings. If total used to include tax and now doesn’t, that’s a new property (subtotal_cents), not the same one with new behaviour.
- Bump schema_version when the structure changes meaningfully.
- For breaking changes, ship a new event name (product_viewed_v2) and migrate consumers before removing the old one.
This sounds heavy. It’s lighter than retroactively fixing a year of analytics.
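Both patterns are small in code. The sketch below is illustrative only: the function names, the order fields, and the difference between product_viewed and product_viewed_v2 (a decimal price versus price_cents plus currency) are assumptions made for the example, and total is assumed to be in integer minor units.

```javascript
// Additive change: `total` keeps its old meaning (tax included) and the new
// meaning gets its own property.
function buildOrderPayload(order) {
  return {
    schema_version: 2,                            // bumped: structure changed
    total: order.subtotalCents + order.taxCents,  // unchanged: still includes tax
    subtotal_cents: order.subtotalCents,          // new property, new meaning
  };
}

// Breaking change: emit the old and new events side by side until every
// consumer has migrated, then switch the legacy event off.
function trackProductViewed(track, product, options = {}) {
  const emitLegacy = options.emitLegacy !== false; // defaults to true

  if (emitLegacy) {
    // Hypothetical old shape: price as a decimal number in major units.
    track('product_viewed', {
      product_id: product.id,
      price: product.priceCents / 100,
    });
  }
  track('product_viewed_v2', {
    product_id: product.id,
    price_cents: product.priceCents,
    currency: product.currency,
  });
}
```

The emitLegacy flag gives the migration an explicit end: once the last dashboard reads product_viewed_v2, set it to false and delete the old branch.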
Where the schema lives
The spec has to live somewhere both engineering and analytics will read. Ranked:
- Typed schema in code. TypeScript types or a JSON schema that the tracking SDK enforces. Best. Breaks the build when an event drifts.
- A specification doc with examples. Wiki page, README, Notion. Worse than typed but better than nothing.
- Nothing, just look at what’s already firing. The default state on most teams. Don’t.
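The first option does not need heavy tooling to get started. Below is a minimal runtime sketch in plain JavaScript, assuming a hand-rolled EVENT_SCHEMAS registry and validateEvent function (both hypothetical, and far simpler than JSON Schema). Unlike TypeScript types, a runtime check like this fails in tests or CI rather than at compile time, but it catches the same kind of drift.

```javascript
// Hypothetical schema registry: event name -> property name -> expected type.
const EVENT_SCHEMAS = {
  product_viewed: {
    product_id: 'string',
    price_cents: 'integer',
    currency: 'string',
    list_id: 'string',
    list_position: 'integer',
  },
};

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Throws if the event is unknown, a property is missing or mistyped,
// or a property appears that the schema does not declare.
function validateEvent(name, properties) {
  if (!hasOwn(EVENT_SCHEMAS, name)) throw new Error(`Unknown event: ${name}`);
  const schema = EVENT_SCHEMAS[name];
  const errors = [];

  for (const [key, type] of Object.entries(schema)) {
    const value = properties[key];
    const ok = type === 'integer' ? Number.isInteger(value) : typeof value === type;
    if (!ok) errors.push(`${key}: expected ${type}`);
  }
  for (const key of Object.keys(properties)) {
    if (!hasOwn(schema, key)) errors.push(`${key}: not in schema`);
  }

  if (errors.length > 0) {
    throw new Error(`${name} failed validation: ${errors.join('; ')}`);
  }
  return properties;
}
```

Calling validateEvent inside the tracking wrapper means an event that sends price: '£49.99' instead of price_cents fails loudly in development instead of landing quietly in the warehouse.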
The further the spec is from code, the faster it rots. The more it’s enforced at the data layer, the longer it survives organisational turnover.