Event schema design

What to call the event, what to attach as a property, what to leave to joins, when to version. These are the decisions one level below the spec doc. Get them wrong and the data still arrives; it’s just hard to use a year later.

The funnel instrumentation note covers the programme-level view of taxonomy. This one is about what goes inside each event.

Naming

Pick a shape and apply it everywhere.

added_to_cart         // snake_case
addedToCart           // camelCase
AddedToCart           // PascalCase
cart.item.added       // namespaced

The shape matters less than the consistency. Mixed conventions in the same warehouse mean every SELECT becomes a guessing game.

A few principles that age well:

Past tense for what happened. added_to_cart, not add_to_cart. The event is a record of an action that occurred, not an imperative.
Object then action. cart_item_added sorts alphabetically alongside other cart events; added_cart_item doesn’t. Once you have a hundred events to sort through, this matters.
Reserve viewed for surfaces, clicked for elements. product_viewed is the PDP. add_to_cart_clicked is the button. Don’t conflate them.

If you’re on GA4, this decision is partly made for you. GA4’s built-in ecommerce reports only populate when you use Google’s exact recommended event names - add_to_cart, begin_checkout, purchase, etc. Use those for the events GA4 cares about, then apply your own convention to everything else. Mixing schemas is annoying but cheaper than reimplementing GA4’s monetisation reports.
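A convention only holds if something checks it. The following is a minimal sketch of a name linter, assuming snake_case, object-then-action ordering, and a house list of approved past-tense verbs. The verb list and the checkEventName helper are hypothetical, and the GA4 set is a partial sample of Google's recommended names, not the full list.

```typescript
// Hypothetical approved past-tense verbs; extend to match your own spec.
const APPROVED_VERBS = new Set(["viewed", "clicked", "added", "removed", "started", "completed"]);

// Partial sample of GA4 recommended ecommerce event names, exempt from the house convention.
const GA4_RESERVED = new Set(["view_item", "view_item_list", "add_to_cart", "begin_checkout", "purchase"]);

const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/;

// Returns a list of problems; an empty list means the name passes.
function checkEventName(name: string): string[] {
  if (GA4_RESERVED.has(name)) return [];
  if (!SNAKE_CASE.test(name)) return ["not snake_case"];

  const problems: string[] = [];
  const parts = name.split("_");
  const last = parts[parts.length - 1] ?? "";
  if (parts.length < 2) problems.push("needs an object and an action");
  if (!APPROVED_VERBS.has(last)) {
    problems.push(`must end in an approved past-tense verb, got "${last}"`);
  }
  return problems;
}
```

Run in CI against the event catalogue, a check like this catches a stray addedToCart before it reaches the warehouse.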

Properties

The right question on each property: would I rather attach this once at event time, or join it in later?

Attach at event time:

Anything that can change. Price, position, variant displayed. The state at the moment matters - joining later gives you today’s value, not the value the user saw.
Anything that distinguishes this event. Which product, which button, which CTA copy. If you’d want to filter or group by it, attach it.

Leave to joins:

Anything stable per entity. Product category, supplier, country of origin. Look it up from the product table at analysis time.
Anything derivable. Don’t attach is_mobile if you already have user_agent.
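The split can be illustrated with a small sketch: the event carries only the identifier and the point-in-time state, and stable attributes are looked up afterwards. In practice this is usually a SQL join in the warehouse; the product table, its fields, and the enrich helper below are hypothetical.

```typescript
// Event as emitted: the identifier plus state that was true at that moment.
interface ProductViewed {
  product_id: string;
  price_cents: number;
  currency: string;
}

// Hypothetical product dimension table, keyed by product_id.
interface ProductRow {
  category: string;
  brand: string;
}

const products = new Map<string, ProductRow>([
  ["SKU-123", { category: "Knitwear", brand: "Acme" }],
]);

// Join stable attributes at analysis time instead of attaching them at event time.
function enrich(event: ProductViewed, table: Map<string, ProductRow>) {
  const row = table.get(event.product_id);
  return { ...event, category: row?.category ?? null, brand: row?.brand ?? null };
}

const enriched = enrich(
  { product_id: "SKU-123", price_cents: 4999, currency: "GBP" },
  products,
);
```

The price stays on the event because it can change; the category comes from the table because it is stable per product.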

// good
track('product_viewed', {
  product_id: 'SKU-123',
  price_cents: 4999,
  currency: 'GBP',
  list_id: 'pdp_recommendations',
  list_position: 3,
})

// bad - too much, half of it joinable, money as a string, PII in the payload
track('product_viewed', {
  product_id: 'SKU-123',
  product_name: 'Wool jumper',
  product_category: 'Knitwear',
  product_brand: 'Acme',
  price: '£49.99',
  is_in_stock: true,
  user_email: 'a@b.com',
})

Money

Store as an integer in the minor unit (price_cents: 4999), with the currency as a separate field. Never as a formatted string. Never as a float. Floats and money are an old, expensive lesson, and 0.1 + 0.2 !== 0.3 is one schema decision away from your ecommerce report.
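A sketch of the conversion, assuming prices arrive as plain decimal strings such as "49.99". The toMinorUnits helper and the small exponent table are illustrative; a real table would cover every currency you sell in, since not every currency has two decimal places.

```typescript
// Minor-unit digits for a few ISO 4217 currencies (illustrative subset).
const MINOR_UNIT_DIGITS: Record<string, number> = { GBP: 2, USD: 2, EUR: 2, JPY: 0, KWD: 3 };

// Parse a plain decimal string into integer minor units without going through a float.
function toMinorUnits(amount: string, currency: string): number {
  const digits = MINOR_UNIT_DIGITS[currency];
  if (digits === undefined) throw new Error(`unknown currency: ${currency}`);

  const match = /^(\d+)(?:\.(\d+))?$/.exec(amount);
  if (!match) throw new Error(`not a plain decimal amount: ${amount}`);

  const whole = match[1] ?? "0";
  const fraction = match[2] ?? "";
  if (fraction.length > digits) throw new Error(`too many decimal places for ${currency}`);

  // "49" + "99" -> "4999"; short fractions are right-padded with zeros.
  return Number(whole + fraction.padEnd(digits, "0"));
}

const floatSumIsExact = 0.1 + 0.2 === 0.3; // false
const integerSumIsExact =
  toMinorUnits("0.10", "GBP") + toMinorUnits("0.20", "GBP") === 30; // true
```

Formatting back to "£49.99" belongs in the presentation layer, not in the event.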

Required properties

Every event should carry a small set of properties regardless of type:

event_id - unique per event, makes dedup possible
user_id and anonymous_id - both, for identity stitching
session_id - for sessionisation
timestamp - in ISO 8601, UTC
schema_version - the version of the spec this event was emitted against

The schema version is the one most teams skip. Then they change a property’s meaning, ship it, and three months of dashboards quietly break.
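One way to make the required set hard to forget is to build it in a single place. The following is a sketch under assumptions: the baseEvent helper, the Identity shape, the integer version number, and a null user_id for not-yet-identified users are all illustrative choices, not part of any particular SDK.

```typescript
import { randomUUID } from "node:crypto";

// The spec version this code emits against (hypothetical integer scheme).
const SCHEMA_VERSION = 1;

interface BaseEvent {
  event_id: string;
  user_id: string | null; // assumption: null until the user is identified
  anonymous_id: string;
  session_id: string;
  timestamp: string; // ISO 8601, UTC
  schema_version: number;
}

interface Identity {
  user_id: string | null;
  anonymous_id: string;
  session_id: string;
}

// Builds the properties every event carries, regardless of type.
function baseEvent(identity: Identity, now: Date = new Date()): BaseEvent {
  return {
    event_id: randomUUID(),
    ...identity,
    timestamp: now.toISOString(), // toISOString always renders UTC with a trailing "Z"
    schema_version: SCHEMA_VERSION,
  };
}
```

In a browser, the global crypto.randomUUID() does the same job as the Node import used here.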

Items arrays

Ecommerce events carry an items array - the products in the cart, the products in the order, the products in a list. Two failures show up here repeatedly.

First, inconsistent item_id across events. If view_item sends the SKU and purchase sends Shopify’s internal numeric product ID, the two events can’t be joined. Pick one identifier and use it from impression to refund.

Second, not clearing the items array between pushes. On client-side tag managers, a stale items array from view_item_list can leak into the next purchase and make it look like the user bought twelve products instead of one. The fix is a dataLayer.push({ ecommerce: null }) between events. The extra push feels redundant, but skipping it is the single most common ecommerce tracking bug.
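A sketch of wrapping the reset so nobody has to remember it. The pushEcommerce helper is hypothetical, and the local dataLayer array stands in for window.dataLayer so the example runs outside a browser.

```typescript
// Stand-in for window.dataLayer.
type DataLayerEntry = Record<string, unknown>;
const dataLayer: DataLayerEntry[] = [];

// Hypothetical helper: always reset the ecommerce object before pushing a new one.
function pushEcommerce(event: string, ecommerce: { items: unknown[] }): void {
  dataLayer.push({ ecommerce: null }); // clear whatever the previous event left behind
  dataLayer.push({ event, ecommerce });
}

pushEcommerce("view_item_list", {
  items: [{ item_id: "SKU-123" }, { item_id: "SKU-456" }],
});
pushEcommerce("purchase", { items: [{ item_id: "SKU-123" }] });
```

Routing every ecommerce push through one function also gives a single place to enforce a consistent item_id.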

Versioning

Schemas change. The bad pattern is silent change - same event name, different property semantics. The good pattern:

Add new properties freely. Old consumers ignore them.
Don’t change existing property meanings. If total used to include tax and now doesn’t, that’s a new property (subtotal_cents), not the same one with new behaviour.
Bump schema_version when the structure changes meaningfully.
For breaking changes, ship a new event name (product_viewed_v2) and migrate consumers before removing the old one.

This sounds heavy. It’s lighter than retroactively fixing a year of analytics.
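A sketch of what an additive change looks like to consumers. The order_completed payloads and field names are hypothetical: version 1 carries only total_cents (tax included), and version 2 adds subtotal_cents and tax_cents while total_cents keeps its original meaning.

```typescript
interface OrderCompletedV1 {
  schema_version: 1;
  total_cents: number; // tax included
}

interface OrderCompletedV2 {
  schema_version: 2;
  total_cents: number; // unchanged meaning: tax included
  subtotal_cents: number; // new property
  tax_cents: number; // new property
}

type OrderCompleted = OrderCompletedV1 | OrderCompletedV2;

// A consumer written against v1 keeps working on v2 events, because nothing it reads changed.
function revenueInclTax(event: OrderCompleted): number {
  return event.total_cents;
}

// A newer consumer branches on schema_version instead of guessing from which fields are present.
function revenueExclTax(event: OrderCompleted): number | null {
  return event.schema_version === 2 ? event.subtotal_cents : null;
}
```

Old rows return null for the new metric rather than a silently wrong number, which is the behaviour you want from a dashboard.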

Where the schema lives

The spec has to live somewhere both engineering and analytics will read. Ranked:

Typed schema in code. TypeScript types or a JSON schema that the tracking SDK enforces. Best. Breaks the build when an event drifts.
A specification doc with examples. Wiki page, README, Notion. Worse than typed but better than nothing.
Nothing, just look at what’s already firing. The default state on most teams. Don’t.
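The first option can be sketched in a few lines of TypeScript. The EventCatalogue contents and the track wrapper are hypothetical, and the sent array stands in for whatever transport the real SDK uses.

```typescript
// Hypothetical event catalogue: event name -> required properties.
interface EventCatalogue {
  product_viewed: { product_id: string; price_cents: number; currency: string };
  cart_item_added: { product_id: string; quantity: number };
}

// Stand-in for the real transport.
const sent: { name: string; props: unknown }[] = [];

// The name must be a catalogue key, and the properties must match that key's shape.
function track<K extends keyof EventCatalogue>(name: K, props: EventCatalogue[K]): void {
  sent.push({ name, props });
}

track("product_viewed", { product_id: "SKU-123", price_cents: 4999, currency: "GBP" });

// Both of these fail to compile, which is the point:
// track("productViewed", { product_id: "SKU-123", price_cents: 4999, currency: "GBP" });
// track("product_viewed", { product_id: "SKU-123", price: "£49.99" });
```

Types only guard code that goes through the wrapper; events pushed from a tag manager or a third-party script still need a runtime check such as JSON schema validation.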

The further the spec is from code, the faster it rots. The more it’s enforced at the data layer, the longer it survives organisational turnover.