The Data Warehouse: Every Signal Your Brand Has Ever Made
Before AI can understand your brand, it needs memory. The data warehouse captures every purchase, every comment, every review, every email — every surface your brand has ever touched.
Why Memory Comes Before Intelligence
Ask an AI to write your brand's email copy. It'll produce something competent and generic. Ask it again tomorrow. Different output, same generic quality. There's no continuity. No memory. No accumulated understanding.
This is the fundamental problem with prompt-based AI workflows: the AI has no memory of your brand.
The Data Warehouse solves this. It's Step 3 of the Context-First methodology — the private, structured memory that captures every signal your brand has ever produced. Not a CRM export. Not a spreadsheet of metrics. A comprehensive, queryable archive of every interaction between your brand and the world.
What Goes Into the Warehouse
The warehouse is the memory of your brand — structured, versioned, queryable.
The warehouse consumes data from every platform in the Integration Layer (Step 2). For Celtic Knot Jewellery, that means:
Customer Signals
- 907 product reviews with sentiment analysis
- Customer support conversations and resolution patterns
- Repeat purchase frequency and lifetime value segmentation
- Geographic and demographic distribution
Transaction Data
- Every order: products, quantities, discounts, shipping
- Payment methods and failure patterns
- Refund rates by product category
- Seasonal purchase curves across 6 years of history
Marketing Performance
- Meta ad campaigns: impressions, clicks, conversions, ROAS by creative
- Klaviyo email sequences: open rates, click rates, revenue attribution
- Google Ads: search terms, quality scores, conversion paths
- Organic traffic patterns and content engagement
Product Intelligence
- Full product catalog with attributes, variants, and pricing history
- Collection structures and merchandising rules
- Inventory velocity and restock patterns
- Cross-sell and upsell affinity data
Brand Surface Data
- Website analytics: pageviews, scroll depth, exit rates per page
- Social media engagement: likes, comments, shares, saves by platform
- SMS campaign performance
- Call logs and voicemail transcriptions
Every signal. Every surface. Every customer. Every product. This is the raw material from which intelligence is built.
The Schema
The warehouse isn't a data dump. It's a structured PostgreSQL database designed for machine consumption. The schema follows a dimensional model:
Fact Tables — events that happened
- fact_orders — every transaction with full line-item detail
- fact_campaigns — marketing campaign performance snapshots
- fact_interactions — customer touchpoints across channels
- fact_reviews — product reviews with computed sentiment scores
Dimension Tables — entities that exist
- dim_customers — unified customer profiles with identity resolution
- dim_products — product catalog with attribute taxonomy
- dim_channels — marketing channels with performance baselines
- dim_time — date dimension for temporal analysis
Bridge Tables — relationships between entities
- bridge_customer_segments — segment membership (can be multiple)
- bridge_product_collections — collection hierarchy
- bridge_attribution — multi-touch attribution paths
This schema supports both analytical queries ("What's our ROAS by audience segment over the last 90 days?") and generative queries ("Give me the 5 most common complaint themes from customers who purchased cashmere products").
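Both query styles run over the same star schema. Here is a minimal sketch using Python's `sqlite3` standard-library module in place of PostgreSQL; the column names and sample rows are illustrative assumptions, not the production schema:

```python
import sqlite3

# Toy version of the dimensional model: two dimension tables, one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    segment     TEXT
);
CREATE TABLE dim_products (
    product_id  INTEGER PRIMARY KEY,
    title       TEXT,
    category    TEXT
);
CREATE TABLE fact_orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customers(customer_id),
    product_id  INTEGER REFERENCES dim_products(product_id),
    order_date  TEXT,
    total       REAL
);
""")

conn.executemany("INSERT INTO dim_customers VALUES (?, ?, ?)",
                 [(1, "Jane D.", "heritage"), (2, "Liam O.", "gift-buyer")])
conn.executemany("INSERT INTO dim_products VALUES (?, ?, ?)",
                 [(10, "Claddagh Ring", "rings"), (11, "Aran Scarf", "cashmere")])
conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?, ?)",
                 [(100, 1, 10, "2025-03-04", 149.0),
                  (101, 1, 11, "2025-11-20", 99.0),
                  (102, 2, 11, "2025-12-01", 99.0)])

# Analytical query: revenue by customer segment and product category.
rows = conn.execute("""
    SELECT c.segment, p.category, SUM(o.total) AS revenue
    FROM fact_orders o
    JOIN dim_customers c ON c.customer_id = o.customer_id
    JOIN dim_products  p ON p.product_id  = o.product_id
    GROUP BY c.segment, p.category
    ORDER BY revenue DESC
""").fetchall()
print(rows)
```

The same join pattern, pointed at `fact_reviews` instead of `fact_orders`, powers the generative queries: the facts change, the dimensional keys do not.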
Identity Resolution
The hardest problem in the warehouse is identity resolution — connecting a single customer across all their touchpoints.
The same person might be:
- jane@gmail.com in Klaviyo
- Customer #8472 in Shopify
- A Meta pixel event with a hashed email
- A Google Analytics session with a client ID
- A product review signed "Jane D."
Our identity resolution pipeline uses progressive matching:
- Deterministic Match: Email address or phone number exact match
- Probabilistic Match: Name + zip code + purchase date correlation
- Behavioral Match: Device fingerprinting and session continuity
- Manual Override: Flagged duplicates resolved by explicit, human-authored rules
The result is a unified customer profile that knows Jane bought a Claddagh ring in March, opened 12 of the last 15 emails, left a 5-star review mentioning "my grandmother's heritage," and has a lifetime value of $1,247 across 6 orders over 4 years.
When an AI agent reads Jane's profile, it doesn't see a row in a database. It sees a person with a story, preferences, and a relationship arc with the brand. That's the difference between sending "Dear Customer" and sending "Jane, the new Aran collection just arrived — and we think you'll love the heritage connection."
Seed Files and Local Development
The production warehouse lives in Supabase. But AI agents don't develop against production. They develop against seed files — portable snapshots of the warehouse that can be loaded into any local development environment.
The seeding process:
- Export: CLI scripts dump each table to typed JSON files
- Anonymize: Customer PII is hashed or replaced with synthetic data
- Compress: Seed files are packaged for fast import
- Version: Each seed set is tagged with a timestamp and schema version
This means any developer (or AI agent) can spin up a complete, realistic data environment in under 60 seconds. No production credentials needed. No risk of accidental data modification. Full fidelity for development and testing.
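The export, anonymize, and version steps can be sketched as follows. The file layout, the `PII_FIELDS` list, and the schema-version string are assumptions for illustration, not the production tooling:

```python
import gzip
import hashlib
import json
import tempfile
from pathlib import Path

# Fields treated as PII and replaced with stable, irreversible tokens.
PII_FIELDS = {"email", "name", "phone"}

def anonymize(row, salt="seed-v1"):
    out = dict(row)
    for field in PII_FIELDS & row.keys():
        digest = hashlib.sha256((salt + str(row[field])).encode()).hexdigest()
        out[field] = digest[:12]   # same input always yields the same token
    return out

def write_seed(table, rows, out_dir, schema_version="2025-01"):
    """Dump one table to a gzipped, anonymized, version-tagged JSON seed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    payload = {
        "table": table,
        "schema_version": schema_version,
        "rows": [anonymize(r) for r in rows],
    }
    path = out_dir / f"{table}.{schema_version}.json.gz"
    with gzip.open(path, "wt") as f:
        json.dump(payload, f)
    return path


tmp = tempfile.mkdtemp()
path = write_seed("dim_customers",
                  [{"customer_id": 1, "email": "jane@gmail.com", "ltv": 1247}],
                  tmp)
with gzip.open(path, "rt") as f:
    seed = json.load(f)
print(seed["rows"][0]["email"])  # a 12-char token, not the real address
```

Salted hashing keeps tokens stable across exports, so joins on anonymized keys still work in the local environment.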
Why Not Just Use the APIs Directly?
A reasonable question: why build a warehouse at all? Why not let the BIOS generation process (Step 4) read directly from Shopify, Klaviyo, and Meta APIs?
Three reasons:
1. Rate Limits: Generating 33 BIOS specs requires thousands of data points. Hitting Shopify's API rate limit mid-generation breaks the entire pipeline. The warehouse serves data at local speed with zero rate limit risk.
2. Temporal Consistency: APIs return current state. The warehouse captures state over time. When analyzing purchase patterns, you need "orders in Q4 2025" — not "current order list minus cancellations."
3. Cross-Platform Joins: No single API can answer "What's the email open rate for customers who left 5-star reviews on products they discovered through Meta ads?" The warehouse can, because it holds all three data sets in a unified schema.
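That three-way question becomes an ordinary join once the data sits in one schema. A sketch against a toy version of the tables, using SQLite for portability; the event kinds and channel labels are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_reviews       (customer_id INT, product_id INT, rating INT);
CREATE TABLE bridge_attribution (customer_id INT, product_id INT, channel TEXT);
CREATE TABLE fact_interactions  (customer_id INT, kind TEXT);
""")
conn.executemany("INSERT INTO fact_reviews VALUES (?,?,?)",
                 [(1, 10, 5), (2, 11, 3)])
conn.executemany("INSERT INTO bridge_attribution VALUES (?,?,?)",
                 [(1, 10, 'meta_ads'), (2, 11, 'organic')])
conn.executemany("INSERT INTO fact_interactions VALUES (?,?)",
                 [(1, 'email_sent'), (1, 'email_sent'), (1, 'email_open'),
                  (2, 'email_sent')])

# Email open rate for customers with a 5-star review on a Meta-attributed product.
open_rate = conn.execute("""
    SELECT SUM(CASE WHEN kind = 'email_open' THEN 1 ELSE 0 END) * 1.0 /
           SUM(CASE WHEN kind = 'email_sent' THEN 1 ELSE 0 END)
    FROM fact_interactions
    WHERE customer_id IN (
        SELECT r.customer_id
        FROM fact_reviews r
        JOIN bridge_attribution a
          ON a.customer_id = r.customer_id AND a.product_id = r.product_id
        WHERE r.rating = 5 AND a.channel = 'meta_ads'
    )
""").fetchone()[0]
print(open_rate)  # → 0.5 (one open across two sends for customer 1)
```

No one platform holds all three tables, which is exactly why the query is impossible against the raw APIs.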
The Living Memory
The warehouse isn't a one-time snapshot. It's a living memory that grows continuously:
- Daily: Automated Vercel cron jobs pull new orders, campaign metrics, and review data
- Weekly: Identity resolution pipeline runs to unify new customer profiles
- Monthly: Full data reconciliation against platform source-of-truth
- Quarterly: Schema evolution to capture new signal types
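The daily pull is an incremental sync keyed on a watermark: fetch only records updated since the last run, then upsert them. The `fetch_orders_since` callable below is a hypothetical stand-in for a real platform client, not an actual Shopify API:

```python
def sync_orders(store, fetch_orders_since, state):
    """Pull orders newer than the stored watermark and upsert them.

    store  -- dict acting as the warehouse table (order_id -> row)
    state  -- dict holding sync watermarks across runs
    """
    since = state.get("orders_watermark", "1970-01-01T00:00:00Z")
    new_rows = fetch_orders_since(since)
    for row in new_rows:
        store[row["order_id"]] = row          # idempotent upsert
    if new_rows:
        # ISO-8601 strings in UTC compare correctly as plain strings.
        state["orders_watermark"] = max(r["updated_at"] for r in new_rows)
    return len(new_rows)


# Usage with a fake fetcher standing in for the platform API.
rows = [{"order_id": 1, "updated_at": "2025-01-02T00:00:00Z"},
        {"order_id": 2, "updated_at": "2025-01-03T00:00:00Z"}]

def fake_fetch(since):
    return [r for r in rows if r["updated_at"] > since]

store, state = {}, {}
print(sync_orders(store, fake_fetch, state))  # → 2 (first run pulls everything)
print(sync_orders(store, fake_fetch, state))  # → 0 (second run is a no-op)
```

Because the upsert is idempotent, a re-run after a failed job cannot duplicate rows; it just advances the watermark once the pull succeeds.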
Each update makes the BIOS more accurate. Each new data point refines the customer archetypes, sharpens the competitive positioning, and calibrates the pricing strategy. The warehouse is the heartbeat of the entire Context-First system.
What This Enables
With a complete warehouse, the BIOS generation process (Step 4) doesn't hallucinate brand intelligence from thin air. It extracts it from evidence.
When the BIOS says "Celtic Knot's primary customer archetype is a second-generation Irish-American woman aged 35-55 who purchases heritage-connected jewelry for milestone occasions" — that's not a persona workshop guess. That's a statistical extraction from 6 years of transaction data, review sentiment, and engagement patterns.
When the BIOS says "the optimal price point for cashmere accessories is $89-129 with a heritage narrative" — that's not a pricing consultant's instinct. That's a margin analysis across 12,000 orders with demand elasticity curves.
The Data Warehouse is what makes Context-First AI Development different from "we asked ChatGPT to write brand guidelines." Every claim is evidence-backed. Every constraint is data-derived. Every agent inherits real memory, not synthetic assumptions.
That's why Step 3 can't be skipped, and why it comes before BIOS generation. Intelligence without memory is hallucination.
Want to apply this to your brand?