
OLTP vs OLAP data modeling: two jobs, two shapes

Transactional systems keep the product correct and fast at the row level. Warehouses explain the business across huge scans. Mixing the two jobs in one head—or one undifferentiated schema—hurts both.


Think of two different systems with two different bosses:

  • OLTP answers “can this user check out right now?”
  • OLAP answers “how did revenue move last quarter?”

Same company, different latency, different access patterns, different modeling trade-offs. Pretending they are the same database personality is how you get neither fast writes nor cheap scans—like running a finance close-out query on the same Aurora writer your checkout uses because “it is all SQL.”

OLTP vs OLAP modeling

[Diagram: OLTP vs OLAP at a glance. OLTP (PG, MySQL, Spanner-style) keeps the product alive: a row store built for point lookups, short ACID transactions, and normalized, constrained, indexed hot paths—bad at huge scans by design. OLAP (Snowflake, BigQuery, ClickHouse-style) explains the business: a column store that scans a few columns across many rows, with star/snowflake schemas, partitions, and compression—row-level churn is slow, so batch or micro-batch instead. Bridge OLTP → OLAP by picking a latency budget: batch ETL (hours of lag common), ELT (land raw-ish fast, transform in warehouse SQL), or CDC (stream changes; minutes to near-real-time). Model each side separately and do not merge the jobs—one engine serving two masters starves writers. Two systems, two schemas, two SLOs—sync expectations, not storage engines.]

1. OLTP — the system that keeps the product alive

OLTP (online transaction processing) is what sits behind live traffic: signups, carts, payments, inventory reservations.

Typical engines people mean: PostgreSQL, MySQL, Cockroach-style distributed SQL, Spanner-class systems.

Characteristics that should drive modeling:

  • Low latency targets—often single-digit milliseconds for the hot path.
  • High concurrency—many small sessions doing short work at once.
  • Small result sets—“give me this user,” not “aggregate five years of clicks.”

Modeling habits:

  • Normalize to reduce update anomalies; enforce constraints.
  • Index for the queries you actually run in production—not every hypothetical join.
  • Row-oriented storage (conceptually): fetching all columns of one row is the happy path.
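A minimal sketch of those habits, using SQLite as a stand-in for a transactional engine. Table names, columns, and values are illustrative, not from any real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Normalized: users and orders in separate tables, linked by a key.
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE          -- constraint, not convention
    )""")
conn.execute("""
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        total   INTEGER NOT NULL CHECK (total >= 0)   -- cents, never negative
    )""")

# Index the hot path we actually run in production: "orders for this user".
conn.execute("CREATE INDEX idx_orders_user ON orders(user_id)")

conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders (user_id, total) VALUES (1, 4200)")

# The happy path: one short point lookup returning a small result set.
row = conn.execute(
    "SELECT total FROM orders WHERE user_id = ?", (1,)
).fetchone()
print(row[0])  # 4200
```

The constraints do double duty: they prevent update anomalies and they document the write path for the next engineer.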

Where it hurts: giant ad-hoc aggregations across billions of rows. You can force it, but you are using the wrong hammer and everyone queues behind you.

Most teams' first relief valve is not a warehouse—it is read replicas with guardrails, so BI does not steal the buffer cache from UPDATE inventory. Same idea as back pressure, but for who gets to touch rows:

Separate reads & writes

[Diagram: read/write split, primary + replicas. The service sends writes to the primary; reads scale out horizontally across replicas. Replication lag is the trade-off—read-after-write paths may need to hit the primary.]
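The routing above can be sketched as a small router: writes always hit the primary, plain reads round-robin across replicas, and read-after-write paths opt back into the primary to dodge replication lag. The `Router` class and its method names are invented for illustration; two in-memory SQLite databases simulate a primary and a lagging replica:

```python
import sqlite3

class Router:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._i = 0

    def write(self, sql, params=()):
        # Writes are never sent to a replica.
        self.primary.execute(sql, params)

    def read(self, sql, params=(), require_fresh=False):
        if require_fresh:                    # read-after-write: use primary
            return self.primary.execute(sql, params).fetchall()
        self._i = (self._i + 1) % len(self.replicas)   # round-robin replicas
        return self.replicas[self._i].execute(sql, params).fetchall()

# Simulate a primary and a replica that has not replayed the write yet.
primary = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
for db in (primary, replica):
    db.execute("CREATE TABLE carts (user_id INTEGER, item TEXT)")

r = Router(primary, [replica])
r.write("INSERT INTO carts VALUES (1, 'book')")   # replication lag: replica is stale

print(len(r.read("SELECT * FROM carts")))                      # 0: stale replica
print(len(r.read("SELECT * FROM carts", require_fresh=True)))  # 1: primary
```

Real routers live in a proxy or driver layer, but the decision—fresh or cheap—is the same one the diagram makes.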

2. OLAP — the system that explains what is happening

OLAP (online analytical processing) powers reporting, dashboards, experimentation summaries, finance close-outs.

Examples: Snowflake, BigQuery, ClickHouse, Redshift-class warehouses.

Characteristics:

  • Large scans and aggregations across millions or billions of rows.
  • Latency tolerance measured in seconds (sometimes more) if the answer is trustworthy and cheap.
  • Column-oriented storage and heavy compression: read only the columns the query touches.

Modeling habits:

  • Denormalized star or snowflake schemas—facts plus dimensions on purpose.
  • Partition by time (and cluster/sort keys that match real filters).
  • Prefer batch or micro-batch loads over trickling row-by-row updates into the fact table.
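A sketch of the star shape, again with SQLite standing in for a warehouse (table and column names are illustrative). The fact table is keyed by date so the query can prune on the time filter, the way a real warehouse prunes partitions:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT NOT NULL
    );
    -- Fact table: one row per sale, keyed by date for time pruning.
    CREATE TABLE fact_sales (
        sale_date  TEXT NOT NULL,          -- partition-style key
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        revenue    INTEGER NOT NULL        -- cents
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES
        ('2024-01-05', 1, 1000),
        ('2024-01-20', 2, 2500),
        ('2024-02-02', 1,  700);
""")

# The OLAP shape: scan two columns across many rows, filter by time,
# group by a dimension attribute. February falls outside the filter.
rows = db.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    WHERE f.sale_date >= '2024-01-01' AND f.sale_date < '2024-02-01'
    GROUP BY p.category ORDER BY p.category
""").fetchall()
print(rows)  # [('books', 1000), ('games', 2500)]
```

A column store would read only `sale_date`, `product_id`, and `revenue` here, skipping everything else in the fact table—which is exactly why wide denormalized facts are cheap to query but expensive to mutate row by row.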

Where it hurts: pretending it is a general-purpose OLTP replacement—mutable row churn, many small point updates, and interactive session locking are not what warehouses optimize for.


3. The time dimension

OLTP usually cares about now: current cart, current balance, current feature flag.

OLAP cares about history and drift: how cohorts moved week over week, how experiments aged, how revenue was recognized.

That difference shows up in keys, slowly changing dimensions, late-arriving facts, and backfills—topics that belong in warehouse modeling, not bolted onto every OLTP table “just in case BI asks.”
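Slowly changing dimensions are the clearest example of that split. A type-2 dimension keeps every version of a row with validity ranges instead of overwriting in place (the OLTP instinct). A minimal sketch, with an invented row shape and helper name:

```python
from datetime import date

# Warehouse-side dimension: {customer_id, plan, valid_from, valid_to}.
# valid_to is None for the currently active version.
dim_customer = []

def scd2_update(customer_id, new_plan, as_of):
    """SCD type 2: close the current version (if any) and open a new one."""
    for row in dim_customer:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            if row["plan"] == new_plan:
                return                      # no change, nothing to record
            row["valid_to"] = as_of         # close the old version
    dim_customer.append({"customer_id": customer_id, "plan": new_plan,
                         "valid_from": as_of, "valid_to": None})

scd2_update(1, "free", date(2024, 1, 1))
scd2_update(1, "pro",  date(2024, 6, 1))   # upgrade: old row closed, new opened

print(len(dim_customer))                   # 2 versions kept
print([r["plan"] for r in dim_customer])   # ['free', 'pro']
```

The OLTP table only ever needs the current plan; the history lives where the history questions are asked.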


4. Why one database rarely does both jobs at scale

You can run small reports on a replica or spin up an analytical extension in a pinch.

At scale, one undifferentiated schema optimized for neither side tends to produce:

  • long-running analytics queries stealing buffer cache from checkout,
  • indexing strategies that help point reads but hurt wide scans (or the reverse),
  • schema change fear because “the dashboard team” and “the payments team” share one fragile table.

Splitting workloads is not always two vendors—it can be primary + read replica with guardrails, or primary + warehouse. The point is separate SLOs and separate modeling headspace.


5. Bridging OLTP and OLAP in practice

Common bridges (pick based on how fresh the warehouse must be):

Pattern     Shape                                                                   Latency vibe
Batch ETL   extract → transform → load on a schedule                                hours common
ELT         land raw or lightly typed fast; transform in SQL inside the warehouse   flexible
CDC         stream inserts/updates/deletes; rebuild analytic tables incrementally   minutes to near-real-time

Modeling implication: keep normalized source-of-truth in OLTP, rebuild analytic projections in OLAP. Trying to make one physical table serve both audiences usually ends in compromise nobody enjoys.
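The CDC pattern reduces to applying a stream of row-level changes, in commit order, to an analytic projection instead of re-exporting everything. A toy sketch—the event shape and names here are invented, not from any real CDC tool:

```python
# Warehouse-side projection of the OLTP users table, keyed by primary key.
analytic_users = {}

def apply_change(event):
    """Apply one change event from the source's commit log."""
    op, key, row = event["op"], event["key"], event.get("row")
    if op in ("insert", "update"):
        analytic_users[key] = row          # upsert the latest version
    elif op == "delete":
        analytic_users.pop(key, None)

# A few changes streamed from the source, in commit order.
changes = [
    {"op": "insert", "key": 1, "row": {"email": "a@example.com", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"email": "b@example.com", "plan": "pro"}},
    {"op": "update", "key": 1, "row": {"email": "a@example.com", "plan": "pro"}},
    {"op": "delete", "key": 2},
]
for event in changes:
    apply_change(event)

print(len(analytic_users))          # 1
print(analytic_users[1]["plan"])    # pro
```

Real pipelines add ordering guarantees, schema evolution, and late-arriving data handling—but the core move is this incremental apply, which is why CDC can be minutes fresh where batch ETL is hours.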


6. Interview-summary version

OLTP

  • Small, frequent reads/writes.
  • Normalized schema + constraints.
  • Row stores; index hot paths.
  • Optimize for correctness + predictable latency.

OLAP

  • Big scans and aggregates.
  • Denormalized star/snowflake.
  • Column stores; partition by time.
  • Optimize for scan cost + analyst iteration speed.

Bridge

  • ETL / ELT / CDC move data; freshness vs complexity is the trade-off.

Closing

Two systems, two schemas, two SLOs. Let OLTP be boring and fast for users; let OLAP be wide and cheap for questions. Sync the expectations between teams—not by cramming every dimension into the transactional schema, but by owning the pipeline between them.