All writing
·5 min read

Choosing SQL, NoSQL, and search engines (without bingo cards)

Start from invariants and access patterns: relational SQL for systems of record, NewSQL when scale meets SQL semantics, NoSQL when the workload is partition-shaped, and search as a rebuildable projection—never the ledger.

DatabasesDistributed SystemsSystem Design

The worst way to pick storage is to open the resume and declare victory.

The better way: list what must stay true under concurrency, how data is read and written in production, whether you need ranked text, and what scale and cost curve you are actually on. Tool names come last. (Yes, you can keep Hibernate on Postgres for money rows and still add Redis and OpenSearch—polyglot is normal, not cheating.)

SQL · NewSQL · NoSQL · Search

Choosing SQL, NewSQL, NoSQL, or searchSQL / relationalDefault system of recordusers · orders · order_items + FK edgesusersordersitemsACID · joins · ad-hoc SQL when indexed sanelyMoney, contracts, profiles, anything “many truths at once”NewSQLSQL semantics, distributed scaleCockroach · Spanner-class · TiDB-styleKeep transactions + familiar SQL surfacePay in ops complexity: regions, leases, clock rulesWhen one primary cannot hold write load but you still want SQL truthNoSQLShape the schema around access pathsDynamo · Cassandra · Mongo-style workloadsPartition key → document / wide row · minimal joinsHuge write fan-out, known query shapes, denormalize on purposeSessions, telemetry, feature flags, IoT burstsGraph stores when the question is the walk, not the rowSearch engineProjection — not source of truthOLTP / OLAP truthInverted indexuser queries · facets · rankingRebuild from source if the index corrupts—never primary ledgerStart from invariants + access patterns—tool names come last.

1. Start from access patterns and invariants

Before you debate vendors, answer:

  • Entities and relationships — do you need foreign keys, joins, and multi-row transactions to stay honest?
  • Hot paths — point reads vs scans vs append-only streams?
  • Consistency — can reads be briefly fuzzy, or does money require serializable truth?
  • Growth — single-region OLTP today vs multi-region writes tomorrow?

If you cannot sketch the top five queries and the failure modes you refuse, you are not ready to choose a database family—you are ready to argue on Twitter.


2. When SQL is the right default

Relational engines (PostgreSQL, MySQL, SQL Server, and friends) are still the default system of record for a reason:

  • ACID transactions across rows you can point at in an ER diagram.
  • Constraints that stop garbage from becoming historical fact.
  • Ad-hoc SQL when indexes and query plans match reality.

Heuristics: ledgers, orders, billing contracts, anything where “two people disagree about the balance” is an incident, not a feature.


3. The NewSQL bridge

NewSQL-shaped systems (CockroachDB, Spanner-class, TiDB-style) try to keep SQL + strong semantics while adding horizontal scale and multi-region storylines.

You still pay for it: operational complexity, latency budgets, sometimes clock and contention rules you must actually read.

Reach here when one primary cannot hold write traffic but you are not ready to throw away multi-statement transactions as a design tool.


4. When NoSQL makes sense (and which flavor)

Key–value and document stores

Good fit when access is “partition key + sort key” shaped, joins are rare, and you can denormalize without feeling dirty.

Benefits: huge write fan-out, simple horizontal scaling story for the right key design.

Costs: every “just add a join” request becomes a redesign; migrations need discipline.

Wide-column stores

Good fit when you ingest massive append-only streams and query predictable column families.

Benefits: throughput and retention economics at silly scale.

Costs: you own compaction, repair, and access patterns up front.

Graph databases

Good fit when the product question is literally the walk (fraud rings, social distance, permissions on a DAG)—not “I heard graphs are cool.”

Costs: smaller ops surface in many orgs; still need backup, upgrades, and query literacy.


5. When to bring in a search engine

Elasticsearch / OpenSearch-class systems excel at full text, fuzzy match, facets, and ranking over large corpora.

Good fit when users type words into a box and expect relevance, not just WHERE title LIKE '%foo%'.

Benefits: fast retrieval for search-shaped reads; decouples relevance iteration from OLTP schema churn.

Costs: eventual consistency, cluster feeding, reindex jobs, and someone owning relevance tuning.

Practical guideline: treat search as a secondary index you can rebuild from PostgreSQL (or the warehouse), not the place you store balances.


6. Typical composition in a real system

Polyglot persistence is normal, not shameful. The OLTP vs OLAP split is the same “two jobs” story in another costume—your app database is not where you want five-year cohort analysis living:

OLTP vs OLAP modeling

OLTP versus OLAP data modeling overviewOLTPKeeps the product alivePG · MySQL · Spanner-styleRow store: one row’s columns live togetherpoint lookups · short transactions · ACIDNormalize · constrain · index the hot pathsBad at huge scans—by designSignup · pay · cart · inventory locksOLAPExplains the businessSnowflake · BigQuery · ClickHouse-styleColumn store: scan a few columns across many rowsstar / snowflake · partitions · compressionDenormalize on purpose for read costRow-level churn is slow—batch or micro-batch insteadRevenue · cohorts · funnels · exec KPIsBridging OLTP → OLAPPick a latency budget• Batch ETL — export / transform / load (hours lag common)• ELT — land raw-ish fast, transform in the warehouse SQL• CDC — stream changes; minutes to near-real-time slicesModel separately: normalized source vs analytic starDo not merge the jobsOne engine, two masters = painOLTP wants small writes + predictable indexesOLAP wants wide reads + partition pruningRunning heavy analytics on the primary starves writersUse replicas with guardrails, or a real warehouse pathTwo systems, two schemas, two SLOs—sync expectations, not storage engines.
  • Relational (or NewSQL) for money-shaped truth.
  • Redis / Memcached (cache, not durability promises unless you mean it).
  • Kafka / Pulsar / SQS (streams and hand-offs—not a substitute for queryability).
  • Warehouse for analytics (see OLTP vs OLAP data modeling).
  • Search for user-facing relevance.

The architecture is boring; the boundaries are where seniority shows up.


7. A simple decision matrix

AxisLean relationalLean NoSQLAdd search
Consistency / multi-row invariantsmust holdcan relax with compensationsnot its job
Relationships & ad-hoc joinsheavylight by designinverted only
Query patternchanging analytics on normalized datafixed access pathsranked text + facets
Scale / costvertical + replicas firsthorizontal partition mathindex size + reindex cost

8. Common anti-patterns

  • Elasticsearch as primary store for money because “it is fast.” Fast lies are still lies.
  • Mongo for everything because “schema is flexible.” Flexible becomes implicit schema in five services nobody documents.
  • Graph DB for CRUD because the diagram looked pretty in the keynote.
  • Skipping migrations on any system that outlives the prototype weekend.

9. Interview-style summary

  • Default SQL for systems of record with real constraints.
  • Consider NewSQL when horizontal scale and SQL semantics both show up on the same ticket.
  • Pick NoSQL when access paths are partition-known and denormalization is a product decision, not an accident.
  • Add search when relevance and token matching are the product; keep rebuild paths from truth.
  • Real systems mix pieces—own the boundaries and the failure modes.

Closing

Names on boxes matter less than invariants + access paths + who reboots the cluster at 3 a.m.

If you can explain those three, you can defend the storage map in design review—and change it later without pretending the old choice was “obviously wrong.”