Choosing SQL, NoSQL, and search engines (without bingo cards)
Start from invariants and access patterns: relational SQL for systems of record, NewSQL when scale meets SQL semantics, NoSQL when the workload is partition-shaped, and search as a rebuildable projection—never the ledger.
The worst way to pick storage is to open the resume and declare victory.
The better way: list what must stay true under concurrency, how data is read and written in production, whether you need ranked text, and what scale and cost curve you are actually on. Tool names come last. (Yes, you can keep Hibernate on Postgres for money rows and still add Redis and OpenSearch—polyglot is normal, not cheating.)
SQL · NewSQL · NoSQL · Search
1. Start from access patterns and invariants
Before you debate vendors, answer:
- Entities and relationships — do you need foreign keys, joins, and multi-row transactions to stay honest?
- Hot paths — point reads vs scans vs append-only streams?
- Consistency — can reads be briefly fuzzy, or does money require serializable truth?
- Growth — single-region OLTP today vs multi-region writes tomorrow?
If you cannot sketch the top five queries and the failure modes you refuse, you are not ready to choose a database family—you are ready to argue on Twitter.
2. When SQL is the right default
Relational engines (PostgreSQL, MySQL, SQL Server, and friends) are still the default system of record for a reason:
- ACID transactions across rows you can point at in an ER diagram.
- Constraints that stop garbage from becoming historical fact.
- Ad-hoc SQL when indexes and query plans match reality.
Heuristics: ledgers, orders, billing contracts, anything where “two people disagree about the balance” is an incident, not a feature.
3. The NewSQL bridge
NewSQL-shaped systems (CockroachDB, Spanner-class, TiDB-style) try to keep SQL + strong semantics while adding horizontal scale and multi-region storylines.
You still pay for it: operational complexity, latency budgets, sometimes clock and contention rules you must actually read.
Reach here when one primary cannot hold write traffic but you are not ready to throw away multi-statement transactions as a design tool.
4. When NoSQL makes sense (and which flavor)
Key–value and document stores
Good fit when access is “partition key + sort key” shaped, joins are rare, and you can denormalize without feeling dirty.
Benefits: huge write fan-out, simple horizontal scaling story for the right key design.
Costs: every “just add a join” request becomes a redesign; migrations need discipline.
Wide-column stores
Good fit when you ingest massive append-only streams and query predictable column families.
Benefits: throughput and retention economics at silly scale.
Costs: you own compaction, repair, and access patterns up front.
Graph databases
Good fit when the product question is literally the walk (fraud rings, social distance, permissions on a DAG)—not “I heard graphs are cool.”
Costs: smaller ops surface in many orgs; still need backup, upgrades, and query literacy.
5. When to bring in a search engine
Elasticsearch / OpenSearch-class systems excel at full text, fuzzy match, facets, and ranking over large corpora.
Good fit when users type words into a box and expect relevance, not just WHERE title LIKE '%foo%'.
Benefits: fast retrieval for search-shaped reads; decouples relevance iteration from OLTP schema churn.
Costs: eventual consistency, cluster feeding, reindex jobs, and someone owning relevance tuning.
Practical guideline: treat search as a secondary index you can rebuild from PostgreSQL (or the warehouse), not the place you store balances.
6. Typical composition in a real system
Polyglot persistence is normal, not shameful. The OLTP vs OLAP split is the same “two jobs” story in another costume—your app database is not where you want five-year cohort analysis living:
OLTP vs OLAP modeling
- Relational (or NewSQL) for money-shaped truth.
- Redis / Memcached (cache, not durability promises unless you mean it).
- Kafka / Pulsar / SQS (streams and hand-offs—not a substitute for queryability).
- Warehouse for analytics (see OLTP vs OLAP data modeling).
- Search for user-facing relevance.
The architecture is boring; the boundaries are where seniority shows up.
7. A simple decision matrix
| Axis | Lean relational | Lean NoSQL | Add search |
|---|---|---|---|
| Consistency / multi-row invariants | must hold | can relax with compensations | not its job |
| Relationships & ad-hoc joins | heavy | light by design | inverted only |
| Query pattern | changing analytics on normalized data | fixed access paths | ranked text + facets |
| Scale / cost | vertical + replicas first | horizontal partition math | index size + reindex cost |
8. Common anti-patterns
- Elasticsearch as primary store for money because “it is fast.” Fast lies are still lies.
- Mongo for everything because “schema is flexible.” Flexible becomes implicit schema in five services nobody documents.
- Graph DB for CRUD because the diagram looked pretty in the keynote.
- Skipping migrations on any system that outlives the prototype weekend.
9. Interview-style summary
- Default SQL for systems of record with real constraints.
- Consider NewSQL when horizontal scale and SQL semantics both show up on the same ticket.
- Pick NoSQL when access paths are partition-known and denormalization is a product decision, not an accident.
- Add search when relevance and token matching are the product; keep rebuild paths from truth.
- Real systems mix pieces—own the boundaries and the failure modes.
Closing
Names on boxes matter less than invariants + access paths + who reboots the cluster at 3 a.m.
If you can explain those three, you can defend the storage map in design review—and change it later without pretending the old choice was “obviously wrong.”