Writing
Engineering Notes.
I like to talk about systems and how to build them the right way for your needs. There is no one-size-fits-all design; the best solutions are shaped by constraints, scale, and what you actually need to optimize.
Choosing a shard key: user, tenant, region, or time?
Shard on the unit most requests already own. Hash to spread load inside that unit; layer region or time when residency, lifecycle, or skew demand it—never pick a key because it fit a slide.
Choosing SQL, NoSQL, and search engines (without bingo cards)
Start from invariants and access patterns: relational SQL for systems of record, NewSQL when scale meets SQL semantics, NoSQL when the workload is partition-shaped, and search as a rebuildable projection—never the ledger.
OLTP vs OLAP data modeling: two jobs, two shapes
Transactional systems keep the product correct and fast at the row. Warehouses explain the business across huge scans. Mixing the two jobs in one head—or one undifferentiated schema—hurts both.
Idempotency for write APIs: surviving retries without duplicate harm
Treat duplicate delivery as normal: idempotency keys for money paths, resource-oriented verbs where they fit, and idempotent consumers when messages are at-least-once.
API versioning and evolution: change without breaking clients
Prefer additive contracts and tolerant parsers. When you must break, version explicitly, expand-then-contract, and run deprecations like a product launch—not a surprise outage.
Protecting downstream services from traffic spikes
Rate limits, circuit breakers, load shedding, and bulkheads are how you keep a partner’s bad day from becoming your outage—without pretending dependencies are infinite.
Backpressure in high-traffic systems: fail fast, do not hang
When load spikes, the winning move is to slow the right things deliberately—pull models, bounded queues, shedding, worker caps, outbound isolation, and clients that back off—instead of letting memory and threads die together.
Queue vs direct RPC: when to use which
RPC is for answers you need right now. Queues buffer load, decouple producers from consumers, and carry retries—when eventual processing is the real requirement.
Synchronous vs asynchronous communication: how to choose
Request–response keeps things simple when the caller must wait. Queues and events buy decoupling and resilience when work can happen later—at the cost of operational complexity.
How a request actually moves through a production system
A grounded tour from DNS and the edge through services, data, async work, and back to the browser—with observability and failure modes you can recognize in the wild.
Design for 10× traffic (without rewriting everything)
Statelessness, caching, async paths, backpressure, and runtime knobs: the boring patterns that usually carry you through a 10× spike.
Vertical vs horizontal scaling: trade-offs that actually matter
Scaling up is fast and simple; scaling out is resilient and elastic. The right choice depends on growth shape, failure tolerance, and team maturity.
When should you split a monolith into services?
A practical take on when a monolith stops being the right shape—told through merge queues, metrics, and one team that split for the wrong reason.
Building Scalable Backend Systems
A comprehensive guide to designing systems that can grow with your business needs.
Understanding Distributed Transactions
A deep dive into distributed transactions: the trade-offs and the practical patterns.
Database Indexing Strategies
Learn how to design indexes that dramatically improve query performance without bloating storage.