Choosing a shard key: user, tenant, region, or time?
Shard on the unit most requests already own. Hash to spread load inside that unit; layer region or time on top when residency, lifecycle, or skew demand it—never pick a key because it fits a slide.
The shard key is not a clever hash function contest. It is the answer to: “If I could only co-locate one thing so that most reads and writes stay local, what would that thing be?”
Get that wrong and you win cross-shard joins, hot spots, and pager rotations where nobody remembers why email was the partition key. If you have ever seen Vitess or a sharded Dynamo story, you know the key is half the product contract.
1. First principle: start from ownership and access patterns
Ask:
- Who owns the row in product language? A consumer, a company account, a region, a device?
- What scope does a typical request carry? user_id, tenant_id, country, session_id?
- What must never fan out on the hot path? Checkout is different from “global search across all tenants.”
A good shard choice:
- keeps the majority of transactions on one shard (or a small, predictable set),
- spreads write load instead of funneling it,
- matches compliance and latency stories you can defend in an audit.
2. Sharding by user_id (consumer products)
Good fit when most traffic is “this user’s cart, settings, feed slice scoped to me.”
Benefits
- Simple mental model for engineers and support.
- Co-locates data you usually fetch together.
- Easy to reason about backups per cohort (with care).
Trade-offs
- Global features (social graph, cross-user search) need fan-out or secondary indices.
- Power users can still skew a shard if popularity is wildly uneven—mitigate with sub-sharding or virtual shards.
Heuristic: default for B2C when the product is user-centric and you are not primarily a multi-tenant control plane.
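To make the virtual-shard mitigation concrete, here is a minimal sketch in Python. The constants and names (NUM_VIRTUAL, PHYSICAL_NODES) are illustrative, and it uses SHA-256 rather than Python's built-in `hash()` because the built-in is salted per process and therefore unstable across restarts:

```python
import hashlib

NUM_VIRTUAL = 256    # fixed virtual-shard count; never changes once chosen
PHYSICAL_NODES = 4   # can grow later; only the virtual→physical map moves

def virtual_shard(user_id: str) -> int:
    """Stable hash of the ownership key into a fixed virtual-shard space."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VIRTUAL

def physical_node(user_id: str) -> int:
    """Map the virtual shard to a physical node.

    Rebalancing (adding nodes, splitting a hot node) edits only this
    mapping; user_id → virtual_shard never changes.
    """
    return virtual_shard(user_id) % PHYSICAL_NODES
```

Every request scoped to a user routes with one lookup and no fan-out, and a skewed virtual shard can be moved to its own node without rehashing users.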
3. Sharding by tenant_id or account_id (B2B SaaS)
Good fit when the paying customer is a company, isolation is a selling point, and you bill by seat or usage per org.
Benefits
- Noisy neighbor containment—move or throttle one tenant without guessing which user rows bled together.
- Operational stories: export, delete-for-GDPR, per-tenant upgrades.
Trade-offs
- Elephant tenants—one Fortune 10 can dwarf the rest; you need dedicated cells, higher quotas, or sub-partitioning inside the tenant.
- Cross-tenant reporting belongs in the warehouse, not ad-hoc SQL across every shard at request time.
Patterns: hash tenant → shard; promote hot tenants to isolated pools; keep a catalog of where each tenant lives.
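A sketch of that catalog pattern, assuming hypothetical pool and cell names: default placement is a hash over shared pools, and elephant tenants are promoted via an explicit override table that operations can edit without touching the hash function.

```python
import hashlib

SHARED_POOLS = 8
# Catalog override: hot "elephant" tenants get a dedicated cell
# instead of default hash placement. Names here are made up.
DEDICATED = {"tenant-fortune10": "cell-iso-1"}

def tenant_shard(tenant_id: str) -> str:
    """Return the pool or cell where this tenant's data lives."""
    if tenant_id in DEDICATED:
        return DEDICATED[tenant_id]
    h = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    return f"pool-{h % SHARED_POOLS}"
```

Moving a tenant then means copying its rows and adding one catalog entry, not resharding everyone.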
4. Sharding by region or geography
Good fit when data residency, latency to humans, or regulatory boundaries are first-class requirements.
Benefits
- Clear story for lawyers and customers (“EU data stays in EU”).
- Users near the region see predictable RTT.
Trade-offs
- Humans and devices travel; define what “home region” means.
- Cross-region aggregates and DR drills are expensive—plan pipelines, not midnight SQL.
Common pattern: region as the outer boundary, then hash(user_id) inside the region for even spread.
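That hierarchy can be sketched as a two-step route: region is resolved first as a hard boundary, then a hash spreads load inside it. The per-region shard counts are assumptions for illustration:

```python
import hashlib

# Shard counts per region; EU and US sized differently on purpose (assumed).
REGION_SHARDS = {"eu": 16, "us": 32, "apac": 8}

def route(home_region: str, user_id: str) -> tuple[str, int]:
    """Region is the outer boundary (residency); hashing inside spreads load."""
    n = REGION_SHARDS[home_region]
    h = int.from_bytes(hashlib.sha256(user_id.encode()).digest()[:8], "big")
    return home_region, h % n
```

Note the lookup takes home_region as an input: the "what is a user's home region" decision lives outside the router, which is exactly the travelers problem from the trade-offs above.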
5. Sharding by time
Good fit when data is append-mostly and cold data ages out: logs, metrics, audit, click streams.
Benefits
- TTL and archival become mechanical (drop partitions).
- Queries usually filter on recent windows.
Trade-offs
- “Today” is always hot unless you add a second dimension (hash) to stir writes.
- Backfills and late events need explicit handling.
Typical pattern: date_bucket + hash(high_cardinality_id) so retention and load both behave.
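A minimal sketch of that composite key, assuming a daily bucket and a hypothetical device_id as the high-cardinality dimension. The hash fan-out keeps "today" from being a single hot partition, while retention stays a mechanical drop of all sub-partitions for an old date:

```python
import hashlib
from datetime import datetime, timezone

WRITE_FANOUT = 16  # hash sub-shards inside each day so "today" isn't one node

def partition_key(event_time: datetime, device_id: str) -> str:
    """date_bucket + hash(high_cardinality_id): retention and load both behave."""
    bucket = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    h = int.from_bytes(hashlib.sha256(device_id.encode()).digest()[:2], "big")
    return f"{bucket}#{h % WRITE_FANOUT}"
```

Late events simply hash into their original date bucket, but a query for "last hour" now fans out across WRITE_FANOUT sub-partitions—that is the price of stirring writes.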
6. Other dimensions: feature or domain slices
Sometimes the natural seam is bounded context, not user: orders vs billing vs inventory subsystems that do not need to share a shard map.
Use this when teams and failure domains already align with domain lines—still document cross-domain reads so nobody assumes a magical join across cells.
7. Hash, range, composite, and hierarchical keys
7.1 Hash sharding
Spreads keys pseudo-randomly across nodes—great for even write pressure when the key has high cardinality.
Weak for range scans unless the product query is always equality-shaped.
7.2 Range sharding
Keeps contiguous keys together—great for time-ordered or sorted access.
Risk: hot ranges (popular prefixes, monotonic IDs). Combine with salting or secondary sharding when needed.
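Salting can be sketched as prefixing the range key with a small deterministic bucket, under the assumption of a monotonically increasing order_id. Point lookups still hit one bucket (the salt is derived from the ID), but range scans must read all SALT_BUCKETS prefixes and merge:

```python
import hashlib

SALT_BUCKETS = 8

def salted_key(order_id: int) -> str:
    """Prefix a deterministic salt so monotonically increasing IDs do not
    all land on the last range. Readers doing range scans touch all
    SALT_BUCKETS prefixes and merge the results."""
    salt = int(hashlib.sha256(str(order_id).encode()).hexdigest(), 16) % SALT_BUCKETS
    return f"{salt:02d}#{order_id:020d}"
```

The zero-padded order_id keeps lexicographic order matching numeric order inside each salt bucket, which is what makes the per-bucket range scan still work.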
7.3 Composite shard keys
Example: (region, user_id) or (tenant_id, hash(user_id)) when one dimension alone would skew or violate boundaries.
7.4 Hierarchical sharding
Outer dimension (region, tenant class) then inner hash—matches how organizations actually grow.
7.5 Practical rules
- OLTP user-ish workloads: start with hash on the primary ownership key.
- Logs / events / metrics: time bucket + hash of a high-cardinality dimension.
- Compliance / locality: hierarchy first, hash inside.
- Avoid shard keys that are volatile (email), low cardinality alone (country), or monotonic without mixing (raw created_at only).
Sharding is not the same thing as read replicas, but you often end up with both: partitioned primaries for write scale, replicas for read-heavy paths that still respect tenant boundaries. Sketch of that split:
(Diagram: separate reads and writes—writes go to partitioned primaries, reads to their replicas.)
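A toy version of that read/write split, with made-up endpoint names; in practice this logic lives in a proxy (e.g. a connection pooler) or the client library, per shard:

```python
import random

class ShardRouter:
    """Toy read/write split for one shard: writes go to the primary,
    reads to a randomly chosen replica. Endpoint names are illustrative."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self.replicas = replicas

    def endpoint(self, is_write: bool) -> str:
        if is_write or not self.replicas:
            return self.primary
        return random.choice(self.replicas)
```

Note this sketch ignores replication lag: a read-your-own-writes requirement would force some reads back to the primary, which is one of the "two queries that will hurt" worth documenting.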
8. How to choose between user, tenant, region, or something else
Work through:
- What is the most common request scope? That is your first-class shard dimension.
- What invariants cross scopes? If “always global,” you need async projections—not bigger scatter-gather in the request path.
- What fails if one shard melts? Pick boundaries that match a blast radius you can explain to finance.
- What does ops need? Tenant moves, regional failovers, per-tenant export—make those operations possible without rewriting physics.
Rules of thumb
| Product shape | First hop |
|---|---|
| Consumer app | hash(user_id) |
| B2B SaaS | hash(tenant_id) + plan for elephants |
| Logs / events | time_bucket + hash(dimension) |
| Regulated locality | region → inner hash |
9. Anti-patterns to avoid
- Sharding on emails or usernames that users can change—your mapping becomes a migration treadmill.
- Sharding on metrics you do not control (current QPS per key) instead of stable ownership.
- Monotonic-only keys for writes (naive time) without a stir—everyone piles onto the last node.
- Pretending cross-shard joins are free because the ORM still lets you type them.
10. Interview-style summary
- Name the unit of ownership the business already uses.
- Pick hash vs range vs composite from access paths, not vibes.
- Add region or time when law, latency, or lifecycle demand it.
- Call out hot spots (elephants, “today,” monotonic IDs) and how you spread them.
- Admit fan-out features and solve them with async indexes, warehouses, or product scope changes—not bigger SQL hope.
Summary table
| Shard key | Best for | Main risk | Pattern |
|---|---|---|---|
| user_id | B2C product surfaces | global graphs / search | hash(user_id) |
| tenant_id | multi-tenant SaaS | elephant tenants | hash(tenant_id) + isolate big |
| region | residency, RTT | travelers, cross-region BI | region → inner hash |
| time + hash | telemetry, audit | hot current bucket | date_bucket + hash(id) |
Closing
The shard key is a product decision wearing a database costume.
Pick the dimension your requests already believe in, spread load inside it on purpose, and document the two queries that will hurt—because those are the ones that will page you at 3 a.m. anyway.