
Protecting downstream services from traffic spikes

Rate limits, circuit breakers, load shedding, and bulkheads are how you keep a partner’s bad day from becoming your outage—without pretending dependencies are infinite.

Reliability · Scalability · Load Balancing

When traffic doubles, the failure mode you want is measured: latency climbs, low-priority work drops, breakers open, and humans get graphs that tell a story.

The failure mode you do not want is surprise: thread pools at zero, connection counts off the chart, and the payment API you do not own explaining your SLA to your CEO. If you have ever watched Stripe time out while your Spring service happily spun up more blocking calls, you already know why this post exists.

Protecting downstream under spikes

Downstream protection patterns at a glance:

  • Rate limiting: cap inbound or outbound QPS; over the threshold, delay, reject, or queue for a token. Best for public APIs and noisy partners.
  • Circuit breaker: fail fast when the downstream is sick; while OPEN, the caller skips the call and serves a fallback or cache. Half-open probes after a cool-down; emit metrics on breaker state.
  • Load shedding: drop low-priority work first; keep checkout alive, defer analytics, enrichment, and A/B hooks.
  • Bulkhead: isolate outbound pools per dependency (payments, search, email) so one slow partner cannot steal every thread or connection. Plan pool sizes explicitly; no hidden shared bucket.

Combine the patterns: limit → shed → break → isolate. Prefer fast degradation over a cascade.

1. Use circuit breakers so you fail fast, not slowly

A circuit breaker tracks recent failures (timeouts, 5xx, saturation). After a threshold, it opens: you stop calling the sick dependency and return a fallback (cached value, degraded feature, clear error) instead of queueing misery behind a stuck socket.

Half-open probes after a cool-down let traffic trickle back to test recovery—without slamming the dependency the moment it twitches.

Why it matters: one slow checkout dependency should not hold every unrelated read hostage. Breakers turn “everyone waits forever” into “this path is temporarily offline, everything else breathes.”
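A minimal breaker sketch, assuming a consecutive-failure policy (production libraries typically track rolling error rates instead); the thresholds and names here are illustrative:

```python
import time

class CircuitBreaker:
    """Counts consecutive failures, opens past a threshold,
    half-opens after a cool-down. Illustrative, not production-grade."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None means CLOSED

    def allow(self):
        if self.opened_at is None:
            return True  # CLOSED: traffic flows
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            return True  # HALF-OPEN: let a probe through
        return False     # OPEN: fail fast

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close again

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # open (or re-open)

def call_with_breaker(breaker, call, fallback):
    if not breaker.allow():
        return fallback()  # skip the sick dependency entirely
    try:
        result = call()
    except Exception:
        breaker.record_failure()
        return fallback()
    breaker.record_success()
    return result
```

Note the probe semantics: a half-open failure re-opens the breaker, so one twitch from the dependency does not unleash full traffic.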


2. Limit how many resources each dependency can consume (bulkheads)

A bulkhead is a separate pool—threads, connections, semaphores—per dependency or per capability class. Payments, search, and outbound email should not share one anonymous bucket that any of them can drain.

Example: Search gets slow during a marketing push. If search shares a pool with payment capture, you can win the SEO battle and lose the money. Split pools so search’s tantrum cannot exhaust payment’s lifelines.

This is planning work: you have to pick sizes intentionally and watch utilization, or you have decorated config with no behavior.
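A sketch of that planning work, assuming a semaphore-based bulkhead (the pool sizes and dependency names are illustrative, not recommendations):

```python
import threading

class Bulkhead:
    """One bounded semaphore per dependency: a slow partner can only
    exhaust its own slots, never the shared process."""

    def __init__(self, max_concurrent):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def run(self, fn, *args):
        # Non-blocking acquire: if this dependency's pool is full,
        # reject immediately instead of queueing behind a slow partner.
        if not self._sem.acquire(blocking=False):
            raise RuntimeError("bulkhead full: rejecting, not queueing")
        try:
            return fn(*args)
        finally:
            self._sem.release()

# Explicit, per-dependency pools -- no hidden shared bucket.
pools = {
    "payments": Bulkhead(max_concurrent=20),
    "search":   Bulkhead(max_concurrent=10),
    "email":    Bulkhead(max_concurrent=5),
}
```

Search can saturate its ten slots during the marketing push; payment capture still has all twenty of its own.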


3. Rate limit your own outbound calls

Even polite partners will accept traffic until they stop accepting traffic. Cap QPS per dependency (token bucket, leaky bucket, whatever your library calls it) so your autoscaling enthusiasm does not become an accidental DDoS.

This pairs with retry budgets: retries multiply load. If every client retries aggressively on 503, recovery takes longer. Cap retries and add jitter.
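A token-bucket sketch for one outbound dependency (rate and capacity values are illustrative; most resilience libraries ship a tuned version of this):

```python
import time

class TokenBucket:
    """rate = tokens refilled per second, capacity = max burst.
    One bucket per downstream dependency."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over budget: delay, drop, or queue the call
```

On `False`, the caller chooses the policy: delay (smooth the spike), reject (shed it), or queue (absorb it later).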


4. Use tight timeouts and smart retries

Timeouts should reflect measured tail latency, not “infinity minus one.” If p99 to Fraud is 400ms, a 30s timeout is not kindness—it is a thread donation program.

Retries belong behind:

  • idempotency (or naturally safe operations),
  • jittered backoff,
  • and max attempt ceilings.

Otherwise you are just building a retry storm that hits the dependency during the exact window it is trying to stand back up.
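Those three guards fit in a few lines; this sketch uses full jitter (parameter values are illustrative, and it is only safe if `call` is idempotent):

```python
import random
import time

def retry_with_backoff(call, max_attempts=3, base_s=0.1, cap_s=2.0,
                       sleep=time.sleep):
    """Capped attempts + jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # budget spent: surface the failure
            # Full jitter: sleep a random slice of the exponential window
            # so a fleet of clients does not retry in lockstep.
            sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))
```

The `sleep` parameter is injectable so tests do not actually wait; the jitter is what breaks up the synchronized retry wave.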


5. Queue non-critical work instead of calling directly

If the user does not need the result in the same HTTP transaction, do not synchronously fan out to five internal services. Enqueue, publish, or schedule; let consumers absorb spikes on their timeline.

You already know the examples: email, analytics, enrichment, CRM sync. The pattern is the same: record intent, respond, process async.
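The shape of "record intent, respond, process async" in miniature, assuming an in-process queue as a stand-in for your real broker (the handler, task names, and `charge_card` stub are all hypothetical):

```python
import queue

# Bounded queue: backpressure under spikes instead of unbounded memory.
work_q = queue.Queue(maxsize=10_000)

def charge_card(order):
    """Critical path: stays synchronous. Stubbed for illustration."""
    return "charged"

def handle_checkout(order):
    result = charge_card(order)  # the user is waiting for this
    # Non-critical fan-out becomes queued intent, not three live calls.
    for task in ("send_email", "crm_sync", "analytics"):
        try:
            work_q.put_nowait((task, order))
        except queue.Full:
            pass  # shed: losing an analytics event beats failing checkout
    return result
```

Consumers drain `work_q` on their own timeline; a spike shows up as queue depth on a graph, not as five synchronous calls per request.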


6. Cache responses where it is safe

Not every dependency needs a live call every time. Catalog metadata, config flags, reference data—often stale-within-TTL is fine if you name the staleness budget.

Caches are not free: invalidation, key cardinality, and “thundering herd on expiry” still exist. Use them as a pressure valve, not a magic blanket.

A hit means you never woke the partner; a miss is the controlled trip out to the network:

Cache layer (cache-aside)

Goal: 10× traffic should not mean 10× database load. Hits are cheap; misses are the path you must keep healthy. The service asks the cache first (TTLs, hot keys); a healthy picture looks like ~80% of reads answered fast from cache, with the remaining ~20% falling through to the database. Practical notes: short TTLs for hot keys, invalidate carefully, avoid stampedes on expiry.
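A cache-aside sketch with a named staleness budget (the `TTLCache` class and its single-process dict store are illustrative; a real deployment would use a shared cache):

```python
import time

class TTLCache:
    """Cache-aside: check local store, fall through to loader on miss,
    remember the result for ttl_s seconds."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self._store = {}  # key -> (value, expires_at)

    def get_or_load(self, key, load):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # hit: the partner never woke up
        value = load(key)    # miss: the controlled trip over the network
        self._store[key] = (value, now + self.ttl_s)
        return value
```

The TTL is the staleness budget stated in code: "catalog metadata may be 60 seconds old" is a decision, not an accident.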

7. Combine these into a safety plan

For every important downstream dependency, you should be able to answer:

  • What is the max QPS we will ever send?
  • What timeout do we use, based on what percentile?
  • How many retries, with what backoff, and are they safe?
  • Do we have a circuit breaker (and what is the fallback)?
  • Do we isolate its pool (bulkhead) from others?
  • Do we cache anything it returns—and for how long?

If those answers are crisp, you are not hoping the dependency survives spikes—you are actively protecting it, and yourself.
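One way to force those answers to exist is policy-as-code: a record per dependency that must be filled in before the client ships. Everything here, names and numbers alike, is a hypothetical example:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DependencyPolicy:
    """One answer sheet per downstream dependency."""
    name: str
    max_qps: int               # hard cap on outbound rate
    timeout_s: float           # derived from measured p99, plus headroom
    max_retries: int
    retries_safe: bool         # idempotent or naturally safe?
    breaker_fallback: str      # "cache", "degrade", or "error"
    pool_size: int             # bulkhead: dedicated threads/connections
    cache_ttl_s: Optional[float]  # None = never cache

POLICIES = [
    DependencyPolicy("payments", max_qps=200, timeout_s=1.0, max_retries=2,
                     retries_safe=True, breaker_fallback="error",
                     pool_size=20, cache_ttl_s=None),
    DependencyPolicy("catalog", max_qps=500, timeout_s=0.3, max_retries=1,
                     retries_safe=True, breaker_fallback="cache",
                     pool_size=10, cache_ttl_s=60.0),
]
```

A frozen dataclass with no defaults means nobody can add a dependency without answering every question on the list.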


Summary table

Pattern          Mechanism                       Best for                           Trade-off
Rate limiting    cap throughput                  public APIs, noisy callers         legitimate traffic can hit limits
Circuit breaker  stop calling while unhealthy    flaky microservices, fragile SaaS  needs fallback + observability
Load shedding    drop low-priority under stress  keeping checkout alive             some work is intentionally lost
Bulkhead         separate pools                  mixed criticality on one service   more tuning + capacity planning

Closing

Clean degradation beats heroic uptime theater. Spikes are normal; cascading failures are optional.

Mix the tools: limit how hard you hit friends, shed what you can afford to lose, break fast when they are down, and isolate pools so one bad neighbor does not evict the whole building.