Protecting downstream services from traffic spikes
Rate limits, circuit breakers, load shedding, and bulkheads are how you keep a partner’s bad day from becoming your outage—without pretending dependencies are infinite.
When traffic doubles, the failure mode you want is measured: latency climbs, low-priority work drops, breakers open, and humans get graphs that tell a story.
The failure mode you do not want is surprise: thread pools at zero, connection counts off the chart, and the payment API you do not own explaining your SLA to your CEO. If you have ever watched Stripe time out while your Spring service happily spun up more blocking calls, you already know why this post exists.
Protecting downstream services under spikes
1. Use circuit breakers so you fail fast, not slowly
A circuit breaker tracks recent failures (timeouts, 5xx, saturation). After a threshold, it opens: you stop calling the sick dependency and return a fallback (cached value, degraded feature, clear error) instead of queueing misery behind a stuck socket.
Half-open probes after a cool-down let traffic trickle back to test recovery—without slamming the dependency the moment it twitches.
Why it matters: one slow checkout dependency should not hold every unrelated read hostage. Breakers turn “everyone waits forever” into “this path is temporarily offline, everything else breathes.”
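A minimal sketch of the state machine in plain Java (the threshold, cool-down, and fallback are illustrative assumptions; in a Spring shop you would more likely reach for a library such as Resilience4j):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Minimal circuit breaker sketch: trips after N consecutive failures,
 *  fails fast while open, and lets probes through again after a cool-down. */
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final Duration coolDown;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private Instant openedAt = Instant.MIN;

    CircuitBreaker(int failureThreshold, Duration coolDown) {
        this.failureThreshold = failureThreshold;
        this.coolDown = coolDown;
    }

    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get();                  // open: fail fast, serve the fallback
        }
        try {
            T result = remoteCall.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized boolean allowRequest() {
        if (state == State.OPEN
                && Duration.between(openedAt, Instant.now()).compareTo(coolDown) >= 0) {
            state = State.HALF_OPEN;                // cool-down elapsed: allow a probe
        }
        return state != State.OPEN;
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;                     // trip (or re-trip) and start the cool-down
            openedAt = Instant.now();
        }
    }
}
```

Wrapping the call as `fraudBreaker.call(() -> fraudClient.check(order), () -> Verdict.REVIEW_LATER)` (both names hypothetical) means a sick Fraud service costs you one fast fallback instead of a parked thread.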
2. Limit how many resources each dependency can consume (bulkheads)
A bulkhead is a separate pool—threads, connections, semaphores—per dependency or per capability class. Payments, search, and outbound email should not share one anonymous bucket that any of them can drain.
Example: Search gets slow during a marketing push. If search shares a pool with payment capture, you can win the SEO battle and lose the money. Split pools so search’s tantrum cannot exhaust payment’s lifelines.
This is planning work: you have to pick pool sizes intentionally and watch their utilization, or the bulkhead is just decorative config with no real isolation.
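A semaphore is the cheapest way to sketch the idea in plain Java (the pool names and sizes below are assumptions and must come from real capacity numbers):

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Semaphore bulkhead: each dependency gets its own fixed budget of
 *  concurrent calls, so one slow neighbor cannot drain the others. */
class Bulkhead {
    private final Semaphore permits;
    private final String name;

    Bulkhead(String name, int maxConcurrentCalls) {
        this.name = name;
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T execute(Supplier<T> call) {
        if (!permits.tryAcquire()) {               // pool exhausted: reject instead of queueing
            throw new IllegalStateException("Bulkhead '" + name + "' is full");
        }
        try {
            return call.get();
        } finally {
            permits.release();
        }
    }
}

class Pools {
    // Separate budgets: search melting down cannot starve payment capture.
    // Sizes here are illustrative; derive them from measured capacity, then watch utilization.
    static final Bulkhead SEARCH   = new Bulkhead("search", 50);
    static final Bulkhead PAYMENTS = new Bulkhead("payments", 20);
}
```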
3. Rate limit your own outbound calls
Even polite partners will accept traffic until they stop accepting traffic. Cap QPS per dependency (token bucket, leaky bucket, whatever your library calls it) so your autoscaling enthusiasm does not become an accidental DDoS.
This pairs with retry budgets: retries multiply load. If every client retries aggressively on 503, recovery takes longer. Cap retries and add jitter.
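A minimal token-bucket sketch in plain Java (the numbers are placeholders; libraries such as Guava's RateLimiter or Bucket4j cover the same ground with more care):

```java
/** Token bucket for outbound calls: refills at a steady rate and allows
 *  short bursts up to the bucket size, then rejects the rest. */
class TokenBucket {
    private final double capacity;         // max burst size
    private final double refillPerSecond;  // steady-state QPS toward the dependency
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;                   // budget available: make the outbound call
        }
        return false;                      // over budget: queue, degrade, or drop
    }
}
```

A `false` from `tryAcquire()` is a signal to queue, degrade, or drop, not a reason to retry immediately.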
4. Use tight timeouts and smart retries
Timeouts should reflect measured tail latency, not “infinity minus one.” If p99 to Fraud is 400ms, a 30s timeout is not kindness—it is a thread donation program.
Retries belong behind:
- idempotency (or naturally safe operations),
- jittered backoff,
- and max attempt ceilings.
Otherwise you are just building a retry storm that hits the dependency during the exact window it is trying to stand back up.
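A sketch of that discipline in plain Java (the attempt ceiling and backoff base are assumptions; the tight, percentile-based timeout belongs on the underlying HTTP client itself):

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

/** Retry with a hard attempt ceiling and full-jitter exponential backoff,
 *  so clients do not hammer a recovering dependency in lockstep. */
class Retries {
    static <T> T withRetry(Supplier<T> call, int maxAttempts, Duration baseBackoff)
            throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();                    // the call itself carries the tight timeout
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) break;    // ceiling reached: give up, surface the error
                long capMillis = baseBackoff.toMillis() * (1L << (attempt - 1));
                long sleepMillis = ThreadLocalRandom.current().nextLong(capMillis + 1); // full jitter
                Thread.sleep(sleepMillis);
            }
        }
        throw last;
    }
}
```

Only wrap idempotent (or naturally safe) calls this way, and keep `maxAttempts` small: two or three attempts is usually the whole budget.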
5. Queue non-critical work instead of calling directly
If the user does not need the result in the same HTTP transaction, do not synchronously fan out to five internal services. Enqueue, publish, or schedule; let consumers absorb spikes on their timeline.
You already know the examples: email, analytics, enrichment, CRM sync. The pattern is the same: record intent, respond, process async.
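An in-memory sketch of the shape (a real system would use a durable broker such as Kafka or SQS, and `sendToProvider` here is a hypothetical placeholder):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Record intent now, respond to the user, let a consumer drain the queue
 *  at its own pace. A real setup would use a durable broker, not memory. */
class EmailOutbox {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>(10_000);

    /** Called inside the request: cheap, bounded, never waits on the mail provider. */
    boolean enqueue(String emailJson) {
        return pending.offer(emailJson);       // full queue = explicit back-pressure signal
    }

    /** Runs on a background thread; absorbs spikes on its own timeline. */
    void drainLoop() throws InterruptedException {
        while (true) {
            String emailJson = pending.take(); // blocks until work exists
            sendToProvider(emailJson);         // hypothetical slow, rate-limited outbound call
        }
    }

    private void sendToProvider(String emailJson) {
        // placeholder for the actual provider call
    }
}
```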
6. Cache responses where it is safe
Not every dependency needs a live call every time. Catalog metadata, config flags, reference data—often stale-within-TTL is fine if you name the staleness budget.
Caches are not free: invalidation, key cardinality, and “thundering herd on expiry” still exist. Use them as a pressure valve, not a magic blanket.
A hit means you never woke the partner; a miss is the controlled trip out to the network. That is the cache-aside pattern: check the cache first, call the dependency only on a miss, and store the result with an explicit TTL.
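A cache-aside sketch in plain Java (the TTL and key types are assumptions; a production cache would also need single-flight loading to avoid the thundering herd mentioned above):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache-aside with an explicit staleness budget: a hit never touches the
 *  dependency, a miss (or expired entry) makes the one controlled network trip. */
class TtlCache<K, V> {
    private record Entry<T>(T value, Instant expiresAt) {}

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;

    TtlCache(Duration ttl) {
        this.ttl = ttl;
    }

    V get(K key, Function<K, V> loader) {
        Entry<V> cached = entries.get(key);
        if (cached != null && Instant.now().isBefore(cached.expiresAt())) {
            return cached.value();                    // hit: the partner never hears from us
        }
        V fresh = loader.apply(key);                  // miss: controlled trip out to the network
        entries.put(key, new Entry<>(fresh, Instant.now().plus(ttl)));
        return fresh;
    }
}
```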
7. Combine these into a safety plan
For every important downstream dependency, you should be able to answer:
- What is the max QPS we will ever send?
- What timeout do we use, based on what percentile?
- How many retries, with what backoff, and are they safe?
- Do we have a circuit breaker (and what is the fallback)?
- Do we isolate its pool (bulkhead) from others?
- Do we cache anything it returns—and for how long?
If those answers are crisp, you are not hoping the dependency survives spikes—you are actively protecting it, and yourself.
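One way to keep those answers crisp is to write them down next to the client code rather than in a wiki; a sketch with illustrative field names and numbers:

```java
import java.time.Duration;

/** Per-dependency protection policy: if you cannot fill in these fields,
 *  you do not yet have a plan for that dependency. Values are illustrative. */
record DependencyPolicy(
        String name,
        double maxOutboundQps,        // hard cap on what we will ever send
        Duration timeout,             // derived from a measured percentile, e.g. p99 plus margin
        int maxRetries,               // only for idempotent or naturally safe calls
        Duration retryBaseBackoff,    // jittered in the retry loop
        int bulkheadConcurrency,      // isolated pool size for this dependency
        Duration cacheTtl             // Duration.ZERO if responses must always be live
) {
    static final DependencyPolicy FRAUD = new DependencyPolicy(
            "fraud", 200.0, Duration.ofMillis(800), 2,
            Duration.ofMillis(100), 20, Duration.ZERO);
}
```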
Summary table
| Pattern | Mechanism | Best for | Trade-off |
|---|---|---|---|
| Rate limiting | cap throughput | public APIs, noisy callers | legitimate traffic can hit limits |
| Circuit breaker | stop calling while unhealthy | flaky microservices, fragile SaaS | needs fallback + observability |
| Load shedding | drop low-priority under stress | keeping checkout alive | some work is intentionally lost |
| Bulkhead | separate pools | mixed criticality on one service | more tuning + capacity planning |
Closing
Clean degradation beats heroic uptime theater. Spikes are normal; cascading failures are optional.
Mix the tools: limit how hard you hit friends, shed what you can afford to lose, break fast when they are down, and isolate pools so one bad neighbor does not evict the whole building.