Queue vs direct RPC: when to use which
RPC is for answers you need right now. Queues buffer load, decouple producers from consumers, and carry retries—when eventual processing is the real requirement.
Think of it like this:
- Direct RPC is raising your hand and waiting until the teacher answers—you do not move on until you hear the reply.
- A queue is dropping homework in the inbox: you get a quick “received,” and grading happens later while you leave the room.
Both are legitimate. The mistake is using one when the problem shape wants the other—like using a blocking HttpClient call to Mailchimp in the middle of signup because “it is only one line of code.”
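To make the two shapes concrete, here is a minimal in-process sketch. The names (`send_welcome_email`, `signup_blocking`, `signup_enqueue`) are illustrative stand-ins, not any real provider's API; a real system would use a broker like SQS or Rabbit instead of `queue.Queue`.

```python
import queue
import threading

def send_welcome_email(user_id: str) -> None:
    pass  # stand-in for a slow, rate-limited third-party HTTP call

# RPC-shaped signup: the request thread blocks until the slow call finishes,
# so signup latency includes the email provider's bad day.
def signup_blocking(user_id: str) -> str:
    send_welcome_email(user_id)
    return "created"

# Queue-shaped signup: record intent and return immediately.
email_jobs: "queue.Queue[str]" = queue.Queue()

def signup_enqueue(user_id: str) -> str:
    email_jobs.put(user_id)  # fast handoff; the user does not wait
    return "created"

def email_worker() -> None:
    while True:
        user_id = email_jobs.get()
        send_welcome_email(user_id)  # grading happens later
        email_jobs.task_done()

threading.Thread(target=email_worker, daemon=True).start()
```

The difference is not the line count; it is who absorbs the provider's latency and failures.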
Direct RPC vs queue
Side-by-side
| Aspect | Direct RPC (REST, gRPC, …) | Queue (SQS, Rabbit, Kafka consumers, …) |
|---|---|---|
| What the caller does | Sends a request and waits for a response | Usually enqueues and can return quickly (ack / accept) |
| Time coupling | Caller and service are tied to the same moment | Producer and consumer can be offset in time |
| Load during spikes | Traffic hits the service thread / process directly | Traffic often lands in the buffer first |
| What you optimize for | Latency + simplicity of one round trip | Durability + smoothing + fan-out |
| Failure in your face | Immediate error to the caller | Retries, poison messages, DLQ—you design for it |
Use direct RPC when the answer drives the next step
If the UI or the next line of business logic cannot proceed without the payload, RPC is the honest tool.
Examples that almost always stay RPC-shaped:
- “Load /profile so we can render the header.”
- “Validate this session before we run the handler.”
- “What is the price for this cart snapshot?”
Characteristics: short, predictable, one hop (or a small bounded fan-out you still treat as one user-facing latency).
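Because the caller is waiting, RPC-shaped work deserves an explicit, short deadline; a slow answer is as bad as an error. A minimal sketch, using a thread pool to enforce the timeout (the `fetch_profile` body is a hypothetical stand-in for a fast read):

```python
import concurrent.futures

def fetch_profile(user_id: str) -> dict:
    # stand-in for a small, fast read like "load /profile"
    return {"id": user_id, "name": "Ada"}

def fetch_profile_with_deadline(user_id: str, timeout_s: float = 0.2) -> dict:
    # the caller is blocked on this, so bound the wait explicitly
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_profile, user_id)
        return future.result(timeout=timeout_s)  # raises TimeoutError if slow
```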
Use a queue when you only need to record intent
If the user story is “make sure this happens” rather than “tell me the result before I breathe,” a queue is usually right.
Examples:
- Welcome email after signup.
- Thumbnail generation after upload.
- Monthly report export.
- Anything that might run seconds to minutes or touch rate-limited third parties.
You still owe users clarity (“we accepted your upload”), but you do not owe them a blocking HTTP call until the PDF exists.
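That clarity is usually an HTTP 202 Accepted plus a job id the user can check later. A sketch under those assumptions (the handler and job shape here are illustrative, not a specific framework's API):

```python
import queue
import uuid

export_jobs: "queue.Queue[dict]" = queue.Queue()

def request_report(account_id: str) -> dict:
    # Record intent durably, then answer fast.
    job_id = str(uuid.uuid4())
    export_jobs.put({"job_id": job_id, "account_id": account_id})
    # 202 Accepted means "we have it", not "it is done".
    return {"status": 202, "job_id": job_id}
```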
Traffic and load
Under a spike, RPC means every extra request is another open conversation with your service until you shed load (429s, queue at the LB, autoscaling lag). That can turn into cascading timeouts fast—especially if your Tomcat connector max threads and your downstream pool size were copy-pasted from a blog post in 2019.
A queue absorbs the burst into depth. Workers drain at the rate you can sustain. The trade-off: you must monitor depth, age-of-message, and consumer lag as first-class metrics—not just CPU on the API.
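The knob that makes this safe is a bounded buffer: a spike lands in queue depth instead of open sockets, and anything beyond the bound is rejected immediately rather than hung. A toy sketch of that behavior (the sizes are arbitrary for illustration):

```python
import queue

# Bounded buffer: maxsize is the backpressure knob.
buffer: "queue.Queue[int]" = queue.Queue(maxsize=100)

def accept(request_id: int) -> bool:
    try:
        buffer.put_nowait(request_id)  # enqueue instead of holding a thread
        return True                    # accepted
    except queue.Full:
        return False                   # shed load explicitly: a fast 429, not a slow hang

# Simulate a burst of 150 requests against a buffer of 100.
accepted = sum(accept(i) for i in range(150))
depth = buffer.qsize()  # the first-class metric to watch, alongside message age
```

Workers then drain `buffer` at the rate you can sustain, while you alert on depth and age.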
Backpressure (fail fast, not slow)
Whichever shape you pick, reject excess work quickly: a 429 on the RPC path, a bounded queue with an explicit maximum depth on the async path. Slowly timing out is the worst answer you can give a caller.
Errors and reliability
With RPC, “failed” usually means the caller got a 5xx or a timeout and can show an error or retry once, carefully.
With queues, failure is asynchronous: poison messages, partial consumers, duplicate delivery. You invest in idempotency, retry policies, DLQs, and replay tooling. You get resilience; you pay in operational surface area.
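A minimal sketch of those consumer-side disciplines: dedupe on a message id (idempotency), retry a bounded number of times, and park poison messages on a dead-letter queue. All names here are illustrative, not a specific broker's API; in production the idempotency store is a table or cache, not a Python set.

```python
MAX_ATTEMPTS = 3
processed_ids = set()   # idempotency store: ids we have already handled
dead_letter = []        # parked poison messages, kept for replay tooling

def handle(msg: dict, process) -> str:
    if msg["id"] in processed_ids:
        return "duplicate"  # duplicate delivery: ack and move on, no side effects
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            process(msg)
            processed_ids.add(msg["id"])
            return "done"
        except Exception:
            continue  # real systems back off between attempts
    dead_letter.append(msg)  # poison message: stop retrying, park it
    return "dead-lettered"
```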
Fan-out and workflow style
Queues (or topics) shine when one event should drive many actions without chaining six synchronous calls in the signup request.
Example: UserSignedUp
- RPC-heavy version: signup service calls Email, Analytics, Referral, and CRM inline—p99 becomes the sum of everyone’s bad day.
- Queue-friendly version: signup writes the user row, publishes one event, returns. Each downstream team owns a consumer that scales independently.
If you need one-to-many reactions, events beat N RPCs in the hot path.
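An in-process toy of the queue-friendly version: signup publishes one `UserSignedUp` event and each downstream team owns its own subscriber. The dispatcher below is a sketch; a real system would use a broker topic so consumers read and scale independently.

```python
from collections import defaultdict

subscribers = defaultdict(list)

def subscribe(topic: str, handler) -> None:
    subscribers[topic].append(handler)

def publish(topic: str, event: dict) -> None:
    for handler in subscribers[topic]:
        handler(event)  # with a broker, each consumer reads at its own pace

# Each team registers its own reaction; signup knows none of them.
reactions = []
subscribe("UserSignedUp", lambda e: reactions.append(("email", e["user_id"])))
subscribe("UserSignedUp", lambda e: reactions.append(("analytics", e["user_id"])))
subscribe("UserSignedUp", lambda e: reactions.append(("crm", e["user_id"])))

publish("UserSignedUp", {"user_id": "u42"})  # one event, three independent reactions
```

Adding a fourth side effect means adding a subscriber, not editing the signup hot path.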
Visually, “fire one message, many workers pick it up later” is the same family as async offload—just with more subscribers:
Push heavy work async
Simple decision cheatsheet
| Question | If yes… | Lean toward |
|---|---|---|
| Does the caller or user need the result right now to continue? | Next screen depends on the body | Direct RPC |
| Can this work run seconds or minutes later? | Email, reports, exports | Queue |
| Is the work heavy, or does it hit fragile external APIs? | CPU, IO, rate limits | Queue |
| Should many independent side effects follow one action? | Analytics, notifications, CRM | Queue or pub/sub |
| Is it a small, fast read or validation? | Token check, simple GET | Direct RPC |
In short
- RPC: “Give me the answer so I can continue.”
- Queue: “Make sure this gets done reliably—even if the world is loud right now.”
Closing
Neither pattern is “more senior.” RPC is the default for request-shaped work. Queues are the default for durable, deferred, or fan-out work.
Pick based on what the caller is allowed to wait for, not based on what sounded cool in a conference talk.