
Queue vs direct RPC: when to use which

RPC is for answers you need right now. Queues buffer load, decouple producers from consumers, and carry retries—when eventual processing is the real requirement.

Distributed Systems · Message Queues · Architecture

Think of it like this:

  • Direct RPC is raising your hand and waiting until the teacher answers—you do not move on until you hear the reply.
  • A queue is dropping homework in the inbox: you get a quick “received,” and grading happens later while you leave the room.

Both are legitimate. The mistake is using one when the problem shape wants the other—like using a blocking HttpClient call to Mailchimp in the middle of signup because “it is only one line of code.”
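The difference is easiest to see in code. A minimal in-process sketch (function names and the `queue.Queue` stand-in are illustrative, not a real Mailchimp client): the blocking version ties signup latency to the email provider; the queued version records intent and returns.

```python
import queue

# Stand-in for a real broker; in production this would be SQS, Rabbit, etc.
email_jobs = queue.Queue()

def signup_blocking(user, send_email):
    # RPC-shaped: signup latency now includes the email provider's latency.
    send_email(user)  # blocks until the provider answers (or times out)
    return {"status": "created"}

def signup_queued(user):
    # Queue-shaped: record intent, return immediately.
    email_jobs.put({"type": "welcome_email", "user": user})
    return {"status": "created", "email": "queued"}
```

The "one line of code" is still one line—it just runs on a worker instead of inside the signup request.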

Direct RPC vs queue

[Diagram: Direct RPC (REST, gRPC…) vs queue (buffer + workers). RPC: the caller sends a request and waits for the response; spikes hit the service directly. Queue: the producer gets a fast ack and workers process later; spikes land in the queue first, which buffers load.]

When RPC fits:

  • Caller needs the payload to continue (profile, auth, price)
  • Work is fast and you want one latency number
  • Errors should surface immediately to the caller
  • Simple mental model beats premature messaging

When a queue fits:

  • “Make sure this runs” without blocking the user path
  • Smooth bursts; workers scale at their own rate
  • Retries, DLQ, replay—reliability over instant feedback
  • One event → many consumers (fan-out)

RPC answers “what now?” · Queues answer “what happens reliably next?”

Side-by-side

| Aspect | Direct RPC (REST, gRPC, …) | Queue (SQS, Rabbit, Kafka consumers, …) |
| --- | --- | --- |
| What the caller does | Sends a request and waits for a response | Usually enqueues and can return quickly (ack / accept) |
| Time coupling | Caller and service are tied to the same moment | Producer and consumer can be offset in time |
| Load during spikes | Traffic hits the service thread / process directly | Traffic often lands in the buffer first |
| What you optimize for | Latency + simplicity of one round trip | Durability + smoothing + fan-out |
| Failure in your face | Immediate error to the caller | Retries, poison messages, DLQ—you design for it |

Use direct RPC when the answer drives the next step

If the UI or the next line of business logic cannot proceed without the payload, RPC is the honest tool.

Examples that almost always stay RPC-shaped:

  • “Load /profile so we can render the header.”
  • “Validate this session before we run the handler.”
  • “What is the price for this cart snapshot?”

Characteristics: short, predictable, one hop (or a small bounded fan-out you still treat as one user-facing latency).
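Because RPC-shaped work owns one user-facing latency number, it pays to make the budget explicit. A hedged sketch (the helper and its error message are hypothetical, not a specific framework API): run the call with a deadline and fail fast instead of hanging the caller.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def call_with_deadline(fn, *args, timeout_s=0.3):
    # RPC-shaped call: one round trip, one explicit latency budget.
    # A slow dependency surfaces as an immediate error, not a hang.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, *args)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            raise RuntimeError("dependency exceeded latency budget")

# usage with a stand-in dependency (real code would call the price service)
price = call_with_deadline(lambda cart: 42.0, "cart-123")
```

One caveat of this sketch: on timeout the worker thread still runs to completion during pool shutdown; real RPC clients cancel at the transport layer.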


Use a queue when you only need to record intent

If the user story is “make sure this happens” rather than “tell me the result before I breathe,” a queue is usually right.

Examples:

  • Welcome email after signup.
  • Thumbnail generation after upload.
  • Monthly report export.
  • Anything that might run seconds to minutes or touch rate-limited third parties.

You still owe users clarity (“we accepted your upload”), but you do not owe them a blocking HTTP call until the PDF exists.
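The accept-then-process shape looks roughly like this (an in-process sketch with `queue.Queue` standing in for a real broker; names are illustrative):

```python
import queue
import threading

jobs = queue.Queue()
done = []

def handle_upload(upload_id):
    # Accept fast: record intent and return before the thumbnail exists.
    jobs.put(upload_id)
    return {"status": "accepted", "upload_id": upload_id}

def worker():
    while True:
        upload_id = jobs.get()
        done.append(f"thumbnail:{upload_id}")  # stand-in for the slow work
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
resp = handle_upload("u1")
jobs.join()  # only the demo waits here; the real caller never does
```

The caller gets “accepted” immediately; the thumbnail appears whenever the worker gets to it.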


Traffic and load

Under a spike, RPC means every extra request is another open conversation with your service until you shed load (429s, queue at the LB, autoscaling lag). That can turn into cascading timeouts fast—especially if your Tomcat connector max threads and your downstream pool size were copy-pasted from a blog post in 2019.

A queue absorbs the burst into depth. Workers drain at the rate you can sustain. The trade-off: you must monitor depth, age-of-message, and consumer lag as first-class metrics—not just CPU on the API.
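Depth and age-of-oldest-message are cheap to track even in a toy queue. A minimal sketch (the `MonitoredQueue` class is hypothetical; real brokers expose these as built-in metrics):

```python
import time
from collections import deque

class MonitoredQueue:
    # Wraps a queue with the two metrics that matter during a spike:
    # depth (how much is buffered) and age of the head message.
    def __init__(self):
        self._items = deque()  # (enqueued_at, payload)

    def put(self, payload):
        self._items.append((time.monotonic(), payload))

    def get(self):
        return self._items.popleft()[1]

    def depth(self):
        return len(self._items)

    def oldest_age_s(self):
        # A growing head age means consumers are stuck, even if depth looks flat.
        if not self._items:
            return 0.0
        return time.monotonic() - self._items[0][0]
```

Alert on `oldest_age_s`, not just depth: a steady depth with a rising head age means a wedged consumer, not healthy throughput.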

Backpressure (fail fast, not slow)

[Diagram: backpressure via a bounded queue plus reject. Incoming requests enter a bounded queue (max N) drained by a worker; overflow is rejected quickly with a 429. Do not let one dependency turn into a pile-up—bound concurrency per dependency and degrade gracefully. Backpressure works best with timeouts, retries only when safe, and per-dependency limits.]
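The bounded-queue-plus-reject shape is a few lines (a sketch with an in-process `queue.Queue`; the status codes mirror the HTTP semantics, and `make_acceptor` is an illustrative name):

```python
import queue

def make_acceptor(max_depth):
    # Bound the buffer per dependency; overflow is rejected, not accumulated.
    buf = queue.Queue(maxsize=max_depth)

    def accept(job):
        try:
            buf.put_nowait(job)
            return 202  # accepted for async processing
        except queue.Full:
            return 429  # shed load now; the caller can back off and retry
    return accept, buf

accept, buf = make_acceptor(max_depth=2)
codes = [accept(i) for i in range(3)]  # third request is rejected
```

Rejecting at the door keeps latency honest: the client learns about the overload in milliseconds instead of discovering it via a 30-second timeout.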

Errors and reliability

With RPC, “failed” usually means the caller got a 5xx or timeout and can show an error or retry once in a careful way.

With queues, failure is asynchronous: poison messages, partial consumers, duplicate delivery. You invest in idempotency, retry policies, DLQs, and replay tooling. You get resilience; you pay in operational surface area.
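Idempotency plus a DLQ can be sketched in-process (the in-memory `processed` set stands in for a durable store keyed by message id; names are illustrative):

```python
processed = set()   # in real life: a durable store keyed by message id
dead_letters = []   # in real life: a dead-letter queue you can replay from

def handle(message, side_effect, max_attempts=3):
    # Idempotency: a redelivered duplicate is a no-op, not a double charge.
    if message["id"] in processed:
        return "duplicate"
    for _ in range(max_attempts):
        try:
            side_effect(message)
            processed.add(message["id"])
            return "ok"
        except Exception:
            continue  # bounded retries; a real policy would back off
    dead_letters.append(message)  # poison message: park it, don't loop forever
    return "dead-lettered"
```

The two lists of the trade are visible here: duplicates become harmless, and a poison message ends up somewhere a human can inspect and replay it.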


Fan-out and workflow style

Queues (or topics) shine when one event should drive many actions without chaining six synchronous calls in the signup request.

Example: UserSignedUp

  • RPC-heavy version: signup service calls Email, Analytics, Referral, and CRM inline—p99 becomes the sum of everyone’s bad day.
  • Queue-friendly version: signup writes the user row, publishes one event, returns. Each downstream team owns a consumer that scales independently.

If you need one-to-many reactions, events beat N RPCs in the hot path.
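A toy in-process fan-out makes the shape concrete (handlers run synchronously here for simplicity; with a real broker each subscriber would be an independent consumer, and the topic name is illustrative):

```python
from collections import defaultdict

subscribers = defaultdict(list)
delivered = []

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, event):
    # One event, many reactions; signup returns right after this line.
    for handler in subscribers[topic]:
        handler(event)

subscribe("UserSignedUp", lambda e: delivered.append(("email", e["user"])))
subscribe("UserSignedUp", lambda e: delivered.append(("analytics", e["user"])))
subscribe("UserSignedUp", lambda e: delivered.append(("crm", e["user"])))
publish("UserSignedUp", {"user": "alice"})
```

Adding a fourth consumer is a `subscribe` call owned by that team—the signup service never changes.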

Visually, “fire one message, many workers pick it up later” is the same family as async offload—just with more subscribers:

Push heavy work async

[Diagram: async offload keeps the request path boring. Sync side (request/response): validate, create, return 200. Async side (queue + workers): publish one event; workers handle email, invoice, and analytics. If the queue is unhealthy, you still want the “create” step to stay safe (idempotency + DLQ).]

Simple decision cheatsheet

| Question | If yes… | Lean toward |
| --- | --- | --- |
| Does the caller or user need the result right now to continue? | Next screen depends on the body | Direct RPC |
| Can this work run seconds or minutes later? | Email, reports, exports | Queue |
| Is the work heavy, or does it hit fragile external APIs? | CPU, IO, rate limits | Queue |
| Should many independent side effects follow one action? | Analytics, notifications, CRM | Queue or pub/sub |
| Is it a small, fast read or validation? | Token check, simple GET | Direct RPC |

In short

  • RPC: “Give me the answer so I can continue.”
  • Queue: “Make sure this gets done reliably—even if the world is loud right now.”

Closing

Neither pattern is “more senior.” RPC is the default for request-shaped work; queues are the default for durable, deferred, or fan-out work.

Pick based on what the caller is allowed to wait for, not based on what sounded cool in a conference talk.