How a request actually moves through a production system
A grounded tour from DNS and the edge through services, data, async work, and back to the browser—with observability and failure modes you can recognize in the wild.
When a page loads, the user sees pixels. Underneath, you get a relay race: name resolution, caching, TLS, routing, business rules, storage, third parties, background work, and telemetry—often in milliseconds, sometimes in seconds that feel like minutes.
This post is a mental model, not a vendor diagram. Real stacks reorder steps (BFFs, service mesh, edge functions). Still, if you learn one default story, you can diff your employer’s architecture against it instead of memorizing acronyms in isolation.
One way to read the pipe (northbound request)
If that ribbon already feels like alphabet soup, try the same story as a city walk—then swap the labels back to infra in your head.
City map ≈ request map (teaching analogy)
Resolution and the public internet (DNS, CDN)
Your laptop does not magically know where example.com lives. DNS turns a name into addresses (and hands you TTLs that explain “why did that cutover take five minutes for some people?”). Misconfigured records or slow resolvers show up as “the site is flaky” before your Java heap is even in the picture.
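If you want to see the two facts that matter during a cutover, the addresses and the TTL, a quick lookup makes them concrete. A minimal sketch, assuming the third-party dnspython package; your resolver tooling may differ:

```python
# Minimal DNS lookup sketch (assumes the third-party `dnspython` package).
# Shows the two facts that matter in a cutover: the addresses and the TTL.
import dns.resolver

answer = dns.resolver.resolve("example.com", "A")
print("addresses:", [record.address for record in answer])
print("ttl seconds:", answer.rrset.ttl)  # how long resolvers may cache this answer
```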
CDNs sit close to users and cache static stuff—images, fonts, script bundles—so repeat views do not hammer your origin for every byte. Example: A designer ships a new hero image with a six-hour cache header by mistake. Support hears “half the world still sees the old banner.” That is not a bug in your service layer; it is cache semantics.
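The fix is usually not code at all; it is the header and the URL. A tiny sketch of the difference, with illustrative values:

```python
# Sketch of the cache semantics behind the hero-image story (values are illustrative).
accidental = {"Cache-Control": "public, max-age=21600"}  # six hours: edges keep serving the old banner
safer = {"Cache-Control": "public, max-age=300"}         # short TTL for assets that change in place

# The common escape hatch: fingerprint the asset URL so a deploy changes the cache key,
# which makes long TTLs harmless for static files.
versioned_url = "/static/hero.3f9a1c.jpg"
```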
The real front door: load balancer and edge
Past DNS/CDN, traffic usually hits a load balancer (or equivalent edge). It terminates TLS, picks healthy instances, and may enforce coarse policy (IP reputation, basic WAF rules). Pros: one stable IP/DNS target for customers, horizontal scale behind it. Cons: misconfigured health checks (“503 party”) or a fat edge config that becomes its own deployment risk.
Example: During a deploy, new instances boot cold and fail readiness for forty seconds. The LB keeps draining old nodes; capacity drops; p99 spikes. The bug is not “slow SQL”—it is capacity math at the edge.
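The usual guardrail is a readiness probe that is honest about warm-up, kept separate from liveness. A minimal sketch using only the Python standard library; the /livez and /readyz paths and the warm-up check are assumptions, not a standard:

```python
# Liveness vs readiness, sketched with the stdlib http.server (paths and warm-up are illustrative).
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.monotonic()
WARMUP_SECONDS = 40  # stand-in for "pools connected, caches primed"

class Probes(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/livez":
            status = 200                                   # "the process is up": should almost never fail
        elif self.path == "/readyz":
            warm = time.monotonic() - START > WARMUP_SECONDS
            status = 200 if warm else 503                  # "it is safe to route traffic to me"
        else:
            status = 404
        self.send_response(status)
        self.end_headers()

# HTTPServer(("0.0.0.0", 8080), Probes).serve_forever()
```

The load balancer should only drain old instances once new ones report ready; if readiness lies, the capacity math above is the result.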
API gateway: thin, strict, and loud when wrong
The gateway is where the system becomes application-aware: validate tokens or API keys, map /v2/checkout to the right upstream, enforce rate limits, maybe attach a trace id. Good gateways fail fast on garbage—wrong audience on a JWT, oversize body, missing header—so you do not waste CPU deep in a service.
They add latency, but usually less than the cost of letting junk into your core. Think of it as a bouncer: not there to run the nightclub, just to stop obvious trouble at the door.
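In code, “fail fast on garbage” is a handful of cheap checks before anything expensive runs. A Python sketch; the audience value, body limit, and header names are illustrative, and a real gateway would also verify the token’s signature and expiry:

```python
# Fail-fast gateway checks, sketched in plain Python (limits and names are illustrative).
import base64, json, uuid

MAX_BODY_BYTES = 1_000_000
EXPECTED_AUDIENCE = "checkout-api"

def admit(headers: dict, body: bytes) -> dict:
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("413: body too large")            # reject before any service burns CPU on it
    token = headers.get("Authorization", "").removeprefix("Bearer ")
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("401: missing or malformed token")
    # Peek at the claims only; a real gateway also verifies the signature and expiry.
    claims = json.loads(base64.urlsafe_b64decode(parts[1] + "=" * (-len(parts[1]) % 4)))
    if claims.get("aud") != EXPECTED_AUDIENCE:
        raise ValueError("403: wrong audience")
    headers.setdefault("X-Trace-Id", uuid.uuid4().hex)     # attach a trace id if the edge did not
    return headers
```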
Inside the service: where “the app” actually runs
Once a request reaches your service (HTTP, gRPC, whatever), the interesting work is still layered: server plumbing, shared middleware (logging, metrics, tracing, auth hooks), then the use case—validate input, apply rules, call collaborators, map domain objects to a response DTO.
Inside one service instance (schematic)
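A toy version of that layering, with plumbing and middleware wrapping the use case. This is a framework-agnostic Python sketch; real stacks wire the chain up in their router or middleware configuration:

```python
# Tiny middleware chain: each layer wraps the next, the use case sits innermost.
import time
from typing import Callable, Dict

Handler = Callable[[Dict], Dict]

def with_logging(next_handler: Handler) -> Handler:
    def handler(request: Dict) -> Dict:
        print(f"request {request.get('trace_id')} -> {request.get('path')}")
        return next_handler(request)
    return handler

def with_timing(next_handler: Handler) -> Handler:
    def handler(request: Dict) -> Dict:
        started = time.perf_counter()
        try:
            return next_handler(request)
        finally:
            print(f"latency_ms={1000 * (time.perf_counter() - started):.1f}")
    return handler

def place_order(request: Dict) -> Dict:          # the "use case" layer
    return {"status": 201, "body": {"order_id": "abc"}}

app = with_logging(with_timing(place_order))     # plumbing wraps middleware wraps business logic
# app({"trace_id": "t-123", "path": "/v2/checkout"})
```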
Example: Checkout calls Inventory over HTTP with no timeout. Inventory hiccups for two seconds; thread pools fill; unrelated health endpoints time out. The fix is not “more microservices”—it is timeouts, bounded concurrency, and circuit breakers on outbound calls, plus keeping synchronous chains shallow.
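Those three fixes fit in a few lines. A crude stdlib-only Python sketch; the Inventory URL, thresholds, and timeouts are illustrative, and real services usually lean on a resilience library rather than hand-rolled globals:

```python
# Outbound-call hygiene: a hard deadline, bounded concurrency, and a crude circuit breaker.
import threading, time, urllib.request

INVENTORY_URL = "http://inventory.internal/reserve"  # assumption: your real service URL differs
outbound_slots = threading.BoundedSemaphore(20)      # bound in-flight calls so one slow dependency
                                                     # cannot soak up every worker thread
failures = 0
opened_at = 0.0

def call_inventory(payload: bytes) -> bytes:
    global failures, opened_at
    if opened_at and time.monotonic() - opened_at < 30:
        raise RuntimeError("circuit open: fail fast instead of queueing behind a sick dependency")
    with outbound_slots:
        try:
            req = urllib.request.Request(INVENTORY_URL, data=payload, method="POST")
            with urllib.request.urlopen(req, timeout=0.5) as resp:   # hard per-call deadline
                failures, opened_at = 0, 0.0
                return resp.read()
        except Exception:
            failures += 1
            if failures >= 5:                                        # trip after repeated failures
                opened_at = time.monotonic()
            raise
```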
Stateless replicas behind the LB let you scale out, but they also mean every request carries its context (tokens, headers, ids)—there is no magic “server memory” between calls unless you add it (session store, sticky sessions—trade-offs abound).
Caches, databases, and the lies we tell about consistency
Most read-heavy paths try cache first (Redis, Memcached, etc.) with a key discipline (user:123, cart:abc) and TTLs that match how stale data can be. On miss, you read through to the database—often relational for money-shaped data, sometimes document/blob stores for attachments or logs.
Read replicas buy read scale but introduce replication lag—“I just wrote, why does the UI flash the old value?” Eventual consistency is a feature until it is a bug; you have to decide explicitly which reads are allowed to be fuzzy.
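The read path most people mean by “cache first” is cache-aside with a namespaced key and a TTL. A sketch assuming the redis Python client; load_user_from_db is a hypothetical stand-in for your real query:

```python
# Cache-aside read sketch (assumes the `redis` client package; key format and TTL are illustrative).
import json, redis

cache = redis.Redis(host="localhost", port=6379)
USER_TTL_SECONDS = 300   # how stale a profile is allowed to be

def load_user_from_db(user_id: str) -> dict:
    return {"id": user_id, "name": "Ada"}      # hypothetical stand-in for the real database read

def get_user(user_id: str) -> dict:
    key = f"user:{user_id}"                    # key discipline: predictable, namespaced
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    user = load_user_from_db(user_id)          # miss: read through to the database
    cache.setex(key, USER_TTL_SECONDS, json.dumps(user))
    return user
```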
External systems: the slow kids on the playground
Payment gateways, SMS vendors, shipping APIs—out of process and outside your SLA. They dominate tail latency when something goes wrong.
Wrap them like you mean it: deadlines, limited retries with jitter, idempotency keys for anything that charges money, and user-visible fallbacks (“payments are busy, try again”) instead of a spinner that outlasts everyone’s patience.
Example: Food-ordering app calls a card processor. Processor returns 503 twice, then succeeds on retry. Without idempotency, you might double-charge; with it, you log a boring success instead of a finance fire drill.
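Wrapped up, that pattern is a deadline, capped retries with jitter, and one idempotency key reused across every retry. A stdlib-only Python sketch; the processor URL, header name, and limits are illustrative:

```python
# Charging a card with a deadline, capped retries with jitter, and an idempotency key.
import json, random, time, urllib.error, urllib.request

PROCESSOR_URL = "https://payments.example/charge"     # assumption: your processor's endpoint differs

def charge(order_id: str, amount_cents: int) -> dict:
    body = json.dumps({"order_id": order_id, "amount": amount_cents}).encode()
    idempotency_key = f"charge-{order_id}"            # same key on every retry -> at most one charge
    deadline = time.monotonic() + 5.0                 # overall budget for the whole attempt chain
    for attempt in range(3):
        try:
            req = urllib.request.Request(
                PROCESSOR_URL, data=body, method="POST",
                headers={"Idempotency-Key": idempotency_key, "Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=2.0) as resp:   # per-call deadline
                return json.loads(resp.read())
        except urllib.error.URLError:
            if time.monotonic() > deadline or attempt == 2:
                raise
            time.sleep((2 ** attempt) * 0.2 + random.uniform(0, 0.2))   # backoff plus jitter
```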
Async paths: keep the HTTP response boring
Not everything belongs in the critical path. After the core transaction commits, you might publish OrderPlaced to Kafka or enqueue send-receipt-email on SQS. Workers pick that up, retry safely, and your API thread goes home on time.
Example: Generating a PDF invoice takes eight seconds. Doing it inline makes your gateway timeout at five seconds look like a “random” 504. Push the work to a queue; return 202 or a small “processing” payload; let the worker phone the customer when ready.
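The handler’s job shrinks to enqueueing the work and answering quickly. A sketch assuming boto3 and an existing SQS queue; the queue URL and payload shape are illustrative:

```python
# Hand the slow work to a queue and return fast (assumes `boto3` and an existing SQS queue).
import json, boto3

sqs = boto3.client("sqs")
INVOICE_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/invoice-jobs"  # illustrative

def handle_order_placed(order_id: str) -> tuple[int, dict]:
    # The core transaction has already committed; the PDF can be built later by a worker.
    sqs.send_message(
        QueueUrl=INVOICE_QUEUE_URL,
        MessageBody=json.dumps({"type": "generate_invoice", "order_id": order_id}),
    )
    return 202, {"status": "processing", "order_id": order_id}   # the response stays boring and fast
```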
The observability layer that wraps everything
For this whole journey to be debuggable, three things have to exist in production—not slid into next quarter’s roadmap.
Same journey, three ways to watch it
1. Logs (the paper trail, but searchable)
Structured logs at the gateway, inside services, and in workers mean you can filter instead of grepping and hoping. Each line should carry enough context to answer “who, what, which request”—typically a trace id, a user or tenant id when it is safe, and business keys like order_id when you are debugging checkout drama.
Example: Payments returns a vague decline_code. Support pastes an order id into your log UI; you jump straight to the five lines around the charge attempt instead of reading someone’s terminal scrollback.
The boring rule people still break: never log secrets or raw PII “temporarily.” Temporary is forever in log retention.
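One way that looks in practice is a JSON formatter that carries the context fields along on every line. A stdlib-only sketch; field names like trace_id and order_id are conventions, not a standard:

```python
# Structured (JSON) log lines that carry request context (stdlib-only sketch).
import json, logging, sys, time

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        line = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "msg": record.getMessage(),
            **getattr(record, "ctx", {}),          # trace_id, order_id, tenant_id, ...
        }
        return json.dumps(line)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("checkout")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("charge attempted", extra={"ctx": {"trace_id": "t-123", "order_id": "o-789"}})
```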
2. Metrics (the pulse of traffic and the machine)
You want RED-style signals per route (rate, errors, latency) at the edge and inside services, plus the boring infra stuff that actually kills you: CPU, memory, DB pool usage, queue depth. Mix in a few business counters—orders placed, failed captures, signups per minute—so product panic and infra panic show up on the same timeline when Black Friday happens.
Example: Error rate is flat but checkout latency climbs. DB metrics show connection acquisition time spiking; you find a pool mis-sized after a traffic shift—not a mystery “Java got slow.”
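Wiring those signals up is mostly naming and labels. A sketch using the prometheus_client package; metric and label names are illustrative:

```python
# RED-style signals per route plus a business counter (assumes the `prometheus_client` package).
from prometheus_client import Counter, Histogram

REQUESTS = Counter("http_requests_total", "Requests by route and status", ["route", "status"])
LATENCY = Histogram("http_request_duration_seconds", "Request latency by route", ["route"])
ORDERS_PLACED = Counter("orders_placed_total", "Business counter: successful checkouts")

def record(route: str, status: int, duration_seconds: float) -> None:
    REQUESTS.labels(route=route, status=str(status)).inc()     # rate and errors
    LATENCY.labels(route=route).observe(duration_seconds)      # latency distribution
    if route == "/v2/checkout" and status == 201:
        ORDERS_PLACED.inc()                                     # product and infra on one timeline
```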
3. Traces (the replay button for latency)
Generate a trace id at the edge, then propagate it on outbound calls (headers, baggage, whatever your stack supports). Every downstream hop becomes a span with timing and tags so a waterfall answers the only question that matters at 2 a.m.: which edge on the graph ate the second?
If you only have logs without correlation, you get archaeology. If you only have dashboards without a trace exemplar, you get vibes. If you have traces but nobody sends the id across async boundaries, you get two half-stories that never meet.
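Hand-rolled, the discipline is small: mint the id once at the edge, copy it onto every outbound header, and put it in every queue message. A stdlib Python sketch; real stacks usually get this from OpenTelemetry, and the X-Trace-Id header here is a simplification of the W3C traceparent convention:

```python
# Trace-id propagation sketched by hand (stdlib only; header and field names are conventions).
import json, uuid

def at_the_edge(incoming_headers: dict) -> dict:
    # Generate the id once, at the first hop that sees the request.
    incoming_headers.setdefault("X-Trace-Id", uuid.uuid4().hex)
    return incoming_headers

def outbound_call_headers(request_headers: dict) -> dict:
    # Copy the id onto every downstream HTTP/gRPC call so spans join one trace.
    return {"X-Trace-Id": request_headers["X-Trace-Id"]}

def enqueue_with_trace(request_headers: dict, payload: dict) -> str:
    # And across async boundaries, or the trace becomes two half-stories that never meet.
    return json.dumps({"trace_id": request_headers["X-Trace-Id"], **payload})
```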
Walking back out: responses use the same doors
Responses serialize (JSON, protobuf), pick status codes and cache headers, then ride gateway → LB → TLS → client. The browser or app parses, updates UI, maybe caches locally. When debugging “slow UI,” confirm whether the network waterfall shows waiting (TTFB) vs download vs main-thread JavaScript—different villains.
Failure modes worth memorizing (tiny cheat sheet)
| Where it hurts | What it often looks like | First sane question |
|---|---|---|
| DNS / CDN | Wrong content, stale assets, geo weirdness | TTL? Purge? Wrong CNAME? |
| LB / edge | 502/503 bursts during deploys | Health checks, capacity, draining |
| Gateway | Auth works locally, fails in prod | Clock skew, wrong issuer, body limits |
| Service | Thread starvation, GC pauses | Sync fan-out, pool sizes, timeouts |
| Cache / DB | Thundering herd, replica lag | Key design, TTL, read-after-write |
| External | p99 spikes, duplicate charges | Retries + idempotency + deadlines |
| Async | “Lost” events, poison messages | DLQ, retention, consumer lag |
| Observability | Blind incidents, noisy paging | Trace id end-to-end, structured logs, SLOs |
Closing
You do not need to tattoo this exact pipeline on your arm. You do need a default story so that, when production misbehaves, you can ask “which hop is lying?” and zoom there first—instead of staring at the framework that was innocent the whole time.