
When should you split a monolith into services?

A practical take on when a monolith stops being the right shape—told through merge queues, metrics, and one team that split for the wrong reason.

Distributed Systems · Scalability · Architecture

Nobody wakes up and thinks, “Today I will enjoy more network hops.” People split because something hurts: releases, incidents, hiring, or an auditor asking uncomfortable questions. The mistake is treating microservices as a maturity badge instead of answering one boring question:

What pain goes away if we pay the tax of running more than one deployable?

If you cannot name that pain in a sentence, you are not ready. Here is how I think about it, with examples that are composites of things I have seen in the wild (details changed, patterns real).

Four pressures that justify a second deployable

[Figure: four pressures that justify a second deployable. Delivery & ownership: Squad A and Squad B share one monolith, one queue, one release calendar. Workload mismatch: a steady API next to batch / ML / media; tune for one and you hurt latency or cost for the other. Risk & scope: pay, auth, and PII carry a stricter change bar than the rest of the product. Noisy neighbor: a single process and pool, so scaling the hot path scales everything.]
Inspired by the usual consulting slide—here as a compact map you can skim before reading the examples below.

The merge queue is not your product

Imagine three squads—Checkout, Search, and “that internal admin nobody talks about”—all committing to one repo. A harmless admin PR sits in review for four days because Checkout is hardening code for a promo window. Search cannot ship a relevance tweak because the nightly build is red after someone merged a refactor that touched shared utilities.

That is not a “big codebase” problem. It is a coupled release and ownership problem.

Example: At one place, a one-line copy change in the marketing banner required the same release train as a pricing rule change. Product asked why “simple” work took two weeks. Engineering said “process.” The real answer was: one artifact, one blast radius, one calendar.

One train, everyone’s luggage

[Figure: a single release train carrying unrelated changes. One deploy hauls a typo fix, a pricing change, and a search tweak: unrelated PRs, same risk window.]

Splitting (or modularizing with clear ownership and separate deployables) helps when you can say: “Checkout owns this binary; they ship when Checkout is ready.” If you can get that with modules + feature flags + stricter layering inside one app, do that first. I have seen teams cut queue time in half without a single new service—just boundaries and CI that match how teams actually work.
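
If you take the modules-first route, the boundary has to be enforceable or it erodes by next quarter. Here is a minimal sketch of what enforcement can look like in CI, using the ArchUnit test library; the com.example.shop package layout is hypothetical:

```java
// Minimal sketch: fail the build when one module reaches into another.
// Package names (com.example.shop, ..checkout.., ..admin..) are made up.
import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import com.tngtech.archunit.lang.ArchRule;
import org.junit.jupiter.api.Test;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

class ModuleBoundaryTest {

    @Test
    void checkoutDoesNotDependOnAdmin() {
        JavaClasses classes = new ClassFileImporter().importPackages("com.example.shop");

        // The boundary holds even though everything still ships as one artifact.
        ArchRule rule = noClasses()
                .that().resideInAPackage("..checkout..")
                .should().dependOnClassesThat().resideInAPackage("..admin..");

        rule.check(classes);
    }
}
```

One failing test when Checkout imports an admin class is worth more than any architecture diagram; CI is the only reviewer that never gets tired.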

You are closer to a real split when clean modules exist on paper but in practice everyone still waits on everyone because there is only one deploy button.


When half the app wants a different kind of day

Some corners of a system are not “another CRUD screen.” They are batch jobs that chew CPU for twenty minutes, or a path that must hold ten thousand websocket connections, or a model server that wants a GPU and a Python stack while the rest of the company standardizes on one JVM stack.

Example: A team runs a Java Spring monolith for the storefront. They add “smart recommendations” by calling into a Python service inside the same deployment story—running it in the same JVM process was not viable, so they bolted on a sidecar-ish thing that still shared fate with the main app’s releases and scaling. Every time they tuned the heap for steady HTTP traffic, batch scoring jobs caused long GC pauses during peak shopping hours.

That is a workload mismatch: different scaling knobs, different failure modes, different tuning. Pulling scoring into its own service (with its own pool, autoscaler, and release cadence) is not resume-driven; it is stopping one workload from vetoing another.

Another small example: image thumbnailing in-process. Under load, thread pools for uploads starve the API threads handling login. Moving thumbnails to a queue + worker fleet is often the fix—whether you call that worker a “microservice” or a “job” is naming; the point is isolation of resource contention.
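
Here is a minimal sketch of that isolation in plain Java: a bounded queue plus a dedicated worker pool. The renderThumbnail step is hypothetical; the bounded queue and the separate pool are the point.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: thumbnailing behind a bounded queue with its own small worker pool,
// so an upload spike queues up (or sheds load) instead of starving the
// threads that serve login and checkout.
public class ThumbnailWorkers {
    private final BlockingQueue<String> jobs = new ArrayBlockingQueue<>(1_000);
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    /** Called from the upload path. offer() is non-blocking: when the queue
     *  is full we reject the job here rather than tying up an API thread. */
    public boolean enqueue(String imagePath) {
        return jobs.offer(imagePath);
    }

    public void start() {
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        renderThumbnail(jobs.take()); // CPU-heavy, off the API pool
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    private void renderThumbnail(String path) {
        // Hypothetical: resize the image and write the result to storage.
    }
}
```

Swap the in-process queue for a broker and the pool for a worker fleet and you have the “extracted” version; the isolation property is identical.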


When “where the bug lives” and “who can touch prod” matter for real

Payments, identity, and large PII stores are not morally different from other code—but risk and compliance often treat them differently. In one monolith, a SQL injection in an obscure reporting endpoint might theoretically reach the same database sessions (and the same credentials) as the table that stores tokens. Auditors do not always care that you “would never call it that way”; they care about scope and who can deploy what.

Example: A company needed PCI scope reduced. They could not shrink the cardholder data environment while card data and the marketing CMS lived in the same deployable with shared libraries and shared DB credentials. Extracting a small, boring payments API behind strict network rules and separate credentials was painful—but it was a business requirement, not an architecture trend.

You do not always need microservices for this (encryption, tokenization, least privilege, read replicas with tight roles all help). But when the organization’s enforceable boundary is “this service, this team, this review bar,” splitting matches how risk is actually governed.


The noisy neighbor you can name in postmortems

This one is simple: after three incidents, if the timeline always says “search index rebuild saturated CPU” or “the export job locked the shared connection pool,” you have a named culprit. The monolith forces you to scale and restart everything when that culprit misbehaves.

Example: Black Friday traffic doubles normal RPS. Search autoscaling would help, but scaling the monolith also scales idle admin panels, cron jobs, and reporting endpoints—so you overpay and still hit DB connection limits, because every instance opens the same pool size (napkin math: twenty instances with a 50-connection pool each is 1,000 claims on a database capped at a few hundred). Postmortem: “We scaled horizontally and made the DB problem worse.”

Extracting search (or exports, or whatever the graph says) lets you put backpressure, queues, and circuit breakers where the pain is, without pretending the whole app is one SLO.
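
What “circuit breakers where the pain is” can look like once search is its own service: a sketch using the Resilience4j library, where SearchClient and the threshold values are illustrative, not prescriptive:

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;

public class GuardedSearch {
    // Hypothetical client for the extracted search service.
    interface SearchClient { List<String> query(String q); }

    // Open the breaker when half of recent calls fail; stop hammering
    // search for 30 seconds before letting a probe call through.
    private static final CircuitBreaker BREAKER = CircuitBreaker.of(
            "search",
            CircuitBreakerConfig.custom()
                    .failureRateThreshold(50)
                    .waitDurationInOpenState(Duration.ofSeconds(30))
                    .build());

    static List<String> search(SearchClient client, String q) {
        Supplier<List<String>> guarded =
                CircuitBreaker.decorateSupplier(BREAKER, () -> client.query(q));
        try {
            return guarded.get();
        } catch (Exception e) {
            // Breaker open or search down: degrade to empty results
            // instead of letting search take the whole page down.
            return List.of();
        }
    }
}
```

Inside the monolith, a breaker like this protects much less, because the misbehaving code still shares your heap, threads, and connection pool anyway.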


Bad reasons I have actually heard in rooms

  • “It feels big.” So does a well-factored monolith with good tests. Size without pain is not a driver.
  • “We want engineers who only know Kubernetes.” That is hiring strategy disguised as architecture.
  • “Company X did it.” Company X also had fifty SREs and a platform team. Count your people before you count your services.
  • “Every feature touches every table anyway.” Splitting now gives you a distributed ball of mud: slower, harder to debug, same conceptual coupling.

Distributed systems buy you independence at the cost of consistency stories, versioning, and partial failures. If the monolith is not blocking you on velocity, risk, or scale, you are prepaying that cost for no return.

The deal you are signing

[Figure: the split tradeoff as two sticky notes. Left, “You might get…”: own your deploy calendar; scale the hot part alone; smaller blast radius (maybe). Right, the fine print, “You also get…”: partial failures + retries; contract drift / versioning; more dashboards, not fewer problems.]
If the right sticky makes you wince and you cannot name what on the left you are buying—wait.

A quick gut-check before you draw boxes

Napkin version: ready vs pause

[Figure: gut check, green flags vs red flags. Feels real: named domain + owner; pain shows up in merges / incidents; you can sketch the API on one whiteboard. Pause / fix the monolith first: “everyone has microservices”; boundary = “it depends” everywhere; no owner for the new on-call rotation.]
Not a rubric for your performance review—just a quick vibe check before you pay for distributed debugging.

Green-ish: You can draw a boundary and say “this team owns it,” that area drives most of your merge fights or incidents, and you can describe an API or event contract to the rest without saying “well, except seventeen edge cases” in one breath.

Red: The only driver is trend or tooling; boundaries are mush; nobody wants to own on-call for a new service; you have no plan for traces and logs across two runtimes.


If you do split, start boring

Pick one slice with measurable pain. Define the contract first (even as an internal module boundary). Ship observability before you brag about the diagram. The goal is not “more services”—it is faster, safer, or clearer work for the people who maintain the system.
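
“Contract first” can be as unglamorous as a plain interface inside the monolith today, implemented by an HTTP client tomorrow. A hypothetical sketch, with invented names:

```java
import java.util.List;

// Hypothetical contract for a recommendations slice. Defined first as an
// in-process interface: the monolith calls it directly on day one, and a
// later HTTP/gRPC client can implement the same contract without touching
// a single call site.
public interface Recommendations {

    /** Top-N product IDs for a user. Implementations must handle cold users
     *  (no history) and should fail fast rather than block checkout. */
    List<String> topProductsFor(String userId, int limit);
}
```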

If you remember nothing else: split for a named pain, not for a named pattern.