Monoliths can scale horizontally just as effectively as microservices when the goal is aggregate throughput, but the unit of scaling shifts from the individual service to the entire application instance.

Let’s see what that looks like in practice. Imagine a monolithic e-commerce application with a single, massive codebase handling everything: user authentication, product catalog, order processing, and payment gateway integration.

+-----------------------+
|   Monolithic App      |
|-----------------------|
| - Auth Module         |
| - Catalog Module      |
| - Orders Module       |
| - Payments Module     |
+-----------------------+

Behind a load balancer, every instance is a full copy of that application:

                           +-----------------------+
                           |     Load Balancer     |
                           +-----------------------+
                                       |
            +--------------------------+--------------------------+
            |                          |                          |
            v                          v                          v
+-----------------------+  +-----------------------+  +-----------------------+
|   App Instance 1      |  |   App Instance 2      |  |   App Instance 3      |
+-----------------------+  +-----------------------+  +-----------------------+

When traffic surges, we don’t spin up a new "Order Service" instance. Instead, we spin up an entirely new, identical "Monolithic App Instance." The load balancer then distributes incoming requests across all available instances.
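As a rough illustration of that distribution step, here is a minimal round-robin sketch in Python. The class name `RoundRobinBalancer` and the instance names are hypothetical; a real load balancer also handles health checks, timeouts, and connection draining, none of which is modeled here.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer over identical monolith instances (hypothetical)."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self.instances = list(instances)
        self._next = 0  # index of the next instance to receive a request

    def add_instance(self, name):
        # Scaling out: register one more full copy of the application.
        self.instances.append(name)

    def pick(self):
        # Rotate through whatever instances are currently registered.
        instance = self.instances[self._next % len(self.instances)]
        self._next += 1
        return instance


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
before = [balancer.pick() for _ in range(3)]

balancer.add_instance("app-4")  # traffic surge: add a fourth copy
after = [balancer.pick() for _ in range(4)]

print(before)  # ['app-1', 'app-2', 'app-3']
print(after)   # ['app-4', 'app-1', 'app-2', 'app-3']
```

Note that the new instance is not an "orders" or "payments" worker; it is another complete copy that can take any request.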

This pattern is often referred to as "scale-out" or "horizontal scaling" of the monolith. The core idea is that each instance is a complete, self-contained copy of the entire application.

The problem this solves is obvious: a single instance can only handle so much. As user traffic, data volume, or processing demands increase, that single instance becomes a bottleneck. By replicating the entire monolith and distributing load, we can handle significantly more concurrent users and operations.

Internally, this works because every instance runs the same code, so any instance can serve any request as long as it does not depend on state held in another instance’s memory. Each request is routed to an available instance, and within that instance the appropriate module (e.g., the "Orders Module") handles it; the monolith’s internal modularity still organizes the code, even though the modules are not separately deployable units. State management (user sessions, in-flight transactions) therefore becomes the critical piece. Often, this state is externalized to shared services like a distributed cache (Redis, Memcached) or a shared database.
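To make the externalized-state idea concrete, here is a minimal sketch. `SessionStore` is an in-memory stand-in for a shared store such as Redis (an assumption for illustration, not a real client), and `AppInstance` is a hypothetical wrapper for one running copy of the monolith.

```python
class SessionStore:
    """In-memory stand-in for a shared session store such as Redis (assumption)."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, user_id):
        self._data[session_id] = user_id

    def get(self, session_id):
        return self._data.get(session_id)


class AppInstance:
    """One running copy of the monolith (hypothetical)."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared across instances, not held per instance

    def login(self, session_id, user_id):
        self.store.put(session_id, user_id)

    def whoami(self, session_id):
        user = self.store.get(session_id)
        return f"{self.name}: {user}" if user else f"{self.name}: anonymous"


shared = SessionStore()
inst1 = AppInstance("app-1", shared)
inst2 = AppInstance("app-2", shared)

inst1.login("sess-abc", "alice")   # login handled by instance 1
print(inst2.whoami("sess-abc"))    # app-2: alice
```

Because the session lives in the shared store, the follow-up request succeeds on a different instance than the one that handled the login.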

Consider a typical web request flow for an order placement:

  1. User Request: A customer clicks "Place Order" in their browser.
  2. Load Balancer: The request hits the load balancer.
  3. Instance Selection: The load balancer picks an available App Instance (e.g., App Instance 2).
  4. Authentication: The Auth Module within App Instance 2 verifies the user’s session, likely by checking a shared Redis session store.
  5. Order Processing: The Orders Module within App Instance 2 receives the order details. It might interact with a shared PostgreSQL database to persist the order.
  6. Payment: The Payments Module within App Instance 2 communicates with an external payment gateway.
  7. Response: App Instance 2 sends the confirmation back to the user.
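
The steps above can be sketched in a few lines of Python. Everything here is a stand-in: `SharedServices` plays the role of the Redis session store and the PostgreSQL database, `charge_via_gateway` is a placeholder for the external payment gateway that always approves, and `MonolithInstance` is a hypothetical name for one running copy of the application.

```python
from dataclasses import dataclass, field


@dataclass
class SharedServices:
    """Stand-ins for shared infrastructure (assumptions, not real clients)."""
    sessions: dict = field(default_factory=dict)  # plays the role of Redis
    orders: list = field(default_factory=list)    # plays the role of PostgreSQL


def charge_via_gateway(user_id, amount_cents):
    # Placeholder for the external payment gateway; always approves here.
    return {"status": "approved", "user": user_id, "amount_cents": amount_cents}


class MonolithInstance:
    def __init__(self, name, shared):
        self.name = name
        self.shared = shared

    def place_order(self, session_id, item, amount_cents):
        # Step 4: the Auth Module checks the shared session store.
        user_id = self.shared.sessions.get(session_id)
        if user_id is None:
            return {"status": 401, "served_by": self.name}
        # Step 5: the Orders Module persists the order in the shared database.
        order_id = len(self.shared.orders) + 1
        self.shared.orders.append({"id": order_id, "user": user_id, "item": item})
        # Step 6: the Payments Module calls the external gateway.
        payment = charge_via_gateway(user_id, amount_cents)
        # Step 7: the same instance builds the response.
        return {"status": 200, "order_id": order_id,
                "payment": payment["status"], "served_by": self.name}


shared = SharedServices(sessions={"sess-abc": "alice"})
instance = MonolithInstance("app-2", shared)  # step 3: the balancer picked app-2
response = instance.place_order("sess-abc", "book", 1999)
print(response)
# {'status': 200, 'order_id': 1, 'payment': 'approved', 'served_by': 'app-2'}
```

All module calls are ordinary in-process function calls; the only network hops are to the shared stores and the gateway.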

The key levers you control here are:

  • Number of Instances: How many copies of the monolith are running? This is the primary scaling knob. You might use auto-scaling groups in cloud environments (AWS EC2 Auto Scaling, Azure VM Scale Sets) based on CPU utilization, request queue length, or custom metrics.
  • Load Balancer Configuration: How does the load balancer distribute traffic? Common strategies include round-robin, least connections, and IP hash. IP hash pins each client address to one instance (a form of session affinity), which helps when an instance keeps session state in local memory. Once state is externalized, affinity is rarely needed, and round-robin or least connections usually spreads load more evenly.
  • Resource Allocation per Instance: What CPU, RAM, and network resources does each individual monolith instance have? This determines the baseline capacity of a single instance.
  • Externalized State Management: The performance and scalability of your shared database, cache, and message queues are paramount. If your database becomes the bottleneck, adding more monolith instances won’t help.
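
To show how the instance-count lever might be driven by a metric, here is a minimal sketch of a proportional scaling rule. The function name, the 60% CPU target, and the min/max bounds are assumptions chosen for illustration; managed auto-scaling services apply their own policies, cooldowns, and smoothing.

```python
import math


def desired_instances(current, cpu_percent, target_percent=60.0,
                      min_instances=2, max_instances=20):
    """Proportional rule: size the fleet so average CPU lands near the target.

    All defaults are illustrative assumptions, not recommendations.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_percent <= 0:
        raise ValueError("target_percent must be positive")
    raw = math.ceil(current * cpu_percent / target_percent)
    # Clamp so the fleet never shrinks below or grows beyond the set bounds.
    return max(min_instances, min(max_instances, raw))


print(desired_instances(3, 90.0))    # 5  (scale out under load)
print(desired_instances(4, 30.0))    # 2  (scale in when idle)
print(desired_instances(10, 150.0))  # 20 (capped at max_instances)
```

The rule only changes how many copies run. It says nothing about whether the shared database or cache can absorb the extra load, which is why the last lever matters.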

Running all of these concerns in one large process is often seen as a downside for independent development and deployment. For raw, aggregate request throughput, though, a well-provisioned set of monolith instances behind a load balancer can be surprisingly robust. The hard part is not breaking the monolith apart; it is ensuring that the shared infrastructure underneath (databases, caches) can keep up with the combined load from every instance.
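
One way to see how aggregate load from many instances lands on shared infrastructure is to count database connections. The sketch below assumes each instance holds a fixed-size connection pool and the database enforces a connection limit; the function name and the numbers are hypothetical.

```python
def max_instances_for_db(db_max_connections, pool_size_per_instance, reserved=0):
    """How many instances fit within the database's connection limit.

    `reserved` sets aside connections for admin tools, migrations, and so on.
    """
    if pool_size_per_instance < 1:
        raise ValueError("pool_size_per_instance must be at least 1")
    usable = db_max_connections - reserved
    if usable < 0:
        raise ValueError("reserved exceeds db_max_connections")
    return usable // pool_size_per_instance


# Assumed figures: a 100-connection limit, 10 connections per instance,
# and 10 connections held back for operational use.
print(max_instances_for_db(100, 10, reserved=10))  # 9
```

Under these assumed numbers, a tenth instance would start failing to obtain connections no matter how much CPU it has, which is the kind of ceiling that adding instances alone cannot lift.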

When you’ve scaled out your monolith instances, the next bottleneck you’ll likely encounter is the shared database.
