Scaling a monolith horizontally starts with adding identical copies, but the harder part is making those copies interchangeable, so that any one of them can handle any request.
Let’s imagine we have a single monolithic application running on a server. When traffic increases, that single server becomes a bottleneck. Horizontal scaling, in this context, means we’re going to add more instances of this monolith, but the real trick is how we make those instances work together and how we ensure they’re ready to take on the load.
Here’s a basic setup we might start with: Nginx as a reverse proxy in front of a single instance.
```nginx
# /etc/nginx/sites-available/my-monolith
server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://localhost:8080; # Points to our single monolith instance
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
And our monolith application is listening on localhost:8080.
Now, let’s say we need to scale out. The first step is to get more instances of the monolith running. If it’s a Java app, we might start multiple JVMs. If it’s Node.js, we might use pm2 to manage multiple processes.
```shell
# Example using pm2 for a Node.js monolith (cluster mode)
pm2 start app.js --name my-monolith --instances 4
```
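This command starts 4 Node.js processes for app.js in pm2’s cluster mode. Cluster mode uses the Node.js cluster module, so all four processes share a single listening port. That spreads load across CPU cores on one machine without extra configuration. To put our own load balancer in front instead, each process has to run separately and listen on its own port, because separate processes can’t all bind to port 8080. Let’s say our application is configured to read a PORT environment variable.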
```shell
# Starting instances on different ports
PORT=8081 pm2 start app.js --name my-monolith-instance-1
PORT=8082 pm2 start app.js --name my-monolith-instance-2
PORT=8083 pm2 start app.js --name my-monolith-instance-3
PORT=8084 pm2 start app.js --name my-monolith-instance-4
```
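Here is a minimal sketch of the PORT convention. It shows how the app’s startup code might resolve its port, written in TypeScript since the example app is Node.js. The `resolvePort` helper and its fallback to 8080 are assumptions for illustration, not part of pm2. The shell variable simply reaches each process through its environment.

```typescript
// Hypothetical helper (not from the original app): pick the listening port
// from an environment map. In Node.js you would pass process.env.
type Env = Record<string, string | undefined>;

function resolvePort(env: Env, fallback = 8080): number {
  const raw = env["PORT"];
  if (raw === undefined || raw.trim() === "") {
    return fallback; // no PORT set: behave like the original single instance
  }
  const port = Number(raw);
  // Reject non-numeric values and anything outside the valid TCP port range.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${raw}`);
  }
  return port;
}

// Each pm2-launched process sees a different PORT value.
console.log(resolvePort({ PORT: "8082" })); // 8082
console.log(resolvePort({})); // 8080
```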
Now we need a way to distribute incoming traffic across these instances. This is where a load balancer comes in. We can use Nginx itself, a dedicated load balancer like HAProxy, or a managed service like AWS Elastic Load Balancing (ELB).
Let’s configure Nginx as a load balancer:
```nginx
# /etc/nginx/sites-available/my-monolith-lb
upstream monolith_backend {
    server 127.0.0.1:8081;
    server 127.0.0.1:8082;
    server 127.0.0.1:8083;
    server 127.0.0.1:8084;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://monolith_backend; # Distributes to upstream servers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
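With this configuration, Nginx distributes requests across ports 8081, 8082, 8083, and 8084 using round-robin, its default balancing method. Each port is served by a separate instance of our monolith. All four upstreams are on 127.0.0.1, so this spreads load across one machine’s CPU cores. To scale across machines, the `server` entries would point at other hosts’ addresses instead.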
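The following simplified TypeScript model makes the round-robin behavior concrete. It shows how requests rotate through the upstream list. The `RoundRobin` class is hypothetical. Nginx’s real implementation is weighted round-robin (every weight defaults to 1, as here), and it also skips servers it has marked as failed.

```typescript
// Simplified model of round-robin balancing over the upstream list above.
// Not Nginx's actual implementation.
class RoundRobin {
  private readonly servers: string[];
  private index = 0;

  constructor(servers: string[]) {
    if (servers.length === 0) throw new Error("need at least one upstream");
    this.servers = servers;
  }

  // Return the next server in order, wrapping back to the first.
  next(): string {
    const server = this.servers[this.index];
    this.index = (this.index + 1) % this.servers.length;
    return server;
  }
}

const upstream = new RoundRobin([
  "127.0.0.1:8081",
  "127.0.0.1:8082",
  "127.0.0.1:8083",
  "127.0.0.1:8084",
]);

// Five requests: 8081, 8082, 8083, 8084, then 8081 again.
for (let i = 0; i < 5; i++) {
  console.log(upstream.next());
}
```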
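The mental model here is that we’ve taken a single point of failure and a single processing unit and created a farm of identical units, with a traffic director in front. The load balancer spreads requests across the instances so that no single one carries all the traffic. If one instance crashes, Nginx treats it as unavailable after a failed attempt and keeps sending traffic to the others. This behavior is governed by the `max_fails` and `fail_timeout` server parameters, which default to 1 failure and 10 seconds. The load balancer itself, however, is now the single entry point.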
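The failover behavior can be sketched as round-robin that skips servers recently marked as failed. This is a simplified model loosely based on the documented meaning of Nginx’s `max_fails` and `fail_timeout`, not Nginx’s actual code. The `FailoverBalancer` class and the explicit `now` timestamps (used to keep the sketch deterministic) are assumptions.

```typescript
// Simplified passive health checking, loosely modelled on Nginx's
// max_fails / fail_timeout upstream parameters. Not Nginx's real algorithm.
interface Upstream {
  address: string;
  failures: number[]; // timestamps (ms) of recent failed attempts
  downUntil: number; // ms timestamp; the server is skipped before this time
}

class FailoverBalancer {
  private readonly upstreams: Upstream[];
  private readonly maxFails: number;
  private readonly failTimeoutMs: number;
  private index = 0;

  constructor(addresses: string[], maxFails = 1, failTimeoutMs = 10000) {
    this.upstreams = addresses.map((address) => ({ address, failures: [], downUntil: 0 }));
    this.maxFails = maxFails;
    this.failTimeoutMs = failTimeoutMs;
  }

  // Pick the next available server in round-robin order, skipping any
  // that are currently marked down. Returns undefined if all are down.
  next(now: number): string | undefined {
    for (let tried = 0; tried < this.upstreams.length; tried++) {
      const candidate = this.upstreams[this.index];
      this.index = (this.index + 1) % this.upstreams.length;
      if (now >= candidate.downUntil) return candidate.address;
    }
    return undefined;
  }

  // Record a failed request. After maxFails failures within the window,
  // take the server out of rotation for failTimeoutMs.
  reportFailure(address: string, now: number): void {
    const upstream = this.upstreams.find((u) => u.address === address);
    if (!upstream) return;
    upstream.failures = upstream.failures.filter((t) => now - t < this.failTimeoutMs);
    upstream.failures.push(now);
    if (upstream.failures.length >= this.maxFails) {
      upstream.downUntil = now + this.failTimeoutMs;
      upstream.failures = [];
    }
  }
}

const lb = new FailoverBalancer(["127.0.0.1:8081", "127.0.0.1:8082"]);
lb.reportFailure("127.0.0.1:8081", 0); // the instance on 8081 crashed
console.log(lb.next(1000)); // 127.0.0.1:8082 (8081 is skipped)
console.log(lb.next(11000)); // 127.0.0.1:8081 (back in rotation after the timeout)
```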
However, this simple setup has a critical assumption: each monolith instance is stateless. Suppose your monolith keeps session data locally, for example PHP’s $_SESSION with its default file-based handler, or an in-memory session store or cache inside a Java or Node.js process. In that case, this horizontal scaling will break. A user’s subsequent requests may be routed to a different instance that doesn’t have their session data.
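A tiny simulation shows the failure mode. `MonolithInstance` is a hypothetical stand-in for one process that keeps sessions in a local `Map`. The login lands on one instance, and the follow-up request lands on another.

```typescript
// Illustration of why in-process session storage breaks behind a load
// balancer. MonolithInstance is a hypothetical stand-in for one process.
class MonolithInstance {
  // Sessions live only in this process's memory.
  private readonly sessions = new Map<string, { user: string }>();

  login(sessionId: string, user: string): void {
    this.sessions.set(sessionId, { user });
  }

  whoAmI(sessionId: string): string | undefined {
    return this.sessions.get(sessionId)?.user;
  }
}

const instanceA = new MonolithInstance(); // e.g. the process on 8081
const instanceB = new MonolithInstance(); // e.g. the process on 8082

// Round-robin sends the login to A and the next request to B.
instanceA.login("sess-123", "alice");
console.log(instanceA.whoAmI("sess-123")); // alice
console.log(instanceB.whoAmI("sess-123")); // undefined: B never saw the login
```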
To fix this, you need to externalize state. Common solutions include:
- Shared Database: All instances read and write session data to a central database. This is a common approach, especially when a database is already in place.
- Distributed Cache: Using Redis or Memcached to store session data. This is faster than a database but adds another dependency.
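- Client-Side Storage: Storing self-contained tokens (such as signed JWTs) or entire session payloads in cookies or local storage. This is less common for full sessions because cookies are size-limited (browsers typically cap them around 4 KB) and any client-held data must be signed so it can’t be tampered with.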
Let’s say we’re using Redis for sessions. Your monolith code would be modified to:
- On login/session start: Save session data to Redis, get a session ID.
- Send session ID back to client (e.g., in a cookie).
- On subsequent requests: Read session ID from cookie, fetch session data from Redis.
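The three steps above might look like the following TypeScript sketch. `SessionStore` is a hypothetical interface mirroring Redis `SET key value EX seconds` and `GET key`. `FakeSharedStore` is an in-memory stand-in so the example runs without a Redis server. A real Redis client is asynchronous, and real session IDs must come from a cryptographically secure generator rather than the fixed ID used here.

```typescript
// Hypothetical interface standing in for a Redis client (sketch only;
// real Redis clients return Promises).
interface SessionStore {
  set(id: string, data: string, ttlSeconds: number): void; // like SET key value EX ttl
  get(id: string): string | undefined; // like GET key
}

// In-memory fake so the sketch runs without Redis. It ignores expiry;
// a real store would evict entries after ttlSeconds.
class FakeSharedStore implements SessionStore {
  private readonly data = new Map<string, string>();
  set(id: string, value: string, _ttlSeconds: number): void {
    this.data.set(id, value);
  }
  get(id: string): string | undefined {
    return this.data.get(id);
  }
}

// On login: save session data to the shared store and return a
// Set-Cookie header value carrying only the session ID.
function login(store: SessionStore, user: string, newId: () => string): string {
  const id = newId(); // use a cryptographically random ID in real code
  store.set(id, JSON.stringify({ user }), 3600);
  return `sid=${id}; HttpOnly; Secure; SameSite=Lax`;
}

// On later requests: read the session ID from the Cookie header and
// fetch the session data from the shared store.
function currentUser(store: SessionStore, cookieHeader: string): string | undefined {
  const match = cookieHeader
    .split(";")
    .map((c) => c.trim())
    .find((c) => c.startsWith("sid="));
  if (!match) return undefined;
  const raw = store.get(match.slice("sid=".length));
  return raw === undefined ? undefined : (JSON.parse(raw) as { user: string }).user;
}

// Any instance holding a reference to the same store can serve the user.
const redisLike = new FakeSharedStore();
const setCookie = login(redisLike, "alice", () => "abc123");
const cookie = setCookie.split(";")[0]; // the browser sends back "sid=abc123"
console.log(currentUser(redisLike, cookie)); // alice
```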
In practice, the application connects to Redis through a client library, typically configured from environment variables such as REDIS_HOST=redis.example.com and REDIS_PORT=6379, so that every instance points at the same Redis server.
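The following sketch shows one way to read that configuration. It assumes a hypothetical `loadRedisConfig` helper, called once at startup with `process.env`. The 6379 fallback is Redis’s default port.

```typescript
// Hypothetical config loader for the REDIS_HOST / REDIS_PORT variables.
interface RedisConfig {
  host: string;
  port: number;
}

function loadRedisConfig(env: Record<string, string | undefined>): RedisConfig {
  const host = env["REDIS_HOST"];
  if (!host) {
    // Fail fast: every instance must point at the same shared Redis.
    throw new Error("REDIS_HOST is not set");
  }
  const rawPort = env["REDIS_PORT"] ?? "6379"; // Redis's default port
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid REDIS_PORT: ${rawPort}`);
  }
  return { host, port };
}

console.log(loadRedisConfig({ REDIS_HOST: "redis.example.com", REDIS_PORT: "6379" }));
```

Failing fast on a missing `REDIS_HOST` is a design choice. In a multi-host setup, silently defaulting to localhost would give each machine its own session store again.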
The truly tricky part of scaling a monolith horizontally, beyond just adding more processes and a load balancer, is managing shared mutable state. If your application relies on file system writes that aren’t coordinated, or in-memory caches that are specific to a single process, scaling out will lead to data inconsistencies and errors. Every shared resource that a monolith instance might touch needs to be accessible and consistent across all instances. This often means migrating state management from local process memory or local disk to a centralized, highly available service like a database or a distributed cache.
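The stale-cache problem can be reproduced in a few lines. In this hypothetical sketch, `sharedDb` plays the central database, and each `CachingInstance` keeps a per-process cache in front of it. A write through one instance never updates the other instance’s cached copy.

```typescript
// Illustration of inconsistent reads from per-process caches.
// sharedDb stands in for the central database.
const sharedDb = new Map<string, number>([["price:widget", 10]]);

class CachingInstance {
  private readonly localCache = new Map<string, number>();

  // Cache-aside read: use the local copy if present, else load from the DB.
  read(key: string): number | undefined {
    if (this.localCache.has(key)) return this.localCache.get(key);
    const value = sharedDb.get(key);
    if (value !== undefined) this.localCache.set(key, value);
    return value;
  }

  // Writes go to the DB but refresh only this instance's cache.
  write(key: string, value: number): void {
    sharedDb.set(key, value);
    this.localCache.set(key, value);
  }
}

const a = new CachingInstance();
const b = new CachingInstance();
a.read("price:widget"); // A caches 10
b.write("price:widget", 12); // B updates the DB to 12
console.log(a.read("price:widget")); // 10: stale, A never saw the change
console.log(b.read("price:widget")); // 12
```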
The next problem you’ll hit is how to manage deployments to these multiple instances without downtime.