Load Balancing: Algorithms & Real-World Scale

Nginx’s load balancing isn’t about magic; it’s about deterministic algorithms that make predictable choices about where to send traffic, and the default, round-robin, is the simplest of them all.

Let’s watch Nginx distribute traffic. Imagine we have two backend servers, app1 (192.168.1.10) and app2 (192.168.1.11), both serving a simple "Hello from server X" message.

http {
    upstream backend {
        server 192.168.1.10:8080;
        server 192.168.1.11:8080;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }
}

When you hit http://your_nginx_ip/ repeatedly, Nginx will serve requests to app1, then app2, then app1, then app2, and so on. It’s like a ticket dispenser, handing out tickets one by one to each server in the list.

Round-robin is Nginx’s default method. There is no round_robin directive to write; an upstream block with no method directive uses it. It’s straightforward: assign each incoming request to the next server in the list, and when the end of the list is reached, circle back to the beginning. This gives a roughly even distribution of traffic over time, assuming all servers are equally capable and available.
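The rotation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Nginx’s implementation; the round_robin helper is a hypothetical name, and the server list simply mirrors the upstream block.

```python
from itertools import cycle

# Backend addresses copied from the upstream block above.
servers = ["192.168.1.10:8080", "192.168.1.11:8080"]


def round_robin(pool):
    """Return an iterator that yields servers in list order, wrapping around."""
    return cycle(pool)


picker = round_robin(servers)
first_four = [next(picker) for _ in range(4)]
print(first_four)
```

Four consecutive picks alternate between the two addresses, which is the app1, app2, app1, app2 pattern described earlier.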

But Nginx offers more sophisticated ways to distribute traffic. Consider least_conn. This algorithm sends the request to the server that currently has the fewest active connections, taking server weights into account.

http {
    upstream backend {
        least_conn;
        server 192.168.1.10:8080;
        server 192.168.1.11:8080;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }
}

If app1 is busy handling 10 requests and app2 is only handling 2, a new request will be sent to app2. This is particularly useful for applications where connection duration varies significantly: it keeps new requests from piling onto a server that is already tied up with long-lived connections while another server sits nearly idle.
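A minimal sketch of that selection rule, assuming a plain dictionary of connection counts. The pick_least_conn function is hypothetical, and the sketch ignores server weights for simplicity.

```python
def pick_least_conn(active):
    """Return the server with the fewest active connections.

    `active` maps a server name to its current connection count.
    Ties go to the first server in insertion order (a simplification).
    """
    return min(active, key=active.get)


# The scenario from the text: app1 has 10 active requests, app2 has 2.
active = {"app1": 10, "app2": 2}
chosen = pick_least_conn(active)
active[chosen] += 1  # the new request now counts against the chosen server
print(chosen, active)
```

Because the counts are updated on every pick, the choice follows the live state of the pool rather than a fixed rotation.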

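Then there’s ip_hash. This method uses the client’s IP address to determine which server receives the request. It hashes the first three octets of an IPv4 address (or the entire IPv6 address), and the resulting value dictates which server gets the connection. One consequence is that all clients in the same /24 network land on the same server.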

http {
    upstream backend {
        ip_hash;
        server 192.168.1.10:8080;
        server 192.168.1.11:8080;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }
}

With ip_hash, requests from a given client address normally go to the same backend server for as long as that server is available. This matters for applications that rely on session stickiness, where user state is stored locally on the server. For example, if a user logs in on app1, subsequent requests from that user keep going to app1, maintaining their logged-in state. Without ip_hash or a similar mechanism, they might be sent to app2 and have to log in again. If the chosen server is marked unavailable, its clients are sent to another server, so stickiness is best-effort rather than guaranteed.
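The mapping from address to server can be sketched as follows. Nginx’s ip_hash keys on the first three octets of an IPv4 address; the sketch copies that keying, but the crc32 hash is a stand-in chosen for determinism, not the function Nginx uses, and hash_key and pick_by_ip are hypothetical names.

```python
import zlib

servers = ["app1", "app2"]


def hash_key(client_ip):
    """Return the hashing key: the first three octets of an IPv4 address."""
    return ".".join(client_ip.split(".")[:3])


def pick_by_ip(client_ip, pool):
    """Map a client address to one server in the pool, deterministically."""
    # crc32 is an illustrative stand-in, not Nginx's own hash function.
    digest = zlib.crc32(hash_key(client_ip).encode("ascii"))
    return pool[digest % len(pool)]


a = pick_by_ip("203.0.113.7", servers)
b = pick_by_ip("203.0.113.200", servers)  # same /24 network, so same key
print(a, b)
```

Both addresses share the key 203.0.113, so they resolve to the same server on every call. The same property means a large population behind one NAT gateway or corporate proxy is pinned to a single backend.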

Nginx also allows you to assign weights to servers. This is where you can tell Nginx that one server is more powerful or should receive more traffic than another.

http {
    upstream backend {
        server 192.168.1.10:8080 weight=3;
        server 192.168.1.11:8080 weight=1;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }
}

In this setup, with the default round-robin method, app1 receives about three times as many requests as app2: out of every four requests, three go to app1 and one goes to app2. Nginx spreads the heavier server’s share through the cycle rather than necessarily sending it as one strict batch. Weights suit pools that mix high-capacity and lower-capacity servers, and they are also handy when gradually migrating traffic to a new server.
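One way to get a 3:1 split without long bursts is a "smooth" weighted round-robin. The sketch below illustrates that technique under the assumption that it approximates what Nginx does; it is not taken from the Nginx source, and smooth_weighted_rr is a hypothetical name.

```python
def smooth_weighted_rr(weights, n):
    """Return the first n picks for a dict of server name -> configured weight."""
    current = {name: 0 for name in weights}  # running score per server
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every server gains its configured weight each round.
        for name, weight in weights.items():
            current[name] += weight
        # The highest running score wins; ties go to the first server listed.
        best = max(current, key=current.get)
        # The winner pays back the total weight, letting the others catch up.
        current[best] -= total
        picks.append(best)
    return picks


picks = smooth_weighted_rr({"app1": 3, "app2": 1}, 8)
print(picks)
```

Over eight picks app1 is chosen six times and app2 twice, and the app2 picks fall inside each cycle of four instead of at its end.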

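Open-source Nginx does not actively probe backends by default. It relies on passive health checks: it watches the outcome of real client requests, and the max_fails and fail_timeout parameters on each server line control when a server is taken out of rotation. With the defaults (max_fails=1, fail_timeout=10s), a single failed attempt, such as a connection error or a timeout, marks the server unavailable for 10 seconds. Active probing with the health_check directive is a feature of the commercial NGINX Plus, not of the open-source build.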

upstream backend {
    server 192.168.1.10:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:8080;
}

Here, if three attempts to communicate with 192.168.1.10 fail within a 30-second window, Nginx considers it unavailable and stops sending it traffic for the next 30 seconds (fail_timeout sets both durations). Once that period has elapsed, Nginx tries the server again with live requests. The second server sets neither parameter, so it uses the defaults.
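The max_fails and fail_timeout bookkeeping can be modeled roughly as below. This is a simplified sketch rather than Nginx’s actual accounting: the PeerState class and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class PeerState:
    """Tracks failed attempts for one upstream server (simplified model)."""

    def __init__(self, max_fails=3, fail_timeout=30.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.fails = 0
        self.window_start = 0.0  # time of the first failure in the current window
        self.down_until = 0.0    # the server is skipped until this time

    def record_failure(self, now):
        # Start a fresh counting window if none is open or the old one expired.
        if self.fails == 0 or now - self.window_start > self.fail_timeout:
            self.fails = 0
            self.window_start = now
        self.fails += 1
        if self.fails >= self.max_fails:
            # Too many failures in the window: take the server out of rotation.
            self.down_until = now + self.fail_timeout
            self.fails = 0

    def record_success(self):
        self.fails = 0

    def available(self, now):
        return now >= self.down_until


peer = PeerState(max_fails=3, fail_timeout=30.0)
for t in (0.0, 5.0, 10.0):  # three failures within 30 seconds
    peer.record_failure(t)
status = [peer.available(10.0), peer.available(39.9), peer.available(40.0)]
print(status)
```

After the third failure at t=10 the server is skipped until t=40, at which point it becomes eligible for live traffic again.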

What counts as a failed attempt is governed by proxy_next_upstream. Connection errors, timeouts, and invalid response headers always count; HTTP 5xx responses such as 500, 502, 503, and 504 count only if they are listed in that directive. Beyond these built-in mechanisms, Nginx can be paired with external health-checking tools or third-party modules that probe backends actively. The core principle remains: Nginx tries to avoid sending traffic to servers that are not responding or are returning errors.

The most surprising thing about Nginx’s load balancing is that its built-in algorithms favor predictability over true dynamic, real-time load assessment. The least_conn method does react to current state, but only through one simple metric, the number of active connections. The ip_hash method is a fixed function of the client address and ignores load entirely. None of them inspects backend workload or response time in a granular fashion without additional modules or configuration.

The next step is understanding how to combine these methods and extend Nginx’s load balancing capabilities with custom logic or third-party modules.