Nginx can serve as your load balancer, presenting a single, unified endpoint to clients while distributing traffic across your backend servers behind the scenes.

Imagine you have a fleet of identical web servers, say web1, web2, and web3, all running your application. Nginx can sit in front of them and present a single IP address and port to the outside world. When a client requests http://your-app.com, Nginx receives it and, based on your configuration, sends it to one of your backend servers. It’s like a smart traffic cop for your web services.

Here’s how you set this up in Nginx. You define a group of backend servers in a block called upstream.

http {
    # Pool of backend servers that share the load
    upstream my_app_servers {
        server web1.example.com:8080;
        server web2.example.com:8080;
        server web3.example.com:8080;
    }

    server {
        listen 80;
        server_name your-app.com;

        location / {
            # Forward the request to one server from the pool
            proxy_pass http://my_app_servers;
            # Preserve the original Host header and client details
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
}

In this example, my_app_servers is the name of our upstream group. We’ve listed three servers within it, each with its hostname and port. The server block tells Nginx to handle requests for your-app.com on port 80, and the location / block uses proxy_pass to forward each of those requests to the my_app_servers group. The proxy_set_header directives pass the original Host header, the client IP address, and the request scheme along to the backend servers, which would otherwise see every request as coming from the Nginx load balancer itself.
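To make the effect of these headers concrete, here is a minimal Python sketch of how a backend might recover the original client address. The function name client_address and the plain dict of headers are illustrative assumptions, not part of Nginx or any particular web framework. The sketch also assumes the backend is reachable only through Nginx, so the headers can be trusted.

```python
def client_address(headers, peer_addr):
    """Return the best guess at the original client IP.

    headers: mapping of request header names to values, as forwarded by Nginx.
    peer_addr: address of the TCP peer, which is Nginx itself when proxied.
    """
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    # X-Real-IP carries $remote_addr: the address Nginx saw the client on.
    real_ip = normalized.get("x-real-ip")
    if real_ip:
        return real_ip.strip()

    # $proxy_add_x_forwarded_for appends $remote_addr to any existing list,
    # so the last entry is the address observed by this Nginx instance.
    forwarded = normalized.get("x-forwarded-for")
    if forwarded:
        return forwarded.split(",")[-1].strip()

    # No proxy headers present: fall back to the direct peer.
    return peer_addr
```

Without the proxy_set_header lines, only the fallback branch would ever run, and every request would appear to come from the load balancer.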

By default, Nginx uses a round-robin algorithm for load balancing. With equally weighted servers, this means it simply cycles through the list, sending one request to web1, the next to web2, then web3, and back to web1. This is sufficient for many use cases, distributing requests evenly across your available servers.
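The cycling behavior can be sketched in a few lines of Python. This is only an illustration of the idea for equally weighted servers, not Nginx’s internal implementation; the names round_robin, picker, and assignments are made up for the example.

```python
from itertools import cycle

servers = ["web1.example.com:8080", "web2.example.com:8080", "web3.example.com:8080"]

def round_robin(pool):
    """Return an iterator that yields servers in a repeating, fixed order."""
    return cycle(pool)

picker = round_robin(servers)

# Assign five incoming requests; the fourth wraps back around to web1.
assignments = [next(picker) for _ in range(5)]
```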

However, Nginx offers other load balancing methods. You can specify least_conn to send a new request to the server with the fewest active connections. This is particularly useful for long-lived connections or when your requests have varying processing times.

upstream my_app_servers {
    least_conn;
    server web1.example.com:8080;
    server web2.example.com:8080;
    server web3.example.com:8080;
}
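As a rough illustration of the selection rule, the following Python sketch picks the server with the fewest active connections. The function name pick_least_connections and the connection counts are hypothetical, and the sketch ignores server weights and breaks ties by list order, which is a simplification.

```python
def pick_least_connections(active):
    """Return the server with the fewest active connections.

    active: dict mapping server name to its current connection count.
    Ties go to the server listed first (a simplification of this sketch).
    """
    return min(active, key=active.get)

# Hypothetical snapshot of active connections per server.
active = {"web1": 4, "web2": 1, "web3": 3}

chosen = pick_least_connections(active)
active[chosen] += 1  # the new request now occupies a connection on that server
```

Because the counts change as requests start and finish, a server stuck on slow requests naturally receives fewer new ones.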

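Another common method is ip_hash. This method sends requests from the same client IP address to the same backend server for as long as that server is available. For IPv4 clients, Nginx uses only the first three octets of the address as the hashing key, so clients in the same /24 network land on the same server. This is important for applications that rely on session stickiness, where a user’s session state is stored on a specific server.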

upstream my_app_servers {
    ip_hash;
    server web1.example.com:8080;
    server web2.example.com:8080;
    server web3.example.com:8080;
}
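The idea behind hashing on the client address can be sketched in Python as follows. The key follows what the Nginx documentation describes (the first three octets of an IPv4 address, or the entire IPv6 address), but SHA-256 is only a stand-in here, since Nginx uses its own internal hash function. The names hash_key and pick_by_ip are hypothetical.

```python
import hashlib
import ipaddress

SERVERS = ["web1.example.com:8080", "web2.example.com:8080", "web3.example.com:8080"]

def hash_key(client_ip):
    """Build the hashing key: first three octets for IPv4, full address for IPv6."""
    addr = ipaddress.ip_address(client_ip)
    if addr.version == 4:
        return addr.packed[:3]
    return addr.packed

def pick_by_ip(client_ip, servers=SERVERS):
    """Map a client IP to a server deterministically."""
    # SHA-256 is an assumption of this sketch, not the hash Nginx uses.
    digest = hashlib.sha256(hash_key(client_ip)).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Calling pick_by_ip twice with the same address returns the same server, which is the property session stickiness relies on.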

Nginx also supports passive health checks. By default, if an attempt to communicate with a server fails, Nginx stops sending traffic to it for a short period (the defaults are max_fails=1 and fail_timeout=10s). You can tune this behavior with max_fails and fail_timeout. max_fails=3 means Nginx will consider a server unavailable after 3 failed attempts within the fail_timeout window. fail_timeout=10s sets both that counting window and the length of time Nginx stops sending traffic to the server before trying it again.

upstream my_app_servers {
    server web1.example.com:8080 max_fails=3 fail_timeout=10s;
    server web2.example.com:8080 max_fails=3 fail_timeout=10s;
    server web3.example.com:8080 max_fails=3 fail_timeout=10s;
}
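The interaction between the two parameters can be modeled with a small Python class. This is a simplified sketch under stated assumptions, not a reproduction of Nginx’s internal bookkeeping: the class name PassiveHealth is hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
class PassiveHealth:
    """Track failures for one server, loosely modeled on max_fails / fail_timeout."""

    def __init__(self, max_fails=3, fail_timeout=10.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.failures = []     # timestamps of recent failed attempts
        self.down_until = 0.0  # the server is skipped before this time

    def record_failure(self, now):
        # Keep only failures that fall inside the fail_timeout window.
        self.failures = [t for t in self.failures if now - t < self.fail_timeout]
        self.failures.append(now)
        if len(self.failures) >= self.max_fails:
            # Threshold reached: take the server out of rotation for fail_timeout.
            self.down_until = now + self.fail_timeout
            self.failures.clear()

    def is_available(self, now):
        return now >= self.down_until


health = PassiveHealth(max_fails=3, fail_timeout=10.0)
for timestamp in (0.0, 1.0, 2.0):
    health.record_failure(timestamp)
```

After the third failure at t=2.0, is_available returns False until t=12.0, at which point the server becomes eligible for traffic again.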

For more advanced health checking, you can use the health_check directive, which requires an NGINX Plus subscription. This allows Nginx to actively probe backend servers at regular intervals, for example with HTTP requests, or with TCP and UDP checks in the stream module. You can also define which responses count as passing, which gives you much more granular control over what constitutes a "healthy" server.

When you configure proxy_pass to an upstream group, Nginx doesn’t just blindly forward requests. It maintains internal counters for each server in the upstream group, tracking the number of active connections and the number of failed attempts. These counters drive the load balancing algorithms and the automatic failover mechanism. For instance, with least_conn, Nginx compares the active connection counts of the servers, taking any configured weights into account, and picks the one with the lowest value. When an attempt to reach a server fails, Nginx increments that server’s failure counter; once the counter reaches max_fails within the fail_timeout window, Nginx marks the server as unavailable and skips it for subsequent requests until fail_timeout has elapsed.

The next concept you’ll likely explore is how to configure Nginx for SSL/TLS termination at the load balancer.
