Load balancing algorithms decide which backend server gets the next incoming request.
Here’s a common setup:
    http {
        upstream backend {
            server backend1.example.com;
            server backend2.example.com;
            server backend3.example.com;
        }
        server {
            listen 80;
            location / {
                proxy_pass http://backend;
            }
        }
    }
By default, when the upstream block names no balancing method, Nginx uses Round Robin.
    upstream backend {
        server backend1.example.com;
        server backend2.example.com;
        server backend3.example.com;
    }
This is the simplest approach. For a steady stream of identical requests, it distributes them evenly:
- Request 1 -> backend1
- Request 2 -> backend2
- Request 3 -> backend3
- Request 4 -> backend1
- …and so on.
It’s straightforward and works well when all backend servers have roughly equal processing power and are handling similar types of requests. The problem arises when requests vary in complexity or server load is uneven.
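When backends differ in capacity, Round Robin can be weighted with the weight parameter of the server directive. The following is a minimal sketch, assuming (purely for illustration) that backend1 can absorb roughly three times the traffic of the other two servers:

```nginx
upstream backend {
    # weight defaults to 1, so the total weight here is 5:
    # about 3 of every 5 requests go to backend1.
    server backend1.example.com weight=3;
    server backend2.example.com;
    server backend3.example.com;
}
```

Weights compensate for unequal hardware, but they do not react to uneven request cost, so the limitation described above still applies.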
Least Connections is often a better choice for dynamic workloads.
    upstream backend {
        least_conn;
        server backend1.example.com;
        server backend2.example.com;
        server backend3.example.com;
    }
This algorithm directs new requests to the server with the fewest active connections. If backend1 has 5 active connections, backend2 has 3, and backend3 has 7, the next request goes to backend2.
This is particularly useful when requests take varying amounts of time to process. A server that’s busy with long-running requests will naturally accumulate more connections. least_conn ensures that new, potentially short-lived requests are sent to less burdened servers, allowing them to finish their existing work and become available sooner. It helps prevent a situation where one server gets overloaded with many slow requests while others sit idle.
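least_conn also takes server weights into account, so it can be combined with the weight parameter when the machines are not identical. A short sketch, where the 2:1:1 ratio is an assumed example rather than a recommendation:

```nginx
upstream backend {
    least_conn;
    # Assumed ratio: backend1 is treated as able to carry about twice
    # as many concurrent connections as each of the other servers.
    server backend1.example.com weight=2;
    server backend2.example.com;
    server backend3.example.com;
}
```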
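For requests that must consistently reach the same server, such as lookups against a per-server cache or sticky sessions, Hash is the way to go, provided the key identifies whatever needs to stay together.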
    upstream backend {
        hash $request_uri;
        server backend1.example.com;
        server backend2.example.com;
        server backend3.example.com;
    }
Here, $request_uri is a variable that contains the full original request URI, including any query string (e.g., /products/123?ref=home). Nginx calculates a hash of this value and uses it to consistently map the request to a specific backend server.
Hashing on $request_uri sends every request for a given URI to the same server, which helps when each backend keeps its own cache. It does not keep a user on one server, because the same user requests many different URIs. Session stickiness needs a key that identifies the client. This is critical for applications where a user’s session state is stored on a particular backend server: if a later request lands on a server that doesn’t have the session data, the result is broken functionality or a "session expired" error. You can hash on $remote_addr (client IP address) for basic IP-based stickiness, or on a session cookie or custom header.
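For basic client stickiness, the hash key can be the client address instead of the URI. A minimal sketch, reusing the same three example backends:

```nginx
upstream backend {
    # Requests from the same client address map to the same server,
    # as long as the set of servers does not change.
    hash $remote_addr;
    server backend1.example.com;
    server backend2.example.com;
    server backend3.example.com;
}
```

Nginx also provides a dedicated ip_hash directive for this purpose. For IPv4 clients it uses only the first three octets of the address as the key, so clients in the same /24 network land on the same server.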
The hash algorithm’s strength is its determinism. For a given input (like $request_uri) and a fixed set of servers, the chosen server is always the same. The challenge is that, by default, adding or removing a server from the upstream group remaps most keys to different servers. Many users might be unexpectedly sent to a different server, potentially invalidating their sessions or emptying per-server caches. Adding the consistent parameter to the hash directive switches Nginx to ketama consistent hashing, which remaps only a few keys when the group changes.
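A sketch of the same upstream group using the consistent parameter of the hash directive, which selects ketama consistent hashing so that adding or removing a server disturbs fewer keys:

```nginx
upstream backend {
    # "consistent" enables ketama consistent hashing: when a server is
    # added or removed, only a few keys move to a different server.
    hash $request_uri consistent;
    server backend1.example.com;
    server backend2.example.com;
    server backend3.example.com;
}
```

Consistent hashing reduces remapping but does not eliminate it. Keys that were assigned to a removed server still have to move somewhere.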
When using hash, the choice of the hash key is paramount. A key with high cardinality (many unique values) will distribute load more evenly. A key with low cardinality (few unique values) can lead to uneven distribution or even all requests going to one server. For example, hashing on $remote_addr might lead to uneven distribution if many users are behind a single NAT gateway.
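One way to raise key cardinality is to prefer a per-user value and fall back to the client address only when it is missing. The sketch below assumes a hypothetical session cookie named sessionid and a hypothetical variable name $sticky_key; neither comes from the setup above, and the map block belongs in the http context:

```nginx
# Choose the hash key: the session cookie when present,
# otherwise the client address.
map $cookie_sessionid $sticky_key {
    default $cookie_sessionid;
    ""      $remote_addr;
}

upstream backend {
    hash $sticky_key consistent;
    server backend1.example.com;
    server backend2.example.com;
    server backend3.example.com;
}
```

With this arrangement, users behind a shared NAT gateway are spread across servers once they have a session cookie, instead of all hashing to the server chosen for the gateway’s address.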
The most surprising thing about these algorithms is how much their effectiveness depends on the nature of your application’s traffic and the statefulness of your backend services. A theoretically "smarter" algorithm like least_conn offers little over Round Robin if your requests are uniformly fast and your servers are identical, because connection counts stay nearly equal either way. Conversely, using hash with $request_uri on a site where most users only ever access / or /about will overload the servers assigned to those URIs.
The next problem you’ll likely encounter is ensuring that your load balancer itself doesn’t become a single point of failure or a performance bottleneck.