The most surprising thing about load balancing algorithms is that "fairness" often means actively unbalancing traffic.
Let’s watch this in action. Imagine we have three backend servers: 10.0.0.1, 10.0.0.2, and 10.0.0.3. We’re using HAProxy, a popular load balancer, and we want to distribute incoming requests.
Here’s a basic HAProxy configuration snippet:
    frontend http_frontend
        bind *:80
        default_backend http_backend

    backend http_backend
        balance roundrobin
        server srv1 10.0.0.1:80 check
        server srv2 10.0.0.2:80 check
        server srv3 10.0.0.3:80 check
With balance roundrobin, HAProxy sends requests sequentially: server 1, then server 2, then server 3, then back to server 1. This is the simplest approach. If all requests are identical and all servers have identical capacity, this works beautifully.
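But real traffic is rarely uniform, and neither are servers. What if server 1 is much slower than servers 2 and 3? Round robin will keep sending requests to server 1 just as often as the others, potentially overwhelming it while servers 2 and 3 have capacity to spare. This is where the "fairness" paradox comes in: an equal share of requests is not an equal share of load, so balancing the load means deliberately skewing the request counts.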
We need algorithms that adapt. HAProxy offers several:
- `roundrobin`: The default. Distributes requests evenly in a circular fashion.
  - Diagnosis: Check server response times under load. If one server consistently has higher latency or error rates, it might be overloaded.
  - Fix: If servers are identical, `roundrobin` is fine. If not, consider other algorithms.
  - Why it works: Simple, predictable distribution for homogenous servers.
- `static-rr`: Similar to `roundrobin` but uses static weights. You can assign a weight to each server (e.g., `server srv1 10.0.0.1:80 check weight 2`). Server 1 would receive twice as many requests as a server with weight 1. `roundrobin` honors `weight` too; the difference is that with `static-rr`, changing a weight at runtime has no effect.
  - Diagnosis: Use HAProxy’s stats page (or `show stat` on the runtime API) to monitor connection counts and response times per server. If weighted servers are still showing high load, the weights might need adjustment.
  - Fix: Adjust `weight` parameters in the backend configuration. For example, `server srv1 10.0.0.1:80 check weight 5` and `server srv2 10.0.0.2:80 check weight 1`.
  - Why it works: Allows for manual tuning of traffic distribution based on known server capacities.
- `leastconn`: Directs traffic to the server with the fewest active connections. This is often a better choice for long-lived connections (like WebSockets) or when request processing times vary significantly.
  - Diagnosis: Monitor the `scur` (current connections) column on the HAProxy stats page for each server. If one server has a significantly higher `scur` than others, it’s likely the bottleneck.
  - Fix: Change `balance roundrobin` to `balance leastconn` in the backend.
  - Why it works: Intuitively sends new requests to the least busy server, preventing any single server from accumulating too many concurrent tasks.
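- `first`: Sends each request to the first server (in configuration order by default) that still has a free connection slot. Traffic spills over to the next server only once that server reaches its `maxconn`; a server that is down is skipped.
  - Diagnosis: Check `status` and `check_status` on the HAProxy stats page. If the first server is healthy and below its `maxconn`, all traffic should be directed there.
  - Fix: Set `balance first` in the backend and give each server a `maxconn`. Without a `maxconn` there is no point at which traffic would spill over, so the algorithm makes little sense.
  - Why it works: Maximizes utilization of the first servers and leaves the later ones idle until they are needed, with simple failover as a side effect.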
- `source`: Uses a hash of the client’s IP address to determine which server receives the request. This ensures that requests from the same client IP go to the same server (sticky sessions) for as long as the set of available servers does not change.
  - Diagnosis: The stats page aggregates per server rather than per client, so check HAProxy’s logs instead. If every request from a given client IP names the same server, `source` is working.
  - Fix: Use `balance source` in the backend. `balance source` itself takes no hash argument; how the hash maps onto servers is set by the separate `hash-type` directive (e.g., `hash-type consistent`).
  - Why it works: Provides session persistence without requiring server-side session management, which is crucial for applications that store state locally on the server.
- `uri`: Hashes the requested URI to select a server. Useful for caching proxies where identical URIs should hit the same cache.
  - Diagnosis: Confirm, for example from HAProxy’s logs, that requests for a specific URI consistently hit the same backend server. The per-server totals on the stats page only show the aggregate split.
  - Fix: Implement `balance uri` in the backend (HTTP mode is required). You can limit how much of the URI is hashed with the `len` and `depth` arguments, and choose the hash mapping with `hash-type`.
  - Why it works: Distributes traffic based on the request path, ensuring that identical requests are consistently routed to the same backend for consistent responses or caching.
- `url_param`: Hashes a specific URL parameter. Similar to `uri` but allows for finer-grained sticky sessions based on a parameter value.
  - Diagnosis: Confirm, for example from HAProxy’s logs, that requests carrying the same value for the specified parameter are consistently sent to the same backend server.
  - Fix: Use `balance url_param` followed by the parameter name, e.g., `balance url_param session_id`.
  - Why it works: Provides session affinity based on a specific query string parameter, allowing different users (or different sessions for the same user if the parameter changes) to be routed to different servers while maintaining consistency for the same session identifier.
- `hdr`: Hashes a specific HTTP header. Useful for routing based on custom headers, like an `X-Forwarded-For` header or an API key.
  - Diagnosis: Confirm, for example from HAProxy’s logs, that requests carrying the same value in the target header are consistently directed to the same backend server.
  - Fix: Use `balance hdr(<header_name>)`, e.g., `balance hdr(X-User-ID)`. Note that the header name goes in parentheses.
  - Why it works: Enables sticky sessions or consistent routing based on arbitrary HTTP headers, offering flexibility for custom routing logic.
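As a rough sketch of how a few of these options look in a configuration, the backends below each pick a different algorithm. The backend names (`weighted_backend`, `overflow_backend`, `sticky_backend`, `api_backend`), the weights, the `maxconn` values, and the `X-User-ID` header are assumptions chosen for illustration, not recommendations.

```haproxy
# Weighted distribution: srv1 is assumed to have about twice the capacity.
backend weighted_backend
    balance static-rr
    server srv1 10.0.0.1:80 check weight 2
    server srv2 10.0.0.2:80 check weight 1
    server srv3 10.0.0.3:80 check weight 1

# Fill srv1 up to its maxconn before spilling over to srv2, then srv3.
backend overflow_backend
    balance first
    server srv1 10.0.0.1:80 check maxconn 100
    server srv2 10.0.0.2:80 check maxconn 100
    server srv3 10.0.0.3:80 check maxconn 100

# Hash the client IP. Consistent hashing limits how many clients are
# remapped when a server is added or removed.
backend sticky_backend
    balance source
    hash-type consistent
    server srv1 10.0.0.1:80 check
    server srv2 10.0.0.2:80 check
    server srv3 10.0.0.3:80 check

# Hash a request header. HTTP mode is needed for HAProxy to see headers.
backend api_backend
    mode http
    balance hdr(X-User-ID)
    server srv1 10.0.0.1:80 check
    server srv2 10.0.0.2:80 check
    server srv3 10.0.0.3:80 check
```

Only the backend selected by `default_backend` (or a `use_backend` rule) receives traffic. For `hdr`, requests that lack the header fall back to round robin, so the affinity only holds for clients that actually send it.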
The "leastconn" algorithm is a workhorse for many modern applications. It dynamically shifts traffic away from servers that are experiencing a surge in active connections, regardless of how many requests they’ve received in total. This means if one server gets a few very slow requests, leastconn will naturally steer new, faster requests to other servers, balancing the actual load rather than just the request count.
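A minimal sketch of switching the earlier backend to `leastconn`, together with a stats listener for watching each server's current session count. The port `8404`, the `/stats` path, and the refresh interval are assumptions for illustration.

```haproxy
backend http_backend
    balance leastconn
    server srv1 10.0.0.1:80 check
    server srv2 10.0.0.2:80 check
    server srv3 10.0.0.3:80 check

# Hypothetical stats listener; restrict access before exposing it anywhere
# public.
listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats refresh 10s
```

With this in place, a server that is stuck on slow requests should show a higher current session count and receive fewer new connections until it catches up.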
What most people miss is that leastconn counts connections, not what they cost. A single CPU-heavy request can tie up far more of a server’s resources than several cheap ones, yet it still counts as just one connection. For truly intelligent load balancing that considers server CPU, memory, and custom metrics, you’d typically need external health checks or more advanced application-level routing logic.
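One way HAProxy can take server-reported load into account is its agent check, where a small agent on each server replies with a weight. The sketch below assumes a hypothetical agent listening on port `9999` on every backend server; the agent itself is not shown and would have to be written or installed separately, and the interval is illustrative.

```haproxy
backend http_backend
    balance leastconn
    # agent-check polls a TCP agent on each server. A reply such as "50%"
    # scales the server's effective weight relative to its configured weight.
    server srv1 10.0.0.1:80 check weight 100 agent-check agent-port 9999 agent-inter 5s
    server srv2 10.0.0.2:80 check weight 100 agent-check agent-port 9999 agent-inter 5s
    server srv3 10.0.0.3:80 check weight 100 agent-check agent-port 9999 agent-inter 5s
```

The agent could derive its reply from CPU load, memory pressure, or any custom metric, which moves the decision about what "busy" means out of HAProxy and onto the server.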
If you’ve successfully implemented leastconn and your servers are still showing uneven load, the next problem you’ll likely encounter is that your health checks are too permissive, allowing unhealthy servers to remain in the pool.
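As a sketch of what stricter health checks might look like, the backend below replaces the default TCP connect check with an HTTP check. The `/health` endpoint and the timing values are assumptions; the application would need to expose such an endpoint.

```haproxy
backend http_backend
    mode http
    balance leastconn
    # Probe an application-level endpoint instead of only opening a TCP
    # connection, and require a 200 response.
    option httpchk GET /health
    http-check expect status 200
    # Check every 2s; mark a server down after 3 consecutive failures and
    # up again after 2 consecutive successes.
    server srv1 10.0.0.1:80 check inter 2s fall 3 rise 2
    server srv2 10.0.0.2:80 check inter 2s fall 3 rise 2
    server srv3 10.0.0.3:80 check inter 2s fall 3 rise 2
```

Lower `inter` and `fall` values remove a failing server sooner, at the cost of more check traffic and a higher chance of flapping on brief hiccups.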