The most surprising thing about load balancing algorithms is that "fairness" often means actively unbalancing traffic.

Let’s watch this in action. Imagine we have three backend servers: 10.0.0.1, 10.0.0.2, and 10.0.0.3. We’re using HAProxy, a popular load balancer, and we want to distribute incoming requests.

Here’s a basic HAProxy configuration snippet:

frontend http_frontend
    bind *:80
    default_backend http_backend

backend http_backend
    balance roundrobin
    server srv1 10.0.0.1:80 check
    server srv2 10.0.0.2:80 check
    server srv3 10.0.0.3:80 check

With balance roundrobin, HAProxy sends requests sequentially: server 1, then server 2, then server 3, then back to server 1. This is the simplest approach. If all requests are identical and all servers have identical capacity, this works beautifully.
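The rotation described above can be sketched in a few lines of Python. This is a toy model rather than HAProxy's implementation; the `round_robin` helper is a hypothetical name, and the addresses are the ones from the configuration snippet.

```python
from itertools import cycle

# Backend addresses from the configuration above.
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def round_robin(servers):
    """Return an endless iterator that walks the servers in a fixed rotation.

    Nothing here looks at how busy a server is; only its position in the
    list decides when it receives the next request.
    """
    return cycle(servers)

picker = round_robin(SERVERS)
assignments = [next(picker) for _ in range(6)]
print(assignments)
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2', '10.0.0.3']
```

Six requests land on each server exactly twice, whatever the servers are doing at the time.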

But real traffic is rarely uniform. What if server 1 is much slower than servers 2 and 3? Round robin will keep sending requests to server 1 just as often as the others, potentially overwhelming it while servers 2 and 3 have capacity to spare. This is where the "fairness" paradox comes in: giving every server an equal share of requests produces an unequal load.

We need algorithms that adapt. HAProxy offers several:

  • roundrobin: The default. Distributes requests in turn, in proportion to each server’s weight (all weights are equal unless you set them).

    • Diagnosis: Check server response times under load. If one server consistently has higher latency or error rates, it might be overloaded.
    • Fix: If servers are identical, roundrobin is fine. If not, consider other algorithms.
    • Why it works: Simple, predictable distribution for homogenous servers.
  • static-rr: Similar to roundrobin, but the weights are static: changing a server’s weight at runtime has no effect. Both algorithms honor the weight parameter (e.g., server srv1 10.0.0.1:80 check weight 2), so a server with weight 2 receives twice as many requests as a server with weight 1.

    • Diagnosis: Use HAProxy’s stats page, or the show stat command on the runtime API, to monitor connection counts and response times per server. If the higher-weighted servers still show high load, the weights might need adjustment.
    • Fix: Adjust weight parameters in the backend configuration. For example, server srv1 10.0.0.1:80 check weight 5 and server srv2 10.0.0.2:80 check weight 1.
    • Why it works: Allows for manual tuning of traffic distribution based on known server capacities.
  • leastconn: Directs traffic to the server with the fewest active connections. This is often a better choice for long-lived connections (like WebSockets) or when request processing times vary significantly.

    • Diagnosis: Monitor the scur (current connections) column on the HAProxy stats page for each server. If one server has a significantly higher scur than others, it’s likely the bottleneck.
    • Fix: Change balance roundrobin to balance leastconn in the backend.
    • Why it works: Intuitively sends new requests to the least busy server, preventing any single server from accumulating too many concurrent tasks.
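  • first: Sends each request to the first server that still has free connection slots, taking servers in order. Once that server reaches its maxconn, new requests spill over to the next one.

    • Diagnosis: Check status and check_status on the HAProxy stats page. While the first server is healthy and below its maxconn, all traffic should be directed there.
    • Fix: Set balance first in the backend and give each server a maxconn value; without maxconn the algorithm has no point at which to move on.
    • Why it works: Fills servers one at a time, so the first server is fully used before the next receives traffic and the later servers stay idle when load is low.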
  • source: Uses a hash of the client’s IP address to determine which server receives the request. Requests from the same client IP go to the same server as long as no server goes up or down (a simple form of sticky sessions).

    • Diagnosis: The stats page does not break traffic down by client IP, so check the HAProxy logs instead. If requests from a given client IP are always routed to the same server, source is working.
    • Fix: Use balance source in the backend. To control how hashes map to servers, add the hash-type directive (e.g., hash-type consistent, which limits how many clients are remapped when a server is added or removed).
    • Why it works: Provides session persistence without cookies or shared session storage, which matters for applications that store state locally on the server. Keep in mind that many clients behind one NAT address all land on the same server.
  • uri: Hashes the requested URI to select a server. Useful for caching proxies where identical URIs should hit the same cache.

    • Diagnosis: Compare the request path and the server name in the HAProxy logs. If requests for a given URI consistently hit the same backend server, uri is functioning as expected.
    • Fix: Implement balance uri in the backend. The optional len and depth arguments limit how much of the URI is hashed, and whole includes the query string in the hash.
    • Why it works: Distributes traffic based on the request path, ensuring that identical requests are consistently routed to the same backend for consistent responses or caching.
  • url_param: Hashes a specific URL parameter. Similar to uri but allows for finer-grained sticky sessions based on a parameter value.

    • Diagnosis: In the HAProxy logs, find requests that carry the specified parameter. Verify that requests with the same parameter value are consistently sent to the same backend server.
    • Fix: Use balance url_param followed by the parameter name, e.g., balance url_param session_id.
    • Why it works: Provides session affinity based on a specific query string parameter, allowing different users (or different sessions for the same user if the parameter changes) to be routed to different servers while maintaining consistency for the same session identifier.
  • hdr: Hashes a specific HTTP header. Useful for routing based on custom headers, like an X-Forwarded-For header or an API key.

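    • Diagnosis: Capture the target header in the HAProxy logs (for example with capture request header) and confirm that requests with the same header value are consistently directed to the same backend server.
    • Fix: Use balance hdr(<header_name>), e.g., balance hdr(X-User-ID). If the header is missing from a request, HAProxy falls back to round robin for that request.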
    • Why it works: Enables sticky sessions or consistent routing based on arbitrary HTTP headers, offering flexibility for custom routing logic.
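
The hash-based algorithms in the list above (source, uri, url_param, hdr) share one idea: hash a key taken from the request and use the result to pick a server. The following is a minimal sketch of that idea, assuming a plain modulo mapping and CRC32 as the hash. HAProxy's actual hash function and server mapping differ; `pick_by_hash`, the client address, and the URIs are hypothetical.

```python
import zlib

SERVERS = ["srv1", "srv2", "srv3"]

def pick_by_hash(key, servers):
    """Map a key (client IP, URI, parameter or header value) to one server.

    CRC32 is used only because it gives the same result on every run;
    it stands in for whatever hash the load balancer really uses.
    """
    digest = zlib.crc32(key.encode("utf-8"))
    return servers[digest % len(servers)]

client_ip = "203.0.113.7"  # hypothetical client address
first = pick_by_hash(client_ip, SERVERS)
second = pick_by_hash(client_ip, SERVERS)
print(first == second)  # True: the same key always maps to the same server

# Different keys may or may not share a server; only repeat keys are guaranteed.
for uri in ["/img/logo.png", "/api/users", "/index.html"]:
    print(uri, "->", pick_by_hash(uri, SERVERS))
```

With a modulo mapping like this one, changing the length of the server list moves most keys to a different server, which is the problem that consistent hashing is meant to reduce.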

The "leastconn" algorithm is a workhorse for many modern applications. It dynamically shifts traffic away from servers that are experiencing a surge in active connections, regardless of how many requests they’ve received in total. This means if one server gets a few very slow requests, leastconn will naturally steer new, faster requests to other servers, balancing the actual load rather than just the request count.
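
To see the difference, here is a small discrete-time simulation, offered as a sketch rather than a model of HAProxy internals. It assumes one request arrives per tick and that each server holds a connection for a fixed number of ticks; the `simulate` function and the service times are invented for illustration.

```python
def simulate(policy, service_ticks, n_requests):
    """Send one request per tick and report per-server totals.

    Returns two dicts: requests assigned to each server, and the peak
    number of simultaneously open connections on each server.
    """
    names = list(service_ticks)
    active = {name: [] for name in names}   # finish ticks of open connections
    assigned = {name: 0 for name in names}
    peak = {name: 0 for name in names}
    for tick in range(n_requests):
        # Close connections whose request has finished by this tick.
        for name in names:
            active[name] = [done for done in active[name] if done > tick]
        if policy == "roundrobin":
            target = names[tick % len(names)]
        else:
            # leastconn: fewest open connections; ties go to the earlier server.
            target = min(names, key=lambda name: len(active[name]))
        active[target].append(tick + service_ticks[target])
        assigned[target] += 1
        peak[target] = max(peak[target], len(active[target]))
    return assigned, peak

# Hypothetical service times: srv1 is six times slower than the others.
SERVICE_TICKS = {"srv1": 12, "srv2": 2, "srv3": 2}

rr_assigned, rr_peak = simulate("roundrobin", SERVICE_TICKS, 30)
lc_assigned, lc_peak = simulate("leastconn", SERVICE_TICKS, 30)
print("roundrobin:", rr_assigned, rr_peak)
print("leastconn: ", lc_assigned, lc_peak)
```

Under round robin every server receives the same number of requests, so connections stack up on the slow server. Under leastconn the slow server receives far fewer requests and its connection count stays in line with the others.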

What most people miss is that leastconn doesn’t account for the cost of a connection. Every open connection counts the same, so a server working through a few CPU-heavy requests can look less busy than one holding many cheap or idle connections. For load balancing that considers server CPU, memory, or custom metrics, you’d typically need something beyond connection counts, such as agent checks that let each server report its own weight, or more advanced application-level routing logic.
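
The blind spot can be shown with a toy comparison. The request kinds and CPU costs below are made up, and `by_connection_count` and `by_estimated_cost` are hypothetical helpers; the second one stands in for whatever external metric a smarter router might consult.

```python
# Open requests per server; the kinds and their costs are invented numbers.
in_flight = {
    "srv1": ["report", "report"],
    "srv2": ["ping", "ping"],
}
cpu_cost = {"report": 50, "ping": 1}

def by_connection_count(state):
    """What a connection-counting balancer sees: only how many are open."""
    return min(state, key=lambda name: len(state[name]))

def by_estimated_cost(state, cost):
    """What a metric-aware router could see: the work behind each connection."""
    return min(state, key=lambda name: sum(cost[kind] for kind in state[name]))

print(by_connection_count(in_flight))          # srv1 (a tie, so the first server wins)
print(by_estimated_cost(in_flight, cpu_cost))  # srv2
```

Both servers hold two connections, so a count-based choice cannot tell them apart, even though srv1 is doing far more work.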

If you’ve successfully implemented leastconn and your servers are still showing uneven load, the next problem you’ll likely encounter is that your health checks are too permissive, allowing unhealthy servers to remain in the pool.
