Least connections load balancing doesn’t just pick a server randomly; it actively routes traffic to the server currently handling the fewest active connections.

Let’s see it in action. Imagine a basic HTTP load balancer in front of two backend web servers, configured here in HAProxy syntax:

frontend http_in
    bind *:80
    mode http
    default_backend webservers

backend webservers
    mode http
    balance leastconn
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check

Here, balance leastconn tells the load balancer to track the number of active connections to web1 and web2. Each new client request goes to whichever server has fewer connections at that moment. If both servers have the same number of connections, HAProxy rotates among the tied servers in round-robin fashion, so every server still gets used.
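
The selection rule can be sketched in a few lines of Python. This is a simplified illustration rather than HAProxy's actual implementation; the `LeastConnPicker` class and its `pick` method are hypothetical names invented for the example, and the tie-breaking is a plain rotation among servers with equal counts.

```python
class LeastConnPicker:
    """Toy least-connections selector (hypothetical, for illustration only)."""

    def __init__(self, names):
        self.names = list(names)
        self._turn = 0  # advances on every pick, used to rotate among ties

    def pick(self, active):
        """active maps server name -> current number of open connections."""
        fewest = min(active[name] for name in self.names)
        tied = [name for name in self.names if active[name] == fewest]
        # Rotate among the tied servers so none of them is starved.
        choice = tied[self._turn % len(tied)]
        self._turn += 1
        return choice


picker = LeastConnPicker(["web1", "web2"])
print(picker.pick({"web1": 3, "web2": 1}))  # web2
```

When the counts are equal, consecutive calls alternate between the two servers instead of always favouring the first one listed.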

This is useful because it keeps a single busy server from becoming overwhelmed. Round-robin hands each server the same share of new requests no matter how many it is still working on, so a slower server, or one stuck with long-running requests, keeps accumulating open connections while a faster one finishes early and sits partly idle. Least connections addresses this directly by evening out the outstanding workload, not just the request count.

The core problem this solves is uneven resource utilization. If you have servers with varying capacities or if certain requests take longer than others, simple round-robin or IP hash can lead to one server being a bottleneck while others sit idle. Least connections tries to equalize the actual load on the servers.
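
To make the difference concrete, here is a toy simulation comparing the two policies when one server is slower than the other. Everything in it is an assumption made for illustration: the `simulate` function, the server names, the fixed service times measured in ticks, and the steady arrival rate. It models no queueing or timeouts, so treat the numbers as directional only.

```python
def simulate(policy, service_ticks, arrivals_per_tick=2, ticks=50):
    """Return the peak number of open connections seen on each server.

    service_ticks maps server name -> how many ticks that server keeps a
    connection open (a stand-in for server speed).
    """
    names = list(service_ticks)
    open_conns = {name: [] for name in names}  # remaining ticks per connection
    peak = {name: 0 for name in names}
    rr_index = 0

    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            if policy == "roundrobin":
                target = names[rr_index % len(names)]
                rr_index += 1
            else:  # "leastconn"
                target = min(names, key=lambda name: len(open_conns[name]))
            open_conns[target].append(service_ticks[target])

        for name in names:
            peak[name] = max(peak[name], len(open_conns[name]))
            # Age every connection by one tick and drop the finished ones.
            open_conns[name] = [t - 1 for t in open_conns[name] if t > 1]

    return peak


speeds = {"fast": 1, "slow": 5}
print("roundrobin:", simulate("roundrobin", speeds))
print("leastconn: ", simulate("leastconn", speeds))
```

Under these assumptions round-robin lets connections pile up on the slow server, while least connections shifts most new requests to the fast one and keeps the peak on the slow server lower.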

Internally, the load balancer maintains a count for each backend server. When a connection is established, the count for that server increments. When a connection is closed, the count decrements. The leastconn algorithm simply consults these counts to make its routing decision. It’s a dynamic, stateful approach to load distribution.
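
A minimal sketch of that bookkeeping, assuming a single process that guards its counters with a lock. The `ConnectionCounter` class and its methods are hypothetical and only mirror the increment-on-open, decrement-on-close behaviour described above.

```python
import threading
from contextlib import contextmanager


class ConnectionCounter:
    """Tracks open connections per server (illustrative only)."""

    def __init__(self, servers):
        self._active = {name: 0 for name in servers}
        self._lock = threading.Lock()

    @contextmanager
    def connection(self):
        """Pick the least-loaded server and hold one connection on it."""
        with self._lock:
            name = min(self._active, key=self._active.get)
            self._active[name] += 1      # connection established
        try:
            yield name
        finally:
            with self._lock:
                self._active[name] -= 1  # connection closed

    def snapshot(self):
        """Return a copy of the current per-server counts."""
        with self._lock:
            return dict(self._active)


counter = ConnectionCounter(["web1", "web2"])
with counter.connection() as first:
    with counter.connection() as second:
        print(first, second, counter.snapshot())
print(counter.snapshot())
```

Using a context manager ties the decrement to the end of the connection, so the count is released even if handling the request raises an exception.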

The check keyword on each server line is crucial. It tells the load balancer to probe each backend server periodically. By default the probe is a plain TCP connection attempt; adding option httpchk to the backend turns it into an HTTP request (for example, a GET to a health endpoint). If a server fails its health checks, the load balancer temporarily removes it from the pool of available servers, ensuring that no traffic is sent to a dead or unresponsive machine. This keeps the leastconn algorithm from routing traffic to a server that can’t actually handle it, even though a dead server has zero connections and would otherwise look like the best candidate.
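
The interaction between health checks and selection might look like the following sketch. The `Backend` class and `pick` function are hypothetical, and the `fall` and `rise` thresholds (consecutive failures before a server is marked down, consecutive successes before it is marked up again) are loosely modeled on HAProxy's server parameters of the same names; the values used here are assumptions for the example.

```python
class Backend:
    """A backend server with a connection count and health state (toy model)."""

    def __init__(self, name, fall=3, rise=2):
        self.name = name
        self.active = 0    # open connections
        self.up = True     # current health state
        self.fall = fall   # consecutive failed checks before marking down
        self.rise = rise   # consecutive passed checks before marking up
        self._streak = 0   # consecutive results that contradict self.up

    def record_check(self, ok):
        """Feed in the result of one health check."""
        if ok == self.up:
            self._streak = 0  # result agrees with current state
            return
        self._streak += 1
        threshold = self.rise if ok else self.fall
        if self._streak >= threshold:
            self.up = ok
            self._streak = 0


def pick(backends):
    """Least connections among healthy servers only; None if all are down."""
    healthy = [b for b in backends if b.up]
    if not healthy:
        return None
    return min(healthy, key=lambda b: b.active)


web1, web2 = Backend("web1"), Backend("web2")
web2.active = 5
for _ in range(3):
    web1.record_check(False)  # three failed checks in a row
print(pick([web1, web2]).name)  # web2
```

Even though web1 has zero connections, it is skipped once it has been marked down, which is exactly the case the health check exists to catch.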

This mechanism is particularly effective where connection durations vary significantly. For instance, in a database proxy setup, some queries finish in milliseconds while others run for minutes. Least connections sends new requests to the server holding fewer open queries, rather than to whichever server happens to be next in rotation, which may still be tied up with several long-running ones.

A common misconception is that leastconn guarantees perfect load balancing in all scenarios. It’s important to remember that it balances connections, not necessarily CPU or memory utilization directly. A server with fewer connections could still be more heavily utilized if those connections are performing extremely resource-intensive operations. However, in most typical web application scenarios, connection count is a strong proxy for overall server load.
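
A tiny example shows how the two measures can disagree. The per-connection CPU figures are invented for illustration; a load balancer using leastconn sees only the connection counts, not these costs.

```python
# Hypothetical CPU cost (percent of one core) of each open connection.
servers = {
    "web1": [45, 40],          # two expensive connections
    "web2": [5, 5, 5, 5, 5],   # five cheap connections
}

# What leastconn sees: the number of open connections.
by_connections = min(servers, key=lambda name: len(servers[name]))

# What we might actually care about: total CPU in use.
by_cpu = min(servers, key=lambda name: sum(servers[name]))

print(by_connections)  # web1
print(by_cpu)          # web2
```

Here leastconn would send the next request to web1, the server that is already doing more work.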

The next thing to consider is how to handle sticky sessions when using leastconn if your application requires it.
