HAProxy doesn’t just drop requests when its backend is overloaded; it actively manages them in queues, and understanding how it queues is the key to preventing loss.

Let’s watch HAProxy in action. Imagine a busy web server behind HAProxy. We’ll configure HAProxy to intentionally queue requests when the backend can’t keep up, and then we’ll see how those requests are served once the backend recovers.

Here’s a simplified HAProxy configuration snippet:

frontend http_in
    bind *:80
    mode http
    default_backend web_servers
    timeout client 10s

backend web_servers
    mode http
    balance roundrobin
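    # Crucial queueing parameters
    timeout queue 5s   # How long a request may wait in the queue
    # Per-server limits, applied to every server line below:
    #   maxconn  = concurrent connections sent to one server
    #   maxqueue = requests allowed to wait for one server (0 = unlimited)
    default-server maxconn 2000 maxqueue 5000
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check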

In this setup:

  • frontend http_in: This is where HAProxy listens for incoming requests on port 80.
  • backend web_servers: This defines the group of servers that will handle the requests.
  • server web1 192.168.1.10:80 check: Defines a backend server and tells HAProxy to check its health.
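  • maxconn 2000: This is the maximum number of concurrent connections HAProxy will send to each server. It is a server-level setting (on the server line or via default-server); HAProxy does not support maxconn as a backend-level keyword. Once a server reaches this limit, further requests wait in a queue instead of being sent to it.
  • maxqueue 5000: This is the maximum number of requests that can wait in one server’s queue. HAProxy has no backend-level queue directive; the per-server maxqueue parameter is the closest equivalent. When a server’s queue is full, HAProxy redispatches further requests to other servers rather than letting them wait for that one. The default, 0, means the queue is unlimited.
  • timeout queue 5s: This is the maximum time a request can spend waiting in the queue. If a request waits longer than 5 seconds without getting a free connection slot, HAProxy gives up on it and returns a 503 (Service Unavailable) to the client.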

The System in Action: A Simulated Overload

Let’s say web1 and web2 are running at 90% CPU, struggling to process requests.

  1. Initial Load: HAProxy receives requests. It dispatches them to web1 and web2.
  2. Reaching maxconn: As web1 and web2 slow down, responses take longer and in-flight requests pile up until each server has maxconn connections open.
  3. Entering the Queue: Instead of immediately rejecting new requests, HAProxy holds them in a queue. The frontend still accepts the client connections, but the requests now wait for a free connection slot on a server.
  4. Queue Timeout: If web1 and web2 don’t recover within timeout queue (5s here), requests that have waited the full 5 seconds are answered with a 503 error. This prevents requests from waiting indefinitely and tying up resources on HAProxy itself.
  5. Backend Recovery: When web1 and web2 become available again (e.g., CPU drops to 30%), HAProxy starts pulling requests from the queue and sending them to the available backend servers. The queue effectively acts as a buffer, smoothing out temporary spikes in traffic or backend unresponsiveness.
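
To watch these steps as they happen, one option is HAProxy’s built-in statistics page. The snippet below is a minimal sketch; the port 8404 and the /stats path are arbitrary choices for illustration and are not part of the setup above.

```
listen stats
    bind *:8404
    mode http
    stats enable            # turn on the statistics page for this listener
    stats uri /stats        # path the page is served on (assumed)
    stats refresh 5s        # auto-reload the page every 5 seconds
```

On that page, the Queue columns (Cur, Max, Limit) for the web_servers backend and its servers show how many requests are waiting right now, the highest value seen so far, and the configured limit, so you can see the queue fill during the overload and drain during recovery.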

The Mental Model: HAProxy as a Traffic Cop with a Waiting Room

Think of HAProxy’s connection management as a sophisticated traffic control system.

  • The Road (Backend Servers): These are your actual application servers. They have a limited capacity for handling cars (requests) at any given time.
  • The maxconn Limit: This is the maximum number of cars that can be actively on the road at once. If the road is full, no new cars can enter.
  • The On-Ramp (Frontend): This is where cars (requests) arrive. The frontend might accept more cars than can fit on the road immediately.
  • The Waiting Lane (maxqueue): This is the queue. If the road is full (maxconn reached), cars wait here. The length of each server’s lane is capped by maxqueue.
  • The Waiting Time Limit (timeout queue): This is the maximum time a car can sit in the waiting lane. If it waits too long, it’s turned away (the client gets a 503) to prevent gridlock in the waiting lane itself.

Controlling the Flow: Key Levers

  • maxconn: This is your first line of defense. Set it on each server based on the number of concurrent connections that server can actually handle. Too low, and you’ll queue too early. Too high, and you risk overwhelming your backends when they become slow.
  • maxqueue: This is your buffer size per server. A larger queue can absorb bigger, shorter spikes. A smaller queue makes HAProxy stop queueing for that server sooner. The trade-off is that every queued request keeps a client connection open on HAProxy, so a large queue costs memory and counts against the frontend’s connection limit, and requests deep in a long queue are likely to hit timeout queue anyway.
  • timeout queue: This determines how patient HAProxy is. A shorter timeout fails stale requests quickly with a 503 instead of letting them linger. A longer timeout gives backends more time to recover before requests are given up on.
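
As a rough illustration of how these levers combine, here are two contrasting profiles. This is a sketch only: the backend names, addresses, and numbers are hypothetical and would need to be sized against real measurements.

```
# Latency-sensitive API: small buffer, fail fast
backend api_servers
    mode http
    timeout queue 1s                          # give up after 1 second in the queue
    default-server maxconn 200 maxqueue 100   # short per-server queue
    server api1 192.0.2.10:8080 check
    server api2 192.0.2.11:8080 check

# Slow reporting jobs: few parallel requests, patient queue
backend report_servers
    mode http
    timeout queue 30s                         # clients may wait up to 30 seconds
    default-server maxconn 50 maxqueue 0      # 0 = unlimited per-server queue
    server report1 192.0.2.20:8080 check
```

The first profile favors a quick 503 over a slow answer; the second accepts long waits so that bursts of heavy requests are eventually served.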

The one thing most people don’t realize is that maxconn means different things depending on where it appears. In the global section it caps connections for the entire HAProxy process, in a frontend it caps the connections that frontend will accept, and on a server line it caps concurrent connections to that one server. Only the server-level maxconn triggers queueing. Also, maxconn limits the number of connections open at the same time, not the rate of new connections. When a server reaches its limit, HAProxy does not reject new requests; it holds them in the queue until an existing connection finishes and frees a slot.
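
To make the different scopes of maxconn concrete, here is a sketch with the setting at each level. The global and frontend values are assumptions chosen only to show the nesting; they do not come from the configuration above.

```
global
    maxconn 20000        # whole process: connections accepted across all frontends

frontend http_in
    bind *:80
    mode http
    maxconn 10000        # this frontend: client connections accepted at once
    default_backend web_servers

backend web_servers
    mode http
    timeout queue 5s
    # per server: requests beyond 2000 concurrent connections wait in the queue
    server web1 192.168.1.10:80 check maxconn 2000
    server web2 192.168.1.11:80 check maxconn 2000
```

Because queued requests still hold a client connection, the frontend limit should be comfortably larger than the sum of the server limits; otherwise there is little room left for requests to wait.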

By tuning maxconn, maxqueue, and timeout queue, you can control how HAProxy handles traffic surges, so that legitimate requests are served when possible and the system doesn’t collapse under its own load.

The next step after managing queues is often dealing with the specific types of requests that are causing backends to slow down, which leads into request inspection and content switching.
