Load Balancing Connection Draining: Zero-Downtime Deploys (2026)

When you take a server out of a load-balanced pool for maintenance, you want any existing connections to that server to complete gracefully rather than being abruptly terminated. This process is called connection draining, and it’s the secret sauce behind zero-downtime deploys, especially for applications with long-running requests or sessions.

Let’s watch this in action. Imagine a simple web application behind an Nginx load balancer.

http {
    upstream backend_servers {
        server 192.168.1.10:8080;
        server 192.168.1.11:8080;
        server 192.168.1.12:8080;
    }

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_pass http://backend_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
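To make the routing concrete, here is a minimal Python model of what this upstream block does by default: hand each new request to the next backend in turn. It is a simplification that assumes equal weights and a single rotation; Nginx's real balancer is a weighted round-robin that runs in every worker process, so treat this only as an illustration.

```python
from itertools import cycle

# Backends from the upstream block above (equal weights assumed).
BACKENDS = ["192.168.1.10:8080", "192.168.1.11:8080", "192.168.1.12:8080"]


def assign(requests, backends=BACKENDS):
    """Pair each request with the next backend in round-robin order."""
    rotation = cycle(backends)
    return [(request, next(rotation)) for request in requests]


if __name__ == "__main__":
    for request, backend in assign(["r1", "r2", "r3", "r4"]):
        print(request, "->", backend)
```

The fourth request wraps around to 192.168.1.10, so at any moment that server is likely to be holding some in-flight request.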

Here, 192.168.1.10, 192.168.1.11, and 192.168.1.12 are our backend servers. If we want to take 192.168.1.10 offline for an update, a naive approach would be to simply remove it from the upstream block. New requests would stop going to it, but any requests already in progress on 192.168.1.10 would be unceremoniously killed when the server process is stopped.

Connection draining prevents this. The load balancer is instructed to stop sending new traffic to a server but to let existing connections finish. Many cloud load balancers expose this as a single setting (AWS Elastic Load Balancing, for example, calls it the deregistration delay). Open-source Nginx has no such drain directive; the drain server parameter is an NGINX Plus feature. Instead, draining is achieved by marking the server down in the upstream block, reloading Nginx gracefully, and only then shutting down the server process.
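The difference between removing a server and draining it can be shown with a toy model. The Pool class below is hypothetical and not any real load balancer's API: drain() stops new assignments to a backend, while finish() still completes requests that were already running there, and is_drained() reports when nothing is left in flight.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    addr: str
    down: bool = False                           # True once draining starts
    in_flight: set = field(default_factory=set)  # request IDs still running here


class Pool:
    """Toy load balancer: drained backends receive no new requests,
    but requests already assigned to them are allowed to finish."""

    def __init__(self, addrs):
        self.backends = [Backend(a) for a in addrs]
        self._counter = 0

    def start(self, request_id):
        # Round-robin over the backends that are not marked down.
        candidates = [b for b in self.backends if not b.down]
        if not candidates:
            raise RuntimeError("no backend available")
        backend = candidates[self._counter % len(candidates)]
        self._counter += 1
        backend.in_flight.add(request_id)
        return backend.addr

    def finish(self, request_id):
        # Completion is still honored on a drained backend.
        for b in self.backends:
            b.in_flight.discard(request_id)

    def drain(self, addr):
        for b in self.backends:
            if b.addr == addr:
                b.down = True

    def is_drained(self, addr):
        # Drained means: marked down and nothing left in flight.
        return all(b.down and not b.in_flight for b in self.backends if b.addr == addr)


if __name__ == "__main__":
    pool = Pool(["192.168.1.10:8080", "192.168.1.11:8080", "192.168.1.12:8080"])
    print(pool.start("upload-1"))                 # lands on 192.168.1.10
    pool.drain("192.168.1.10:8080")
    print(pool.start("api-2"))                    # avoids the drained server
    print(pool.is_drained("192.168.1.10:8080"))   # False: upload-1 still running
    pool.finish("upload-1")
    print(pool.is_drained("192.168.1.10:8080"))   # True: safe to stop it now
```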

To begin draining, you stop the server from receiving new requests. In Nginx, you add the down parameter to its server line in the upstream block:

http {
    upstream backend_servers {
        server 192.168.1.10:8080 down; # Mark server as down
        server 192.168.1.11:8080;
        server 192.168.1.12:8080;
    }
    # ... rest of config
}

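The down parameter tells Nginx not to send any new requests to 192.168.1.10. Like any configuration change, it takes effect only after you validate and reload the configuration (nginx -t, then nginx -s reload). Crucially, the reload is graceful: the master process starts new worker processes with the updated configuration, and they never pick 192.168.1.10. The old worker processes stop accepting new connections but keep serving the requests they already have in flight, including those proxied to 192.168.1.10, and exit once those requests complete (or when worker_shutdown_timeout expires, if you have set it).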
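Marking the server down is a small text edit, so it is easy to script. The hypothetical set_down helper below toggles the down flag on one server line. It is a regex-based sketch that assumes one server directive per line, as in the examples here; it does not parse Nginx configuration in general. After writing the result back, you would still run nginx -t and nginx -s reload.

```python
import re

# Matches "server <address> [params];" plus anything after the semicolon
# (such as a trailing comment). Assumes one directive per line.
SERVER_LINE = re.compile(r"^(\s*)server\s+(\S+?)(\s[^;#]*)?;(.*)$")


def set_down(config: str, addr: str, down: bool = True) -> str:
    """Return config with the `down` flag added to (or removed from)
    the server line whose address is exactly `addr`."""
    out = []
    for line in config.splitlines():
        m = SERVER_LINE.match(line)
        if m and m.group(2) == addr:
            indent, _, params, rest = m.groups()
            # Keep other parameters (weight, max_fails, ...) and drop any old `down`.
            flags = [p for p in (params or "").split() if p != "down"]
            if down:
                flags.append("down")
            line = f"{indent}server {' '.join([addr] + flags)};{rest}"
        out.append(line)
    result = "\n".join(out)
    return result + "\n" if config.endswith("\n") else result


if __name__ == "__main__":
    conf = "upstream backend_servers {\n    server 192.168.1.10:8080;\n    server 192.168.1.11:8080;\n}\n"
    print(set_down(conf, "192.168.1.10:8080"))
```

Calling it again with down=False restores the original line, which covers the re-enable step after maintenance.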

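The max_fails and fail_timeout parameters serve a different purpose: passive health checking. If max_fails failed attempts to communicate with a server occur within the fail_timeout period, Nginx treats the server as unavailable for the next fail_timeout period. The defaults are max_fails=1 and fail_timeout=10s.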

http {
    upstream backend_servers {
        server 192.168.1.10:8080 max_fails=3 fail_timeout=30s;
        # ...
    }
    # ...
}

If 192.168.1.10 were to become unresponsive and 3 attempts to communicate with it failed within a 30-second window, Nginx would consider it unavailable and stop sending it new requests for the next 30 seconds, then try it again. What counts as a failed attempt is defined by proxy_next_upstream (by default, an error or a timeout). This is a passive health check that reacts to failures of real requests; it is not a draining mechanism and plays no part in a planned removal.
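A simplified model of this rule helps when reasoning about the numbers. The PassiveHealth class below is hypothetical and deliberately coarse: it counts failures in a sliding fail_timeout window and, once max_fails is reached, marks the server unavailable for fail_timeout seconds. Nginx's own bookkeeping differs in detail, so treat this as an approximation of the documented behavior rather than its implementation.

```python
class PassiveHealth:
    """Coarse model of Nginx's max_fails / fail_timeout server parameters."""

    def __init__(self, max_fails: int = 1, fail_timeout: float = 10.0):
        self.max_fails = max_fails          # Nginx default: 1 (0 disables counting)
        self.fail_timeout = fail_timeout    # Nginx default: 10 seconds
        self.failures: list[float] = []     # timestamps of recent failed attempts
        self.unavailable_until = 0.0

    def available(self, now: float) -> bool:
        return now >= self.unavailable_until

    def record_failure(self, now: float) -> None:
        if self.max_fails == 0:
            return
        # Forget failures older than the fail_timeout window.
        self.failures = [t for t in self.failures if now - t < self.fail_timeout]
        self.failures.append(now)
        if len(self.failures) >= self.max_fails:
            # Take the server out of rotation for one fail_timeout period.
            self.unavailable_until = now + self.fail_timeout
            self.failures.clear()


if __name__ == "__main__":
    h = PassiveHealth(max_fails=3, fail_timeout=30)
    for t in (0, 10, 20):
        h.record_failure(t)
    print(h.available(21), h.available(50))   # False True
```

Three failures spread more than 30 seconds apart never trip this model, which matches the "within a 30-second window" wording above.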

The actual "draining" part of a planned removal is the manual step of marking the server down and reloading Nginx before stopping the server process. Then you wait. How long? That depends entirely on how long your requests typically run. If they are short-lived (e.g., simple API calls), a few seconds might be enough. If users are uploading large files or running long transactions, you might need to wait minutes. During this window, Nginx sends no new requests to the marked-down server but keeps proxying the ones already in progress until each completes or the client or server closes the connection.
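Rather than guessing a fixed wait, you can poll until the drained server is idle, with an upper bound for stuck or unusually long connections. In this sketch, wait_for_drain and its active_connections callback are hypothetical; the callback is whatever you use to count the backend's open client connections. The clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_drain(active_connections: Callable[[], int],
                   timeout: float,
                   poll_interval: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the drained server reports zero active connections.

    Returns True if it drained, False if `timeout` seconds passed first."""
    deadline = clock() + timeout
    while True:
        if active_connections() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

Choose the timeout from your longest expected request. If the function returns False, you decide whether to stop the process anyway (cutting off whatever is left) or keep waiting.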

Once you’ve waited long enough and the backend shows no remaining client connections, you can stop the server process.
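On a Linux backend, one way to check for remaining clients is to count ESTABLISHED sockets on the application's port; from a shell, ss -tn state established '( sport = :8080 )' lists them. The Python sketch below reads the same information from /proc/net/tcp and /proc/net/tcp6, where the local port is hexadecimal (8080 is 1F90) and state 01 means ESTABLISHED. It assumes the app listens on port 8080, as in the upstream block, and that the script runs in the same network namespace as the app.

```python
ESTABLISHED = "01"  # TCP state code used in /proc/net/tcp


def count_established(port: int, proc_lines) -> int:
    """Count ESTABLISHED sockets whose local port is `port`."""
    count = 0
    for line in proc_lines:
        fields = line.split()
        # Skip the header row and any malformed lines.
        if len(fields) < 4 or fields[0] == "sl":
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port and fields[3] == ESTABLISHED:
            count += 1
    return count


def established_on_port(port: int) -> int:
    """Sum IPv4 and IPv6 connections; missing files count as zero."""
    total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                total += count_established(port, f)
        except FileNotFoundError:
            pass
    return total


if __name__ == "__main__":
    print(established_on_port(8080))
```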

# On the server 192.168.1.10
sudo systemctl stop my-web-app

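Because the remaining connections finished before the process stopped, no request was cut off, and the server is safely offline. After your maintenance is complete, start the application again (sudo systemctl start my-web-app), then remove the down parameter from the Nginx configuration and reload Nginx so the server rejoins the pool. Starting the application first avoids routing requests to a backend that is not listening yet.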
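The full cycle is easy to get out of order, so it can help to write it down as data. The hypothetical drain_and_restore function below only returns an ordered checklist and runs nothing; the load balancer host name, the service name my-web-app, and the fixed sleep are assumptions to replace with your own values.

```python
from typing import NamedTuple


class Step(NamedTuple):
    host: str    # machine where the step runs
    action: str  # shell command, or a manual instruction prefixed "MANUAL:"


def drain_and_restore(lb: str, backend: str, service: str, drain_wait_s: int) -> list[Step]:
    """Ordered checklist for draining one backend, maintaining it and restoring it."""
    return [
        Step(lb, f"MANUAL: add `down` to the server line for {backend}"),
        Step(lb, "sudo nginx -t"),             # validate before reloading
        Step(lb, "sudo nginx -s reload"),      # graceful: in-flight requests finish
        Step(lb, f"sleep {drain_wait_s}"),     # or poll until the backend is idle
        Step(backend, f"sudo systemctl stop {service}"),
        Step(backend, "MANUAL: perform maintenance"),
        Step(backend, f"sudo systemctl start {service}"),  # start before re-enabling
        Step(lb, f"MANUAL: remove `down` from the server line for {backend}"),
        Step(lb, "sudo nginx -t"),
        Step(lb, "sudo nginx -s reload"),      # backend rejoins the pool
    ]


if __name__ == "__main__":
    for i, step in enumerate(drain_and_restore("lb1", "192.168.1.10", "my-web-app", 60), 1):
        print(f"{i:2}. [{step.host}] {step.action}")
```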

The most surprising part of connection draining is how simple the underlying mechanism can be. It’s not a complex protocol handshake; the load balancer simply stops choosing a server for new requests while it keeps serving the requests it has already routed there, relying on the natural lifecycle of existing connections. The skill is in the timing: mark the server down first, then wait for those established connections to finish before stopping the process.

The next challenge you’ll face is managing connection draining automatically as part of a CI/CD pipeline, where manual waiting periods become impractical.