Tuning Linux kernel parameters for HAProxy performance is less about tweaking random knobs and more about understanding how the kernel manages network state and how HAProxy’s connection patterns stress those limits.
Let’s watch HAProxy handle a surge of connections. Picture a busy web server with HAProxy in front of it, directing traffic, and a burst of clients arriving at once:
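# Simulate a burst of incoming connections with netcat (HAProxy listening on 127.0.0.1:8080)
# printf is used because plain echo does not expand \r\n; note that 10,000 background
# nc processes can themselves hit the per-user process limit (ulimit -u)
for i in {1..10000}; do
  printf 'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n' | nc 127.0.0.1 8080 > /dev/null &
done
wait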
As these connections flood in, the Linux kernel’s network stack is the unsung hero (or sometimes, the bottleneck). It has to track each connection’s state, manage buffers, and route packets. HAProxy, with its high concurrency and rapid connection setup/teardown, can push these kernel mechanisms to their absolute limits.
The core problem HAProxy often hits is the kernel’s default limits on network resources, particularly file descriptors and TCP connection state. A busy HAProxy instance may need to hold tens of thousands, or even millions, of concurrent connections, and each one consumes kernel resources: a file descriptor, socket buffer memory, and an entry in the kernel’s TCP connection tables.
The Usual Suspects: File Descriptors
The most common bottleneck is the limit on the number of open file descriptors. On Unix-like systems every network socket is represented as a file descriptor. HAProxy needs one for each client connection and, when it forwards that connection to a backend server, another one for the server side, so each proxied session costs roughly two descriptors, plus a few for listeners and health checks.
Diagnosis: Check current limits and usage.
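# Show the soft and hard limits for the current shell (ulimit -n alone shows only the soft limit)
ulimit -Sn
ulimit -Hn
# Show the limit actually applied to the running HAProxy process (replace <HAPROXY_PID>)
grep 'Max open files' /proc/<HAPROXY_PID>/limits
# Check the system-wide maximum number of file descriptors
cat /proc/sys/fs/file-max
# Monitor file descriptor usage for the HAProxy process (replace <HAPROXY_PID>)
watch -n 1 "ls /proc/<HAPROXY_PID>/fd | wc -l"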
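To compare a process's descriptor count with its effective soft limit in one step, a small helper can read both from /proc. This is a minimal sketch, assuming a Linux /proc layout and permission to read the target process (root is usually needed for another user's process); `fd_headroom` is a hypothetical name, not an existing tool.

```shell
#!/usr/bin/env bash
# fd_headroom PID: print open FDs, the soft "Max open files" limit, and usage in percent.
# Hypothetical helper; assumes Linux /proc and permission to read /proc/PID.
fd_headroom() {
  local pid="$1"
  local used soft
  # Each entry in /proc/<pid>/fd is one open descriptor (socket, file, pipe, ...).
  used=$(ls "/proc/$pid/fd" | wc -l)
  # The limits line looks like: "Max open files   <soft>   <hard>   files"
  soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "pid=$pid used=$used soft_limit=$soft usage=$(( used * 100 / soft ))%"
}

# Usage (placeholder PID; in master-worker mode, check the worker process):
# fd_headroom <HAPROXY_PID>
```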
Fix: Increase the limits.
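Edit /etc/security/limits.conf (read by pam_limits for login sessions) and add or modify these lines. Wildcard entries do not apply to root, which is why root gets its own lines: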
* soft nofile 65536
* hard nofile 1048576
root soft nofile 65536
root hard nofile 1048576
limits.conf only affects sessions that go through PAM, so it does not apply to services started by systemd. If HAProxy is managed by systemd, set the limit in a drop-in such as /etc/systemd/system/haproxy.service.d/limits.conf. LimitNOFILE accepts either one value for both limits or a soft:hard pair:
[Service]
LimitNOFILE=65536:1048576
Reload systemd and restart HAProxy:
sudo systemctl daemon-reload
sudo systemctl restart haproxy
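Why it works: This raises the ceiling on how many file descriptors a process can open, so HAProxy can hold more concurrent connections before socket creation or accept() fails with "Too many open files" (EMFILE). Size the limit from maxconn: at roughly two descriptors per proxied connection, a soft limit of 65536 covers only about 32,000 concurrent sessions. If /proc/sys/fs/file-max is below what all processes together need, raise it as well with fs.file-max in /etc/sysctl.conf.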
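The sizing rule can be written out as arithmetic. This is a rough sketch, assuming two descriptors per proxied connection plus one per listener and one per concurrently running health check; `fds_needed` is a hypothetical helper, and HAProxy's own calculation may add further terms, so treat the result as a lower bound.

```shell
#!/usr/bin/env bash
# fds_needed MAXCONN [LISTENERS] [CHECKS]: rough FD estimate for an HAProxy maxconn.
# Assumption: 2 FDs per proxied connection (client side + server side),
# plus one per listening socket and one per concurrently running health check.
fds_needed() {
  local maxconn="$1" listeners="${2:-0}" checks="${3:-0}"
  echo $(( 2 * maxconn + listeners + checks ))
}

# Example: maxconn 200000, one listener, one checked server
fds_needed 200000 1 1   # prints 400002
```

By this estimate, a configuration with `maxconn 200000` needs a descriptor limit in the 400,000 range, well above a 65536 soft limit.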
TCP Connection State Limits (Ephemeral Ports)
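When HAProxy forwards a request to a backend server, it acts as a TCP client, and the kernel assigns its side of the connection a source port from the ephemeral port range (the kernel default is 32768 to 60999). With a plain connect(), the source port only has to be unique per destination address and port, so exhaustion typically hits when HAProxy opens many short-lived connections from one source address to a small number of backends: every open connection, and every port still held in TIME-WAIT, uses up one of the roughly 28,000 default ports available per backend. When they run out, connect() fails with EADDRNOTAVAIL ("Cannot assign requested address").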
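Diagnosis: Count connections per TCP state. TIME-WAIT sockets are left behind by whichever side closes first, while large CLOSE-WAIT or FIN-WAIT counts point to connections stuck mid-shutdown.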
# Summary of socket counts, including timewait
watch -n 1 "ss -s"
# Count TCP sockets per state (ss prints ESTAB, TIME-WAIT, CLOSE-WAIT, FIN-WAIT-1, ...)
watch -n 1 "ss -tan | awk 'NR > 1 { print \$1 }' | sort | uniq -c"
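Because a source port only has to be unique per destination, the useful figure is TIME-WAIT sockets per backend, not the global total. A minimal sketch that groups them by peer, assuming `ss -tan state time-wait` output on stdin: TIME-WAIT sockets have no owning process, so with `-n` and no `-p` the peer address:port is the last field on each line. `tw_per_peer` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# tw_per_peer: count TIME-WAIT sockets per peer (destination) address:port.
# Reads `ss -tan state time-wait` output on stdin; skips the header line and
# takes the last field, which is the peer for sockets with no process info.
tw_per_peer() {
  awk 'NR > 1 { count[$NF]++ } END { for (peer in count) print count[peer], peer }' \
    | sort -rn
}

# Usage: busiest destinations first
# ss -tan state time-wait | tw_per_peer | head
```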
Fix: Widen the ephemeral port range, shorten the FIN-WAIT-2 timeout, and allow TIME-WAIT sockets to be reused for new outgoing connections.
Add or modify these lines in /etc/sysctl.conf:
net.ipv4.ip_local_port_range = 10000 65535
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_tw_reuse = 1
Apply the changes:
sudo sysctl -p
Why it works: net.ipv4.ip_local_port_range expands the pool of source ports available for outgoing connections; make sure the new range does not cover ports your own services listen on, or reserve those with net.ipv4.ip_local_reserved_ports. net.ipv4.tcp_fin_timeout shortens how long orphaned sockets wait in FIN-WAIT-2 (default 60 seconds); it does not change the TIME-WAIT duration, which is fixed at 60 seconds in Linux. net.ipv4.tcp_tw_reuse = 1 lets the kernel reuse a TIME-WAIT socket for a new outgoing connection when TCP timestamps (net.ipv4.tcp_timestamps, enabled by default) show it is safe, which is exactly the pattern of a proxy opening many short-lived backend connections. Do not set net.ipv4.tcp_tw_recycle: it broke clients behind NAT and was removed in Linux 4.12, so on current kernels sysctl -p reports an error for it.
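To see what widening the range buys, take the worst case: HAProxy closes first, so each outgoing connection holds its source port in TIME-WAIT for 60 seconds, and tcp_tw_reuse does not apply. The sustainable rate of new connections to one backend address:port from one source address is then roughly the number of ports divided by 60. A sketch of that arithmetic under those assumptions; both helper names are hypothetical, and reserved ports would lower the real figure.

```shell
#!/usr/bin/env bash
# Rough per-backend ceiling on new outgoing connections per second, assuming every
# source port sits in TIME-WAIT for 60 s and no TIME-WAIT reuse takes place.
ports_in_range() {
  local low="$1" high="$2"
  echo $(( high - low + 1 ))
}

max_new_conns_per_sec() {
  local ports="$1" time_wait_secs="${2:-60}"
  echo $(( ports / time_wait_secs ))
}

# Compare the kernel's default range with the widened one.
while read -r low high; do
  ports=$(ports_in_range "$low" "$high")
  echo "range $low-$high: $ports ports, ~$(max_new_conns_per_sec "$ports")/s per backend"
done <<'EOF'
32768 60999
10000 65535
EOF
```

This prints 28232 ports (about 470 new connections per second) for the default range and 55536 ports (about 925 per second) for the widened one.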
TCP Backlog and Queues
The kernel maintains queues for incoming TCP connections before they are fully accepted by the application. If traffic surges, these queues can fill up, leading to dropped connection attempts.
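Diagnosis: Check the backlog limits, the live accept queues, and the kernel's overflow counters.
# Check current backlog settings
sysctl net.core.somaxconn
sysctl net.ipv4.tcp_max_syn_backlog
# For listening sockets, Recv-Q is the current accept queue length and Send-Q the effective backlog
ss -ltn
# Watch the listen-queue overflow and SYN drop counters
watch -n 1 "netstat -s | grep -i -E 'listen queue|SYNs to LISTEN'"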
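The netstat lines are derived from kernel counters in /proc/net/netstat, and sampling ListenOverflows and ListenDrops directly shows whether they are still growing during a spike. A minimal sketch, assuming the standard layout of that file (a "TcpExt:" line of counter names followed by a "TcpExt:" line of values); `tcpext` and `watch_listen_overflows` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# tcpext NAME [FILE]: print one TcpExt counter (or NA if it is not present).
# The file pairs a "TcpExt:" line of names with a "TcpExt:" line of values.
tcpext() {
  local name="$1" file="${2:-/proc/net/netstat}"
  awk -v want="$name" '
    $1 == "TcpExt:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
    $1 == "TcpExt:" && seen  { print ((want in col) ? $col[want] : "NA"); exit }
  ' "$file"
}

# Print the per-second increase of listen-queue overflows (Ctrl-C to stop).
watch_listen_overflows() {
  local prev cur
  prev=$(tcpext ListenOverflows)
  while sleep 1; do
    cur=$(tcpext ListenOverflows)
    echo "ListenOverflows +$(( cur - prev ))/s, ListenDrops total $(tcpext ListenDrops)"
    prev=$cur
  done
}
```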
Fix: Increase the backlog queues.
Add or modify these lines in /etc/sysctl.conf:
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 2048
Apply the changes:
sudo sysctl -p
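The kernel silently truncates the backlog an application passes to listen() to net.core.somaxconn, so HAProxy’s backlog (the backlog option on a bind line, or the frontend’s backlog, which generally defaults to its maxconn) has no effect above that value. Keep somaxconn at least as large as the largest backlog in your HAProxy configuration, and size maxconn for the concurrency you expect: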
global
    # maxconn 200000 needs roughly 400000 file descriptors; size the FD limit to match
    maxconn 200000
    nbthread 4
    # ... other global settings
defaults
    mode http
    timeout connect 5s
    timeout client 50s
    timeout server 50s
    # ... other defaults
listen http_front
    bind *:8080 backlog 4096
    server app1 127.0.0.1:80 check
    # ... other listen directives
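Why it works: net.core.somaxconn caps the accept queue of every listening socket, that is, connections that have completed the handshake but have not yet been accepted by HAProxy. net.ipv4.tcp_max_syn_backlog caps the queue of half-open connections (SYN received, final ACK still pending). Larger queues let the kernel absorb brief spikes while HAProxy catches up on accept() calls, instead of dropping SYNs or overflowing the accept queue. Since Linux 5.4 the somaxconn default is already 4096, so on current kernels the value above is only a floor; raise it further if ss -ltn shows listeners near their limit.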
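Two checks follow from this: what backlog a listener actually gets, and how full its accept queue is right now. The sketch below assumes that listen() truncates the requested backlog to somaxconn and that, for LISTEN sockets, `ss -ltn` reports the current accept-queue length as Recv-Q and the effective backlog as Send-Q. `effective_backlog` and `listen_queue_fill` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# effective_backlog REQUESTED [SOMAXCONN]: the accept-queue length the kernel will
# actually use, since listen() truncates the requested backlog to net.core.somaxconn.
effective_backlog() {
  local requested="$1"
  local somaxconn="${2:-$(cat /proc/sys/net/core/somaxconn)}"
  if [ "$requested" -gt "$somaxconn" ]; then
    echo "$somaxconn"
  else
    echo "$requested"
  fi
}

# listen_queue_fill [PERCENT]: flag listeners whose accept queue is at least PERCENT full.
# Reads `ss -ltn` output on stdin: $2 = Recv-Q (queued), $3 = Send-Q (backlog), $4 = local addr.
listen_queue_fill() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    $1 == "LISTEN" && $3 > 0 {
      pct = int($2 * 100 / $3)
      if (pct >= t) printf "%s queue %d/%d (%d%%)\n", $4, $2, $3, pct
    }'
}

# Example: "bind *:8080 backlog 8192" on a host where somaxconn is 4096
effective_backlog 8192 4096   # prints 4096
# Usage: ss -ltn | listen_queue_fill 50
```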
Network Buffer Sizes
TCP communication relies on per-socket send and receive buffers. If they are too small, the TCP window cannot grow large enough to keep the link busy, which caps throughput; this matters most on high-bandwidth, high-latency paths, where a lot of data must be in flight at once.
Diagnosis: Monitor buffer-related statistics.
# Show per-socket memory (skmem) and TCP internals such as cwnd and retransmits
watch -n 1 "ss -tmi | grep -E 'skmem|cwnd|retrans'"
# Check current buffer settings
sysctl net.core.rmem_max
sysctl net.core.wmem_max
sysctl net.ipv4.tcp_rmem
sysctl net.ipv4.tcp_wmem
Fix: Increase buffer sizes.
Add or modify these lines in /etc/sysctl.conf:
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
Apply the changes:
sudo sysctl -p
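Why it works: net.core.rmem_max and net.core.wmem_max cap the buffer sizes an application can request explicitly with setsockopt() (SO_RCVBUF and SO_SNDBUF), while net.ipv4.tcp_rmem and net.ipv4.tcp_wmem set the minimum, default, and maximum that TCP autotuning uses for ordinary sockets. Raising the maxima lets the window grow to match the bandwidth-delay product of fast or long-distance links, improving throughput for connections that move large amounts of data. The trade-off is memory: with tens of thousands of connections, large per-socket buffers add up, and total TCP memory is bounded by net.ipv4.tcp_mem.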
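The buffer maximum only needs to cover the bandwidth-delay product (BDP) of the paths you care about: bandwidth × round-trip time. A worked sketch in shell integer arithmetic, plus a crude upper bound on memory if every connection grew both buffers to the maximum; both helper names are hypothetical, and the memory figure is a worst case, since autotuning normally keeps most buffers far smaller.

```shell
#!/usr/bin/env bash
# bdp_bytes MBIT_PER_SEC RTT_MS: bytes that must be in flight to keep a path busy.
# Mbit/s * 10^6 / 8 (bytes/s) * RTT_MS / 1000 (s)  ==  Mbit/s * 125 * RTT_MS
bdp_bytes() {
  local mbps="$1" rtt_ms="$2"
  echo $(( mbps * 125 * rtt_ms ))
}

# worst_case_mib CONNECTIONS BYTES_PER_BUFFER: MiB used if every socket's receive
# and send buffers both reached BYTES_PER_BUFFER (an upper bound, not a forecast).
worst_case_mib() {
  local conns="$1" bytes="$2"
  echo $(( conns * bytes * 2 / 1024 / 1024 ))
}

bdp_bytes 10000 10              # 10 Gbit/s at 10 ms RTT: prints 12500000 (fits under 16 MiB)
worst_case_mib 10000 16777216   # prints 320000
```

The second figure (about 312 GiB for 10,000 connections) is why the per-socket maximum should follow from the BDP rather than being set as high as possible.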
CPU Affinity and Interrupt Handling
For extreme performance, ensuring that network interrupts and HAProxy threads are processed on the same CPU cores can significantly reduce latency and context switching overhead. This is a more advanced tuning step.
Diagnosis: Check whether irqbalance is redistributing interrupts, which CPUs service the NIC’s interrupts, and where HAProxy’s threads are running.
# Check if irqbalance is running and its configuration
systemctl status irqbalance
# Check which CPUs are handling network interrupts (per-CPU counters for each IRQ)
grep eth0 /proc/interrupts  # Replace eth0 with your interface name
# Show which CPU (PSR column) each HAProxy thread last ran on
ps -L -o pid,lwp,psr,comm -C haproxy
# Observe CPU usage in htop (press 'H' to toggle user threads, 'F5' for tree view)
htop
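Reading raw /proc/interrupts lines gets tedious with many NIC queues. A minimal sketch that sums an interface's interrupt counts per CPU, assuming the usual layout (a first line of CPU column names, then one "<irq>: <count per CPU> ... <name>" line per IRQ) and that the driver names its queues after the interface, which varies by driver; `nic_irq_cpu_totals` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# nic_irq_cpu_totals IFACE [FILE]: sum the interrupt counts of IFACE's IRQs per CPU.
nic_irq_cpu_totals() {
  local iface="$1" file="${2:-/proc/interrupts}"
  awk -v ifc="$iface" '
    # First line: CPU0 CPU1 ... gives the number and names of the count columns.
    NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
    # IRQ lines mentioning the interface: fields 2..ncpu+1 are per-CPU counts.
    index($0, ifc) > 0 { for (i = 1; i <= ncpu; i++) total[i] += $(i + 1) }
    END { for (i = 1; i <= ncpu; i++) printf "%s %d\n", name[i], total[i] }
  ' "$file"
}

# Usage: nic_irq_cpu_totals eth0
```

If nearly all counts land on one CPU, the NIC's interrupts are not being spread across cores.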
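Fix: Configure CPU affinity.
Interrupts can be steered through irqbalance’s configuration or by writing CPU lists to /proc/irq/<N>/smp_affinity_list. HAProxy can be pinned externally with taskset (or CPUAffinity= in its systemd unit), or from its own configuration with the cpu-map directive, which requires a build with CPU affinity support (typical for Linux packages). For example, to bind HAProxy to cores 2 and 3:
taskset -c 2,3 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg
Or in haproxy.cfg, running two threads and binding thread 1 to CPU 2 and thread 2 to CPU 3:
global
    nbthread 2
    cpu-map auto:1/1-2 2-3
    # ... other settings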
Why it works: Keeping a NIC queue’s interrupts and the HAProxy thread that consumes its traffic on the same cores improves cache locality and reduces thread migrations and context switches, which tends to lower latency and raise throughput. How much it helps depends on the NIC, the number of queues, and the workload, so measure before and after.
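The interrupt side of the pinning can be scripted through /proc/irq/<N>/smp_affinity_list. A minimal sketch, assuming root, that irqbalance is stopped (otherwise it may move the IRQs back), and that the driver names its queues after the interface (for example eth0-TxRx-0), which varies by driver; some kernel-managed IRQs refuse affinity changes. `nic_irqs` and `pin_nic_irqs` are hypothetical helper names, and `DRY_RUN=1` only prints the intended writes.

```shell
#!/usr/bin/env bash
# nic_irqs IFACE [FILE]: list IRQ numbers whose /proc/interrupts line mentions IFACE.
nic_irqs() {
  local iface="$1" file="${2:-/proc/interrupts}"
  awk -v ifc="$iface" 'index($0, ifc) > 0 {
    irq = $1; sub(/:$/, "", irq)
    if (irq ~ /^[0-9]+$/) print irq   # skip non-numeric rows such as NMI: or LOC:
  }' "$file"
}

# pin_nic_irqs IFACE CPULIST: restrict each of the interface's IRQs to CPULIST (e.g. "2-3").
pin_nic_irqs() {
  local iface="$1" cpus="$2" irq
  for irq in $(nic_irqs "$iface"); do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "would write $cpus to /proc/irq/$irq/smp_affinity_list"
    else
      echo "$cpus" > "/proc/irq/$irq/smp_affinity_list" \
        || echo "IRQ $irq: affinity change refused" >&2
    fi
  done
}

# Usage: DRY_RUN=1 pin_nic_irqs eth0 2-3
```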
After tuning these parameters, the next error you might encounter is related to HAProxy’s internal connection management limits, such as exceeding the maxconn directive in your HAProxy configuration, or hitting specific backend server connection limits.