HAProxy’s nbproc and nbthread settings don’t just distribute work. They decide how many independent event loops handle your network connections and which CPU cores those loops run on, which is why the resulting performance gains often feel more like a system-level optimization than an application tweak.
Let’s see it in action. Imagine you have a multi-core server and a busy HAProxy instance. Without tuning, HAProxy runs as a single process.
# ps aux | grep haproxy
haproxy 12345 0.0 0.1 123456 7890 ? Ss Oct26 0:15 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
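Now, let’s say we want to leverage our 8 cores. On HAProxy 1.8, the release that introduced nbthread and still allows combining it with nbproc, we’d add the following to the global section of haproxy.cfg (the complete file is shown for context). Later releases moved away from this model: nbproc was deprecated in 2.3 and removed in 2.5 in favor of nbthread alone.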
global
    log /dev/log local0
    maxconn 4096
    nbproc 8
    nbthread 4
    cpu-map 1 0-1
    cpu-map 2 2-3
    cpu-map 3 4-5
    cpu-map 4 6-7
    cpu-map 5 0-1
    cpu-map 6 2-3
    cpu-map 7 4-5
    cpu-map 8 6-7
    daemon
    stats socket /run/haproxy.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    pidfile /var/run/haproxy.pid

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

listen http_front
    bind *:80
    balance roundrobin
    server s1 192.168.1.10:80 check
    server s2 192.168.1.11:80 check
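After reloading HAProxy, you’ll see the original master process (this output assumes HAProxy runs in master-worker mode) alongside eight new worker processes. ps does not display CPU pinning; the cpu-map lines are what tie each worker to its cores, and taskset -cp <pid> reports the resulting affinity for any given PID.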
# ps aux | grep haproxy
haproxy 12345 0.0 0.1 123456 7890 ? Ss Oct26 0:15 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
haproxy 54321 1.2 0.5 123456 12345 ? S 10:00 0:05 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
haproxy 54322 1.1 0.5 123456 12345 ? S 10:00 0:04 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
haproxy 54323 1.0 0.4 123456 11111 ? S 10:00 0:04 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
haproxy 54324 0.9 0.4 123456 11111 ? S 10:00 0:03 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
haproxy 54325 0.8 0.4 123456 11111 ? S 10:00 0:03 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
haproxy 54326 0.7 0.3 123456 10000 ? S 10:00 0:02 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
haproxy 54327 0.6 0.3 123456 10000 ? S 10:00 0:02 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
haproxy 54328 0.5 0.3 123456 10000 ? S 10:00 0:01 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg
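To confirm what cpu-map actually did, you can ask the kernel for each worker’s CPU affinity. The sketch below is a minimal illustration, assuming Linux (os.sched_getaffinity is not available on every platform) and that you pass the worker PIDs from the ps output on the command line; the helper name cpu_affinity is hypothetical.

```python
import os
import sys


def cpu_affinity(pids):
    """Return {pid: sorted list of CPUs the process may run on} (Linux only).

    A PID that no longer exists maps to None instead of raising.
    """
    result = {}
    for pid in pids:
        try:
            result[pid] = sorted(os.sched_getaffinity(pid))
        except ProcessLookupError:
            result[pid] = None  # the process exited or never existed
    return result


if __name__ == "__main__":
    # Example: python3 check_affinity.py 54321 54322 54323
    # With no PIDs given, fall back to this script's own PID.
    pids = [int(arg) for arg in sys.argv[1:] if arg.isdigit()] or [os.getpid()]
    for pid, cpus in cpu_affinity(pids).items():
        print(pid, cpus)
```

With the example configuration, the worker for process 1 should report CPUs [0, 1]; a worker that reports every CPU on the machine has not been pinned.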
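The problem this solves is that a single-threaded HAProxy process runs its event loop on only one CPU core at a time. HAProxy’s network I/O is non-blocking and event-driven, so one thread can juggle many thousands of connections, but once that thread saturates its core, the remaining cores sit idle. nbproc creates multiple independent HAProxy processes, each handling its own share of the connections. Because they are independent, each process also runs its own health checks and keeps its own statistics and stick tables. nbthread then lets each of these processes run several threads, each with its own event loop, further increasing concurrency.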
The nbproc setting determines the number of HAProxy worker processes, and nbthread the number of threads within each process; every thread runs its own event loop for network I/O. Processes are pinned to particular cores only if you ask for it: the cpu-map directive explicitly assigns which CPU cores each process (identified by a sequential number starting from 1) may use. Without cpu-map, HAProxy pins nothing and leaves placement to the kernel scheduler, so explicit mapping is usually better for predictable performance. As a rule of thumb, nbproc multiplied by nbthread should not exceed the number of CPU cores you want HAProxy to use. A common layout is nbproc equal to that core count with nbthread 1; raising nbthread beyond the available cores only adds threads that compete for the same CPUs and may yield no significant gain.
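Writing one cpu-map line per process by hand is error-prone, so here is a small generator for the one-core-per-process layout. It is a sketch under stated assumptions: nbthread is 1, and the cores argument lists the CPU numbers HAProxy is allowed to use. The function name cpu_map_lines is hypothetical, not part of HAProxy.

```python
def cpu_map_lines(nbproc, cores):
    """Build cpu-map directives pinning process N (1-based) to the Nth listed core.

    Assumes nbthread 1, so each process needs exactly one core.
    """
    cores = list(cores)
    if nbproc > len(cores):
        raise ValueError(f"nbproc {nbproc} exceeds the {len(cores)} available cores")
    return [f"cpu-map {proc} {cores[proc - 1]}" for proc in range(1, nbproc + 1)]


if __name__ == "__main__":
    # Lines to paste into the global section for an 8-core machine (cores 0-7).
    print("nbproc 8")
    print("nbthread 1")
    for line in cpu_map_lines(8, range(8)):
        print(line)
```

For 8 cores this prints cpu-map 1 0 through cpu-map 8 7. Passing an explicit list such as [2, 3, 4, 5] lets you keep some cores free for the kernel or other services.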
When you set nbproc 8 and nbthread 4, you’re telling HAProxy to launch 8 worker processes with 4 threads each, 32 threads in total. The cpu-map directives then pin these processes to your CPU cores. For instance, cpu-map 1 0-1 means all four threads of the first worker process may run only on CPU cores 0 and 1. Note what the example mapping actually does, though: processes 5 to 8 reuse the core pairs of processes 1 to 4, so each pair of cores is shared by two processes and eight threads, four threads per core. On a dedicated 8-core machine, mapping each process to its own core (cpu-map 1 0 through cpu-map 8 7) with nbthread 1 avoids that oversubscription. The master process (the one that started the others) handles no traffic itself; it only supervises the workers, so none of these threads belong to it.
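The arithmetic behind the example’s global section can be checked with a few lines of code. This sketch expands each cpu-map CPU set and estimates how many threads compete for each core, assuming a process’s threads spread evenly over the CPUs it is mapped to (the kernel decides the real placement). It also totals the connection limit, since global maxconn applies to each process separately. The helper names are hypothetical.

```python
from collections import defaultdict


def expand_cpu_set(tokens):
    """Expand cpu-map CPU tokens such as "0-1" or "6" into a list of CPU numbers."""
    cpus = []
    for token in tokens.split():
        if "-" in token:
            first, last = (int(x) for x in token.split("-"))
            cpus.extend(range(first, last + 1))
        else:
            cpus.append(int(token))
    return cpus


def load_per_cpu(nbthread, cpu_map):
    """Average number of HAProxy threads competing for each CPU.

    Assumes each process's threads spread evenly over its mapped CPUs.
    """
    load = defaultdict(float)
    for tokens in cpu_map.values():
        cpus = expand_cpu_set(tokens)
        for cpu in cpus:
            load[cpu] += nbthread / len(cpus)
    return dict(sorted(load.items()))


# Values taken from the example global section.
NBPROC, NBTHREAD, MAXCONN = 8, 4, 4096
CPU_MAP = {1: "0-1", 2: "2-3", 3: "4-5", 4: "6-7",
           5: "0-1", 6: "2-3", 7: "4-5", 8: "6-7"}

if __name__ == "__main__":
    print("total threads:", NBPROC * NBTHREAD)                  # 32
    print("threads per CPU:", load_per_cpu(NBTHREAD, CPU_MAP))  # 4.0 on each of CPUs 0-7
    # Global maxconn is a per-process limit, so all workers together accept up to:
    print("max concurrent connections:", NBPROC * MAXCONN)      # 32768
```

Running load_per_cpu(1, {1: "0", 2: "1", ..., 8: "7"}) for the one-core-per-process layout gives 1.0 per CPU instead, which is the balanced case.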
The most surprising outcome is often not just higher throughput but lower latency, especially under high load. With the work spread over several event loops, each running on its own core, new events spend less time queued behind a busy connection or a long-running internal operation in a single saturated loop, so response times become more consistent. That benefit assumes the cores are not oversubscribed; threads that have to wait for a free CPU add latency of their own.
The next logical step after tuning nbproc and nbthread is to investigate how HAProxy’s connection management settings, such as maxconn and the timeouts, interact with these multi-process and multi-threaded configurations.