HAProxy’s true power lies not in its speed, but in its ability to abstract complexity, making your entire application stack appear faster and more resilient than it actually is.
Let’s see HAProxy in action, serving traffic to a cluster of backend web servers. Imagine a user requesting https://example.com/images/logo.png.
frontend http_frontend
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/example.com.pem
    mode http
    default_backend web_servers

backend web_servers
    mode http
    balance roundrobin
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check
    server web3 192.168.1.12:80 check
When the request arrives at HAProxy on port 443, HAProxy first terminates the TLS connection using the specified certificate. Because the frontend runs in mode http, it then parses the request as HTTP. No routing rule overrides the default, so default_backend web_servers sends the request to one of the servers in the web_servers pool. With balance roundrobin, HAProxy picks web1, then web2, then web3, and then loops back to web1, which spreads requests evenly as long as all servers carry the same weight. The check keyword on each server line enables health checks. This is not an ICMP ping: by default HAProxy opens a TCP connection to the server every 2 seconds, marks the server down after 3 consecutive failures, and stops sending it traffic until 2 consecutive checks succeed again.
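The default TCP check only proves that a port is open. A minimal sketch of an HTTP-level check follows, assuming the application exposes a health endpoint; the path /healthz is hypothetical, and the timing values simply restate HAProxy's defaults so they are visible and easy to tune:

```haproxy
backend web_servers
    mode http
    balance roundrobin

    # Replace the bare TCP connect with an HTTP request.
    # /healthz is a hypothetical endpoint; use whatever your application provides.
    option httpchk GET /healthz
    http-check expect status 200

    # inter: time between checks
    # fall:  consecutive failures before a server is marked down
    # rise:  consecutive successes before it is marked up again
    # default-server must appear before the server lines it applies to.
    default-server inter 2s fall 3 rise 2

    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check
    server web3 192.168.1.12:80 check
```

Lowering inter or fall detects failures sooner at the cost of more check traffic and a higher chance of flapping on a briefly slow server.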
This simple setup hides a lot. With these few lines, HAProxy is acting as a reverse proxy, a load balancer, a TLS terminator, and a health checker. With additional configuration, it can also cache common responses and filter malicious requests.
The core problem HAProxy solves is the "single point of failure" and the "performance bottleneck" inherent in a direct client-to-server connection model. By sitting in front of your application servers, it can:
- Distribute Load: Prevent any single application server from being overwhelmed.
- Provide High Availability: Automatically route around failed application servers.
- Handle SSL/TLS: Offload the CPU-intensive task of encryption/decryption from your application servers.
- Cache Content: Serve frequently requested static assets directly, reducing load on application servers.
- Filter Traffic: Block known bad actors or malformed requests before they reach your application.
- Graceful Maintenance: Allow you to take application servers offline for updates without interrupting service.
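Caching is the one capability in this list that needs its own configuration section. A minimal sketch, assuming HAProxy 1.8 or later; the cache name static_cache and the size and age values are illustrative choices, not recommendations:

```haproxy
# Hypothetical cache definition; the name and limits are assumptions.
cache static_cache
    total-max-size 64    # total size of the in-memory cache, in megabytes
    max-age 300          # maximum time an object stays cached, in seconds

backend web_servers
    mode http
    balance roundrobin

    # Try to answer from the cache before contacting a server.
    http-request cache-use static_cache
    # Store eligible responses on their way back to the client.
    http-response cache-store static_cache

    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check
    server web3 192.168.1.12:80 check
```

HAProxy only stores responses it considers cacheable, so the caching headers your application sends still decide what ends up in the cache.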
Internally, HAProxy maintains a set of listeners (the frontend sections) that accept incoming connections. For each connection, it evaluates the frontend's rules to determine which server pool (a backend section) should handle it. It then uses that backend's balancing algorithm to select a specific server within the pool and forwards the request. For health checking, it probes each backend server at a regular interval, with a TCP connection attempt by default or an HTTP request when so configured, and uses the results to decide whether the server is up.
The acl (Access Control List) directive is where much of the magic happens for security and fine-grained routing. You can define conditions based on request headers, URI, source IP, and more, and then use these ACLs to direct traffic to different backends or even reject requests outright.
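# These lines belong inside a frontend section running in mode http.
# path_beg and path_end match the path only, so a query string such as
# ?v=2 after logo.png does not prevent a match.
    acl is_api_request hdr(host) -i api.example.com
    acl is_image_request path_end -i .png .jpg .gif
    acl is_malicious_path path_beg /admin /etc
# http-request rules are evaluated before use_backend rules, so list them first.
    http-request deny if is_malicious_path
    use_backend api_backend if is_api_request
    use_backend image_backend if is_image_request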
Here, requests to api.example.com go to api_backend, image files go to image_backend, and any request whose path begins with /admin or /etc is denied immediately (with a 403 response by default). All of this happens inside HAProxy, before the request reaches your application servers.
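ACLs can also be combined. The following sketch, assuming a frontend in mode http, restricts /admin to one network instead of blocking it for everyone. The network 203.0.113.0/24 and the ACL names are hypothetical, and api_backend and web_servers are assumed to be defined elsewhere in the configuration:

```haproxy
frontend http_frontend
    bind *:80
    mode http

    # Hypothetical trusted network; replace with your own range.
    acl from_office src 203.0.113.0/24
    acl is_admin_path path_beg /admin

    # ACLs listed on one line are ANDed together; "!" negates an ACL.
    # Deny /admin for every client outside the trusted network.
    http-request deny deny_status 403 if is_admin_path !from_office

    # An anonymous ACL in braces avoids naming a condition used only once.
    use_backend api_backend if { hdr(host) -i api.example.com }

    default_backend web_servers
```

Named ACLs pay off when a condition is reused in several rules; anonymous ACLs keep one-off conditions next to the rule that uses them.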
In the example above, each bind line uses the wildcard address *, so HAProxy listens on every local IPv4 address for that port. This comes from the wildcard, not from having multiple bind lines. If you have multiple network interfaces and want HAProxy to accept requests only on a specific public IP, specify that address in the bind directive. For example, bind 192.0.2.1:443 ssl crt /etc/ssl/certs/example.com.pem accepts connections only on 192.0.2.1 and ignores the host's other addresses.
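As a sketch of how this looks on a host with two interfaces, the fragment below exposes HTTPS on the public address and plain HTTP on a private one. The frontend names and the private address 10.0.0.5 are assumptions for illustration:

```haproxy
# Public listener: reachable only through 192.0.2.1.
frontend public_https
    mode http
    bind 192.0.2.1:443 ssl crt /etc/ssl/certs/example.com.pem
    default_backend web_servers

# Internal listener: 10.0.0.5 is a hypothetical private address.
frontend internal_http
    mode http
    bind 10.0.0.5:80
    default_backend web_servers
```

Both frontends can share one backend; the bind addresses alone decide which clients can reach which listener.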
Understanding the distinction between mode http and mode tcp is crucial. In mode http, HAProxy parses each request and can inspect and manipulate HTTP-specific elements such as headers and URIs. In mode tcp, it works one layer lower: it accepts the client's TCP connection, opens its own connection to a server, and relays the byte stream between the two without interpreting it. That makes mode tcp useful for non-HTTP protocols, but it offers less granular control.
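For comparison with the HTTP examples, here is a minimal mode tcp sketch. It assumes a pair of read-only PostgreSQL replicas; the section names, addresses, and port are hypothetical:

```haproxy
frontend postgres_frontend
    mode tcp
    bind *:5432
    default_backend postgres_servers

backend postgres_servers
    mode tcp
    # leastconn suits long-lived connections better than roundrobin.
    balance leastconn
    # Hypothetical replica addresses; check performs a TCP connect test.
    server db1 192.168.1.20:5432 check
    server db2 192.168.1.21:5432 check
```

HTTP-level fetches such as hdr(host) or path_beg have nothing to match here, because HAProxy never parses the payload in this mode.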
The next concept you’ll grapple with is how to effectively monitor HAProxy’s performance and troubleshoot issues that arise in a high-traffic environment.