HAProxy agent checks let you feed custom health signals into the load balancer, enabling it to make smarter routing decisions than just basic TCP or HTTP checks.
Let’s watch HAProxy in action with an agent check. Imagine we have a backend service that reports its health via a custom API endpoint, say /healthz.

    frontend http_frontend
        bind *:80
        default_backend web_servers

    backend web_servers
        mode http
        balance roundrobin
        server web1 192.168.1.10:80 check port 80 inter 2s fall 3 rise 2
        server web2 192.168.1.11:80 check port 80 inter 2s fall 3 rise 2

    listen haproxy_agent
        mode http
        bind *:9000
        stats enable
        stats uri /
        stats refresh 10s
        stats admin if TRUE

    # No extra listen section is needed for agent checks: HAProxy does not
    # accept status reports pushed by agents. Instead, it opens a TCP
    # connection to the agent port configured on each server line and
    # reads the one-line status the agent sends back.

Now, let’s configure our backend server (e.g., web1 at 192.168.1.10) to report its health. We’ll use a simple Python script as our "agent."

    import http.server
    import socketserver
    import threading
    import time

    HTTP_PORT = 8080   # The port serving our own /healthz endpoint
    AGENT_PORT = 9999  # The port HAProxy connects to for the agent check


    class HealthHandler(http.server.BaseHTTPRequestHandler):
        """Serves the application's own health endpoint."""

        def do_GET(self):
            if self.path == '/healthz':
                # Simulate a healthy state
                self.send_response(200)
                self.send_header('Content-Type', 'text/plain')
                self.end_headers()
                self.wfile.write(b"OK")
            else:
                self.send_response(404)
                self.end_headers()


    def current_status():
        # In a real scenario, you'd check your application's actual health here.
        # For this example, we'll just toggle between up and down every 5 seconds.
        return "up" if time.time() % 10 < 5 else "down"


    class AgentHandler(socketserver.BaseRequestHandler):
        """Answers each HAProxy agent check with one newline-terminated line."""

        def handle(self):
            # HAProxy connects, reads the first line, and closes the connection.
            self.request.sendall(f"{current_status()}\n".encode('ascii'))


    class ReusableTCPServer(socketserver.ThreadingTCPServer):
        allow_reuse_address = True
        daemon_threads = True


    if __name__ == "__main__":
        http_server = ReusableTCPServer(("0.0.0.0", HTTP_PORT), HealthHandler)
        agent_server = ReusableTCPServer(("0.0.0.0", AGENT_PORT), AgentHandler)

        # Run both listeners in background threads
        for server in (http_server, agent_server):
            threading.Thread(target=server.serve_forever, daemon=True).start()

        print(f"HTTP health check endpoint running on port {HTTP_PORT}")
        print(f"Agent check listener running on port {AGENT_PORT}")
        print("Agent running. Press Ctrl+C to stop.")
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            print("Agent stopped.")

In this setup:
- HAProxy has a backend web_servers with two server entries.
- Each server line has check port 80 inter 2s fall 3 rise 2. This is the standard health check, which HAProxy performs alongside the agent check.
- Port 9999 is the agent port. It is opened by the agent on each backend host, not by HAProxy, so no listen section or ACL is needed on the HAProxy side. HAProxy connects to it and reads the status as plain text over TCP.
- Our Python agent script runs on the same host as the backend service. It has two jobs:
  - It exposes its own health status at http://localhost:8080/healthz. This is the health signal our application cares about.
  - Crucially, it listens on the agent port (9999) and answers every connection from HAProxy with a simple string: up\n or down\n.
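The mental model for agent checks is that HAProxy still initiates every check, but it delegates the verdict: instead of inferring health from a probe (like with check port 80), it connects to an agent running on the server and accepts whatever status the agent reports. The server line in HAProxy’s configuration needs a special directive to enable this: agent-check, together with agent-port to say where the agent listens.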
So, our backend web_servers section in HAProxy would actually look like this to integrate the agent check:

    backend web_servers
        mode http
        balance roundrobin
        # Standard check is still good practice:
        #   check port 80 inter 2s fall 3 rise 2
        # Agent check configuration, on the same server line:
        #   agent-check      enables the auxiliary agent check
        #   agent-port 9999  port on the server where the agent listens
        #   agent-inter 5s   how often HAProxy polls the agent
        server web1 192.168.1.10:80 check port 80 inter 2s fall 3 rise 2 agent-check agent-port 9999 agent-inter 5s
        server web2 192.168.1.11:80 check port 80 inter 2s fall 3 rise 2 agent-check agent-port 9999 agent-inter 5s

Notice that each server has a single line carrying both the traditional check and the agent-check, since server names should be unique within a backend. HAProxy runs both mechanisms independently. The agent-port 9999 parameter tells HAProxy which port on the server to connect to for the agent’s status, and agent-inter 5s makes it poll the agent every 5 seconds. The port and inter parameters apply only to the regular check. If the agent cannot be reached, HAProxy does not treat that as a failure; detecting an unreachable server remains the job of the regular health check.
The most surprising thing about agent checks is how little protocol is involved: HAProxy opens a TCP connection to the agent, reads a single line of ASCII text, and acts on it. Because the agent decides what that line says, the check can be based on application-level metrics, queue depths, external dependencies, or any signal your application can expose.
When the agent replies up\n, HAProxy marks the server as healthy, provided the regular health check also passes. When it replies down\n, HAProxy marks the server as unhealthy, regardless of the outcome of the standard check port 80. The change takes effect as soon as the reply is read, so the reaction time is bounded by how often the agent is polled. The agent can also answer with a weight such as 50%\n, or with drain, maint, or ready to change the server’s administrative state.
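To see what HAProxy would read from an agent, a small probe can imitate the check: connect, read the first line, and disconnect. The sketch below is only an illustration; the function name read_agent_reply and the 256-byte cap are assumptions made for this example, not part of HAProxy.

```python
import socket


def read_agent_reply(host: str, port: int, timeout: float = 2.0) -> str:
    """Connect to an agent port and return the first line it sends.

    Hypothetical helper that mimics how an agent check reads its reply.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        data = b""
        # Read until a line terminator, end of stream, or the size cap.
        while b"\n" not in data and b"\r" not in data and len(data) < 256:
            chunk = conn.recv(64)
            if not chunk:
                break
            data += chunk
    # Keep only the text before the first '\r' or '\n'.
    first_line = data.replace(b"\r", b"\n").split(b"\n", 1)[0]
    return first_line.decode("ascii", errors="replace").strip()
```

With the agent script running locally, calling read_agent_reply("127.0.0.1", 9999) should return whichever status the agent is currently reporting.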
The agent-check mechanism is incredibly powerful for scenarios where the health of a backend service isn’t a simple "is it listening?" question. For example, a worker queue processing service might be listening on its port (so it passes a basic TCP check), but if its queue is backed up beyond a certain threshold, it’s effectively unhealthy for new requests. The agent can monitor this queue depth and report DOWN to HAProxy, preventing new traffic from being sent to it.
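The queue example can be sketched as a single decision function inside the agent. The names agent_reply, queue_depth, and max_depth are hypothetical, and the threshold of 1000 is an arbitrary placeholder; how the depth is measured depends entirely on your queueing system.

```python
def agent_reply(queue_depth: int, max_depth: int = 1000) -> bytes:
    """Build the line an agent could send for a given queue depth.

    queue_depth and max_depth are hypothetical application metrics;
    a backlog above the threshold is reported as "down".
    """
    status = "down" if queue_depth > max_depth else "up"
    # The reply is a single ASCII line terminated by a newline.
    return f"{status}\n".encode("ascii")
```

An agent handler would send the returned bytes on each connection, so the service stops receiving new requests while its backlog is over the limit and starts again once it drains.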
The agent-inter value on the server line is critical. It defines how frequently HAProxy polls the agent for its status. If this interval is too long, HAProxy will be slow to react to a change the agent reports. If it’s too short, it can put unnecessary load on the agent or HAProxy itself. A common starting point is 5 or 10 seconds.
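One way to keep a short polling interval from loading the application is to have the agent cache its verdict and re-run the expensive probe only occasionally. The following is a minimal sketch under that assumption; the class CachedHealth, its ttl parameter, and the injectable clock are illustrative names, not anything HAProxy requires.

```python
import time
from typing import Callable, Optional


class CachedHealth:
    """Re-runs a health probe at most once every `ttl` seconds."""

    def __init__(self, probe: Callable[[], bool], ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._probe = probe      # hypothetical expensive health check
        self._ttl = ttl
        self._clock = clock      # injectable so the cache can be tested
        self._checked_at: Optional[float] = None
        self._healthy = False

    def status(self) -> str:
        now = self._clock()
        # Probe again only if there is no result yet or it has expired.
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            self._healthy = self._probe()
            self._checked_at = now
        return "up" if self._healthy else "down"
```

The agent would call status() for every incoming check, so polling every second or two costs little while the real probe runs only once per ttl. This sketch is not thread-safe; a threaded agent would need a lock around the refresh.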
The next step is to explore how to use multiple agent checks per server and how to prioritize them.