Configure HAProxy Caching to Reduce Backend Load (2026)

HAProxy doesn’t just serve requests; it can actively prevent them from ever reaching your backend servers, making your application feel faster and your infrastructure cheaper to run.

Let’s see HAProxy caching in action. Imagine we have a backend service that generates slightly expensive reports.

cache reports_cache
    total-max-size 100   # cache size in RAM, in megabytes
    max-age 300          # upper bound on freshness, in seconds (5 minutes)

frontend http_in
    bind *:80
    acl is_report_request path_beg /reports
    # Look up matching requests in the cache; this also computes the cache key
    http-request cache-use reports_cache if is_report_request
    # Store cacheable responses for requests that went through cache-use
    http-response cache-store reports_cache
    default_backend report_servers

backend report_servers
    server webserver1 192.168.1.10:80 check
    server webserver2 192.168.1.11:80 check

Here’s what’s happening:

cache reports_cache: This declares a named cache section. Frontends and backends refer to it by name.
total-max-size 100: The cache may use up to 100 megabytes of RAM. When it fills up, HAProxy reclaims space from older entries. A single object is limited by max-object-size, which defaults to 1/256 of the cache size.
max-age 300: A cached response is considered fresh for at most 300 seconds (5 minutes). The effective lifetime is the lower of this value and the max-age or s-maxage the backend sends in its Cache-Control header. Without this setting the default is 60 seconds.
frontend http_in: This defines our listening interface.
bind *:80: HAProxy listens on port 80.
acl is_report_request path_beg /reports: We define a condition. If the request path starts with /reports, the ACL is_report_request becomes true.
http-request cache-use reports_cache if is_report_request: This is the core directive. If the ACL is true, HAProxy tries to serve the request from the cache. It also computes the cache key, so it is required for storing as well as for delivering.
http-response cache-store reports_cache: On a miss, HAProxy stores the backend’s response if it is cacheable. Only requests that went through cache-use are stored, so the condition does not need to be repeated here.
default_backend report_servers: Requests that are not answered from the cache go to this backend. Server lines cannot appear in a frontend section, which is why the servers live in their own backend.
server webserver1 ...: These are our actual backend servers that generate the reports.

There is no directive for restricting the cache to GET requests or for tying it to a domain. HAProxy only caches GET requests, and the Host header is already part of the cache key, so several domains can share one cache without colliding.

When a client requests http://example.com/reports/user/123?format=pdf:

HAProxy checks the ACL is_report_request. It matches.
It then checks its cache for an entry whose key matches the Host header and the URI (example.com and /reports/user/123?format=pdf) and that has not yet expired.
Cache Hit: If found and valid, HAProxy immediately returns the cached response to the client without bothering webserver1 or webserver2.
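Cache Miss: If not found or expired, HAProxy forwards the request to one of the backend servers (webserver1 or webserver2). When the backend responds, HAProxy stores the response while returning it to the client, provided it is cacheable. Among other conditions, it must be a 200 response, fit within the cache’s object size limit, and carry an explicit expiration (Cache-Control max-age or s-maxage, or Expires) or a validator (ETag or Last-Modified).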

This configuration can substantially reduce the load on your backend servers for repetitive report requests, as subsequent requests for the same report within the 5-minute window are served directly from HAProxy’s memory. The cache is held in RAM only and is not written to disk.
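
To check that the hit and miss paths described above behave as expected, one common approach is to tag each response with a header. The sketch below is self-contained and puts the cache rules in the backend. The names reports_cache, report_servers and X-Cache-Status are illustrative choices, not anything HAProxy requires. It relies on srv_id only being set when a server actually handled the request.

```haproxy
# Hypothetical names: reports_cache, report_servers, X-Cache-Status
cache reports_cache
    total-max-size 100
    max-age 300

backend report_servers
    http-request cache-use reports_cache if { path_beg /reports }
    http-response cache-store reports_cache
    # srv_id is only set when a server handled the request,
    # so its absence means the response came from the cache
    http-response set-header X-Cache-Status HIT if !{ srv_id -m found }
    http-response set-header X-Cache-Status MISS if { srv_id -m found }
    server webserver1 192.168.1.10:80 check
    server webserver2 192.168.1.11:80 check
```

Requesting the same report URL twice with GET inside the freshness window should show MISS and then HIT, assuming the backend’s response meets the cacheability conditions.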

The most surprising thing about HAProxy’s caching is that it’s an extension to its core proxying logic, not a separate service. It intercepts requests, consults its local cache, and only if necessary, forwards to the backend. The configuration is declarative, meaning you describe what you want cached, and HAProxy handles the how.

The cache key is the part you most need to understand. HAProxy does not let you configure it: the primary key is a hash of the Host header and the full URI, query string included. If your backend application uses query parameters to determine content (e.g., ?sort=asc vs. ?sort=desc), those responses are already cached separately, so you won’t serve the wrong data. The flip side is that query parameters which don’t affect the response still create separate entries and lower the hit rate. To avoid that, rewrite the URI before the cache lookup so that equivalent requests share one key.
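
As a rough sketch of raising the hit rate when the query string is irrelevant, the URI can be rewritten before the cache lookup. This assumes a hypothetical /reports/static path whose responses do not depend on query parameters, and that a cache named reports_cache and a backend named report_servers are defined elsewhere. Note that the backend also receives the rewritten URI, so this is only safe where it really ignores the query.

```haproxy
# Assumption: the backend ignores the query string under /reports/static
# (reports_cache and report_servers must be defined elsewhere)
frontend http_in
    bind *:80
    acl is_static_report path_beg /reports/static
    # Drop the query string first, so all variants share one cache entry
    http-request set-uri %[path] if is_static_report
    # The key is computed here, from the already rewritten URI
    http-request cache-use reports_cache if { path_beg /reports }
    http-response cache-store reports_cache
    default_backend report_servers
```

The ordering matters: rules run top to bottom, so the rewrite has to appear before cache-use to influence the key.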

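The next logical step is to understand cache invalidation. The built-in cache, as documented, has no purge directive: entries leave the cache only by expiring or being evicted. In practice that means keeping max-age short for volatile content and having the backend application signal freshness through the Cache-Control headers it sends.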