Linkerd HTTP/2 Load Balancing: Per-Request Balancing (2026)

Linkerd’s HTTP/2 load balancing doesn’t just pick a backend server and stick with it for the life of a connection; it chooses an instance for each individual request, even when many requests share the same long-lived HTTP/2 connection.

Let’s watch this in action. Imagine we have a simple emojivoto application running, and we want to see how Linkerd distributes requests to the voting service.

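# First, install Linkerd (CRDs, then the control plane), deploy emojivoto,
# and inject the Linkerd proxy into its deployments
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
kubectl apply -f https://raw.githubusercontent.com/linkerd/linkerd2-demo/main/emojivoto.yml
kubectl get deploy -n emojivoto -o yaml | linkerd inject - | kubectl apply -f -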

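# Scale the voting deployment so there is more than one pod to balance across
# (assumes the deployment is named "voting", as in the stock emojivoto manifest)
kubectl scale deploy/voting -n emojivoto --replicas=3

# Now, let's send a bunch of requests and watch the stats
# In one terminal:
watch -n 1 "kubectl top pod -n emojivoto -l app=voting-svc"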

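# In another terminal, send traffic from a pod inside the cluster.
# The voting service speaks gRPC, so votes go through the web service's HTTP API
# (assumes the stock emojivoto service name web-svc and its /api/vote endpoint):
for i in {1..1000}; do \
  curl -s -o /dev/null "http://web-svc.emojivoto.svc.cluster.local/api/vote?choice=:joy:"; \
  sleep 0.01; \
done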

If you watch the kubectl top pod output, you’ll see the CPU and memory usage spread across the voting-svc pods. It’s not that one pod gets hammered for 10 seconds and then the next for 10 seconds. It’s a much finer-grained distribution.

The magic here is Linkerd’s understanding of HTTP/2. A connection-level (L4) balancer, such as plain Kubernetes Service routing, picks a backend once per TCP connection. That works tolerably for HTTP/1.1 clients that open many short-lived connections, but HTTP/2 multiplexes many independent "streams" (one per request) over a single long-lived TCP connection, so every request on that connection would land on the same pod. Linkerd’s proxy, running as a sidecar to your application pods, intercepts these requests instead. When a request arrives, the proxy consults its balancer, which is latency-aware (it tracks an exponentially weighted moving average, or EWMA, of each endpoint’s response times) rather than round-robin, and which needs no configuration. It then selects a backend pod for that specific request. This decision is made independently for each request, even if multiple requests are flowing over the same underlying TCP connection.
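The difference between choosing a backend once per connection and once per request can be sketched in a few lines of Python. This is a toy simulation, not Linkerd code: the pod names are hypothetical, and a uniform random pick stands in for the real balancer purely to show where the decision is made.

```python
import random
from collections import Counter

# Hypothetical pod names, used only for illustration.
PODS = ["voting-a", "voting-b", "voting-c"]


def connection_level(num_requests, rng):
    """Pick a backend once per TCP connection; every stream follows it."""
    chosen = rng.choice(PODS)
    return Counter(chosen for _ in range(num_requests))


def request_level(num_requests, rng):
    """Pick a backend independently for every request (HTTP/2 stream)."""
    return Counter(rng.choice(PODS) for _ in range(num_requests))


rng = random.Random(0)
per_conn = connection_level(1000, rng)
per_req = request_level(1000, rng)

print("connection-level:", dict(per_conn))
print("request-level:   ", dict(per_req))
```

With a single long-lived connection, the connection-level picker sends all 1000 requests to one pod, while the request-level picker spreads them across all three.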

This per-request balancing is crucial for ensuring even load distribution and high availability. If one backend pod temporarily becomes slow or unresponsive, Linkerd can shift subsequent requests to healthier instances right away, because the choice is made per request rather than per connection; nothing has to be torn down and re-established first. It also means that pods with varying capacities (e.g., different CPU/memory allocations) tend to receive traffic in proportion to how quickly they respond, since a latency-aware balancer favors faster endpoints without any per-pod weights being configured.
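One way to picture how a balancer notices a slow pod is a per-endpoint latency estimate that reacts immediately to bad samples and recovers gradually. The sketch below is a simplified "peak EWMA" estimator written for this article; the class name, the 10-second decay window, and the sample values are assumptions, and it is not the proxy’s actual implementation.

```python
import math


class PeakEwma:
    """Simplified peak-EWMA latency estimate (illustrative only)."""

    def __init__(self, decay_seconds=10.0, initial_ms=1.0):
        self.decay_seconds = decay_seconds  # assumed decay window
        self.value_ms = initial_ms
        self.last_update = 0.0

    def observe(self, rtt_ms, now):
        """Fold one response-time sample (in ms) into the estimate."""
        elapsed = max(now - self.last_update, 0.0)
        self.last_update = now
        if rtt_ms > self.value_ms:
            # Peak behaviour: a worse sample takes effect immediately.
            self.value_ms = rtt_ms
        else:
            # Better samples only pull the estimate down gradually,
            # weighted by how much time has passed.
            w = math.exp(-elapsed / self.decay_seconds)
            self.value_ms = self.value_ms * w + rtt_ms * (1.0 - w)
        return self.value_ms


est = PeakEwma()
est.observe(5.0, now=1.0)                 # normal response
spike = est.observe(200.0, now=2.0)       # pod suddenly slow
recovered = est.observe(5.0, now=12.0)    # fast again ten seconds later

print(f"after spike: {spike:.1f} ms, after recovery sample: {recovered:.1f} ms")
```

The estimate jumps to the slow sample at once and then decays toward the faster samples, so a pod that has misbehaved stays "expensive" for a short while rather than being trusted again on the very next request.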

The core of this lives in the proxy itself rather than in a tunable LoadBalancer setting. In the Linkerd 2.x releases documented at the time of writing, there is no strategy key to flip between round-robin and least-request: the balancer is built in and needs no configuration. The linkerd-config ConfigMap in the linkerd namespace holds the control plane’s installation values, and you can inspect it to confirm that it carries no load-balancing strategy.

kubectl get configmap linkerd-config -n linkerd -o yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: linkerd-config
  namespace: linkerd
data:
  values: |
    # ... rendered installation values (proxy resources, log levels, identity, and so on)

What dictates the per-request decision is the proxy’s balancer. Linkerd’s proxy maintains a list of available upstream endpoints for a given service, kept up to date by the control plane’s destination service. For each outgoing request, it samples two endpoints at random and sends the request to the one with the lower estimated load (the "power of two choices" technique), where the estimate combines recent response latency (the EWMA) with the number of requests currently in flight. A plain least-request strategy would look only at in-flight counts; folding in latency is what lets Linkerd steer away from an endpoint that is accepting requests but answering them slowly. Either way, the choice is made on a per-request basis.
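As a rough sketch of a per-request picker over a list of healthy endpoints, here is "power of two choices" with a latency-weighted cost, the general approach Linkerd’s proxy is documented to use. Everything concrete here is assumed for illustration: the pod names, the latency figures, and the cost formula (latency estimate multiplied by in-flight requests plus one). It is not the proxy’s code.

```python
import random


def pick_p2c(endpoints, cost, rng=random):
    """Sample two distinct endpoints at random and return the cheaper one."""
    if len(endpoints) == 1:
        return endpoints[0]
    a, b = rng.sample(endpoints, 2)
    return a if cost(a) <= cost(b) else b


# Hypothetical per-endpoint state: latency estimate (ms) and in-flight count.
latency_ms = {"voting-a": 5.0, "voting-b": 5.0, "voting-c": 250.0}
in_flight = {name: 0 for name in latency_ms}


def cost(name):
    # Assumed cost: latency estimate scaled by outstanding requests.
    return latency_ms[name] * (in_flight[name] + 1)


rng = random.Random(42)
sent = {name: 0 for name in latency_ms}
for _ in range(3000):
    # In this toy loop requests complete instantly, so in_flight stays at 0.
    target = pick_p2c(list(latency_ms), cost, rng)
    sent[target] += 1

print(sent)
```

Because the slow endpoint loses every pairwise comparison in this setup, it receives no requests, while the two healthy endpoints split the traffic. A real balancer would also update the latency estimates and in-flight counts as responses come back, so a recovered endpoint gradually wins comparisons again.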

What most people don’t realize is that Linkerd’s per-request load balancing works with mTLS too. The proxy on the originating side receives the request from the local application, makes its load balancing decision based on what it sees at the HTTP/2 layer, and only then sends the request over the mutually authenticated TLS connection to the destination proxy. This means you get fine-grained load balancing without sacrificing the security benefits of mTLS. The one caveat is traffic that the application encrypts itself (for example, HTTPS originated by the app): the proxy cannot see individual requests inside it, so it can only balance that traffic per connection.

The next thing you’ll likely encounter is how Linkerd handles retries in conjunction with this per-request load balancing.