Load Balancer Metrics: Monitor Connections and Latency (2026)

Load balancers don’t just spread traffic; they’re the gatekeepers of your application’s responsiveness, and their metrics tell a story of user experience.

Let’s watch a request flow through a common setup: a user hits your domain, DNS resolves to your load balancer’s IP, and the load balancer, based on its configuration, picks an upstream server to forward that request to. It then waits for the upstream to respond and sends that response back to the user. All of this happens in milliseconds, but the load balancer is silently counting every step.

Here’s an example of a request being handled. Imagine this is a real-time log from an Nginx load balancer:

```
192.168.1.100 - - [10/Oct/2023:10:30:01 +0000] "GET /api/users HTTP/1.1" 200 1234 "-" "curl/7.68.0" 0.056 0.012 0.044
```

Let’s break down what’s happening here. The user (192.168.1.100) made a GET request to /api/users. The server responded with a 200 OK and 1234 bytes. The important bits for load balancer monitoring are the last three numbers: 0.056, 0.012, and 0.044.

- 0.056 ($request_time): The total time from when the load balancer read the first bytes of the request from the client until it sent the last byte of the response back to the client. This is your end-to-end latency from the load balancer’s perspective.
- 0.012 ($upstream_connect_time): The time the load balancer spent establishing a connection to the upstream server. If this number is consistently high, suspect network issues between the load balancer and the backend, or a backend that is slow to accept connections.
- 0.044 ($upstream_response_time): The time the load balancer spent receiving the response from that upstream server, through the last byte. This is the crucial metric for understanding your backend’s performance.
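To make the breakdown concrete, here is a minimal Python sketch that pulls the three timing fields out of a line like the one above. It assumes the field order of the log_format shown later in this article; the names `LOG_PATTERN` and `parse_line` are illustrative, not part of any Nginx tooling.

```python
import re

# Matches the access-log line shown above: the usual combined-style prefix
# followed by three timing fields. The field order is an assumption taken
# from this article's log_format.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"[^"]*" "[^"]*" '
    r'(?P<request_time>[\d.]+) (?P<upstream_connect_time>[\d.]+) '
    r'(?P<upstream_response_time>[\d.]+)$'
)

TIMING_FIELDS = ("request_time", "upstream_connect_time", "upstream_response_time")


def parse_line(line):
    """Return a dict of parsed fields, or None if the line does not match.

    Lines where Nginx logged "-" or several comma-separated upstream times
    are deliberately skipped (they return None) to keep the sketch small.
    """
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    for name in TIMING_FIELDS:
        fields[name] = float(fields[name])
    return fields


sample = (
    '192.168.1.100 - - [10/Oct/2023:10:30:01 +0000] '
    '"GET /api/users HTTP/1.1" 200 1234 "-" "curl/7.68.0" 0.056 0.012 0.044'
)
parsed = parse_line(sample)
print({name: parsed[name] for name in TIMING_FIELDS})
```

A production parser would also need to handle requests that never reached an upstream, which this sketch simply drops.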

The core problem load balancers solve is availability and scalability. Without one, if your single application server goes down, your entire service is offline. If you get a surge of traffic, your single server buckles. A load balancer distributes incoming requests across multiple backend servers, ensuring that no single server is overwhelmed and that traffic can be rerouted if a server fails.

Internally, a load balancer (like Nginx, HAProxy, or cloud provider LBs) maintains a pool of backend servers. For each incoming client connection, it selects a backend server using a specific algorithm (round-robin, least connections, IP hash, etc.). It then proxies the request: it receives the request from the client, forwards it to the chosen backend, receives the response from the backend, and forwards that response back to the client. The metrics we’re discussing are measurements of the timing and state of these proxying operations.
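The selection algorithms named above can be sketched in a few lines of Python. This is an illustration of the ideas only, not how Nginx or HAProxy implement them; the backend addresses, the `active` connection table, and the function names are all hypothetical.

```python
import hashlib
import itertools

# Hypothetical backend pool.
BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

# Round-robin: walk the pool in order and wrap around at the end.
rr = itertools.cycle(BACKENDS)
first_four = [next(rr) for _ in range(4)]


def pick_least_connections(active):
    """Pick the backend with the fewest in-flight connections.

    `active` maps backend -> current connection count. Ties go to the
    backend listed first in BACKENDS.
    """
    return min(BACKENDS, key=lambda backend: active[backend])


def pick_ip_hash(client_ip):
    """Map a client IP to a backend deterministically.

    The same IP always lands on the same backend as long as the pool
    does not change.
    """
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(BACKENDS)
    return BACKENDS[index]


print(first_four)
print(pick_least_connections({BACKENDS[0]: 2, BACKENDS[1]: 0, BACKENDS[2]: 1}))
print(pick_ip_hash("192.168.1.100"))
```

Round-robin ignores backend load entirely, least connections reacts to it, and IP hash trades even distribution for client stickiness.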

Monitoring connections is about understanding the demand placed on your load balancer and its backends. Key metrics include:

- Active Connections: The total number of established connections currently being handled by the load balancer. A steady increase might indicate growing user traffic. Spikes could be legitimate or due to a denial-of-service attack.
- New Connections (or Connection Rate): The number of new connections established per second. This directly reflects the rate at which new clients are connecting.
- Failed Connections (or Connection Errors): The number of connections that could not be established. This is a critical indicator of problems. High rates here could mean backend servers are unhealthy, the load balancer is misconfigured, or there are network issues.
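Many load balancers expose connection totals as ever-increasing counters, so a per-second rate has to be derived from two samples. The sketch below shows one way to do that; the function name and the counter-reset handling are assumptions for illustration, not a standard API.

```python
def rate_per_second(previous_count, current_count, interval_seconds):
    """Convert two samples of a cumulative counter into a per-second rate.

    If the counter went backwards (for example after a load balancer
    restart), this sketch assumes it restarted from zero and counts only
    what has accumulated since then.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current_count - previous_count
    if delta < 0:
        delta = current_count
    return delta / interval_seconds


# Two samples of a hypothetical "accepted connections" counter, 60 s apart.
print(rate_per_second(132231, 132831, 60))
```

The same function works for failed-connection counters; comparing the two rates gives an error ratio rather than a raw count.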

Latency, as we saw in the log example, is about speed. The most important latency metrics are:

- Request Latency (or Backend Latency): The time it takes for a backend server to process a request and send a response. In the log example this is the last number ($upstream_response_time). High backend latency means your application code or database is slow.
- Connection Latency (or Frontend Latency): The time spent on the client side of the proxy: accepting the connection, reading the full request, and writing the response back. The log example has no dedicated field for this. The first number ($request_time) is the total, so a rough upper bound on the client-side share is $request_time minus $upstream_response_time, which also includes the load balancer’s own processing time.
- Load Balancer Processing Time: The time the load balancer itself spends processing the request (e.g., routing, SSL termination). This is usually very small but can be a bottleneck if the LB is undersized or overloaded.

To monitor these, you’ll typically configure your load balancer to expose metrics. For Nginx, you can use ngx_http_stub_status_module for basic connection counts, or the third-party ngx_http_vhost_traffic_status_module (nginx-module-vts) for more detailed metrics, including latency. For HAProxy, the stats socket or stats page provides rich data. Cloud providers (AWS ELB, Google Cloud Load Balancing, Azure Load Balancer) expose these metrics via their respective monitoring services (CloudWatch, Cloud Monitoring, Azure Monitor).

For example, to get basic Nginx status:

Enable stub_status: In your nginx.conf (or a site-specific conf):

```
http {
    server {
        listen 80;
        location /nginx_status {
            stub_status;
            allow 127.0.0.1; # Restrict access
            deny all;
        }
    }
}
```

Then reload Nginx: sudo systemctl reload nginx or sudo service nginx reload.

Access the status page: curl http://localhost/nginx_status

This will output something like:
```
Active connections: 256
server accepts handled requests
 132231 132231 451047
Reading: 50 Writing: 10 Waiting: 196
```
- Active connections: Number of active client connections, including those counted under Waiting.
- accepts: Total number of accepted client connections.
- handled: Total number of handled connections. This normally equals accepts unless a resource limit (such as worker_connections) was reached.
- requests: Total number of client requests.
- Reading: Number of connections where Nginx is reading the request header.
- Writing: Number of connections where Nginx is writing the response back to the client.
- Waiting: Number of idle client connections waiting for a request.
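The status page is plain text, so turning it into numbers takes a small parser. The following Python sketch parses the sample output above and derives two figures that are not printed directly: dropped connections (accepts minus handled) and average requests per connection. The function names and the `SAMPLE` constant are illustrative.

```python
import re

# The stub_status output shown above, used here as test input.
SAMPLE = """Active connections: 256
server accepts handled requests
 132231 132231 451047
Reading: 50 Writing: 10 Waiting: 196
"""


def parse_stub_status(text):
    """Parse stub_status text into a dict of integers."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    counters = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not (active and counters and states):
        raise ValueError("unexpected stub_status output")
    accepts, handled, requests = map(int, counters.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


def derive(stats):
    """Compute figures that stub_status does not print directly."""
    handled = stats["handled"]
    return {
        # Connections accepted but never handled, e.g. due to resource limits.
        "dropped": stats["accepts"] - handled,
        # Average requests served per connection (keep-alive reuse).
        "requests_per_connection": stats["requests"] / handled if handled else 0.0,
    }


stats = parse_stub_status(SAMPLE)
derived = derive(stats)
print(stats["active"], derived["dropped"], round(derived["requests_per_connection"], 2))
```

In practice the text would come from an HTTP request to the status location rather than a constant, and the counters would be sampled periodically to compute rates.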

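To get more granular latency metrics from Nginx, configure log_format (part of the standard ngx_http_log_module) to include $request_time, $upstream_response_time, and $upstream_connect_time. The third-party http_vhost_traffic_status_module is another option if you prefer metrics exported directly rather than parsed from logs.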

Example log_format for detailed timing:

```
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" $request_time ' # Total LB time
                '$upstream_connect_time ' # Time to connect to upstream
                '$upstream_response_time'; # Time to get response from upstream
```

Then, you’d parse these logs to extract the timing fields and aggregate them into average, p95, and p99 latencies.
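As a sketch of that aggregation step, the Python below computes the average, p95, and p99 of a list of timing values using the nearest-rank percentile definition. Other percentile definitions interpolate between samples and will give slightly different numbers; the function names and sample data are made up for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(values):
    """Aggregate timing samples (in seconds) into common latency figures."""
    return {
        "count": len(values),
        "avg": sum(values) / len(values),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
    }


# Made-up data: 98 fast requests plus two slow outliers.
samples = [0.010] * 98 + [0.500, 2.000]
summary = summarize(samples)
print(summary)
```

With this data the average is pulled up by two outliers while p95 stays at the typical value, which is why percentiles are usually tracked alongside the mean.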

One thing most people don’t realize is how much of the perceived "slowdown" by users can be attributed to the network path between them and your load balancer, rather than your application’s actual processing time. If $request_time (total LB time) is high but $upstream_response_time (backend processing) is low, the bottleneck is likely on the client side of the load balancer: network congestion, a slow client connection, or an overloaded load balancer. Delays that occur before the request reaches the load balancer, such as DNS resolution, do not appear in either field and have to be measured from the client. This is why it’s vital to measure both frontend and backend latencies separately.
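That comparison can be automated per request. The sketch below splits $request_time into the upstream share and everything else, then labels the likely bottleneck. The 50% threshold and the label strings are arbitrary assumptions chosen for illustration, not recommended values.

```python
def non_upstream_time(request_time, upstream_response_time):
    """Time not spent on the upstream: client I/O plus load balancer work."""
    return max(request_time - upstream_response_time, 0.0)


def likely_bottleneck(request_time, upstream_response_time, threshold=0.5):
    """Label a request by where most of its time went.

    `threshold` is the fraction of total time spent on the upstream above
    which the backend is blamed; 0.5 is an arbitrary starting point.
    """
    if request_time <= 0:
        return "unknown"
    upstream_share = upstream_response_time / request_time
    return "backend" if upstream_share >= threshold else "client_side_or_lb"


# The request from the log example: most of the time was spent upstream.
print(likely_bottleneck(0.056, 0.044))
# A hypothetical slow request with a fast backend.
print(likely_bottleneck(2.0, 0.05))
```

Run over a whole log, the ratio of the two labels shows whether tuning effort belongs in the application or in the path to the client.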

Understanding these metrics allows you to differentiate between a slow application server, an overloaded load balancer, or a problematic network connection between your users and your infrastructure.

The next critical metric to monitor after connections and latency is the health status of your backend servers.