Load Balancing Explained: Distribute Traffic Across Servers (2026)

Load balancing isn’t just about spreading requests; it’s how you keep your application alive when demand spikes, turning potential chaos into a smooth user experience.

Let’s watch it in action. Imagine we have three identical web servers: web-01, web-02, and web-03. A load balancer, say an HAProxy instance, sits in front of them.

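defaults
    # Illustrative timeouts; HAProxy warns at startup when none are set
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http_front
    bind *:80
    mode http
    default_backend http_back

backend http_back
    mode http
    balance roundrobin
    # "check" enables periodic health checks for each server
    server web-01 192.168.1.10:80 check
    server web-02 192.168.1.11:80 check
    server web-03 192.168.1.12:80 check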

When a user requests http://yourdomain.com, the request arrives at the HAProxy frontend, which listens on port 80. The default_backend http_back line tells HAProxy which backend receives this traffic. Inside the backend, balance roundrobin does the distribution: with equal server weights, HAProxy sends the first request to web-01, the second to web-02, the third to web-03, and then loops back to web-01. The check keyword on each server line enables health checks. With no other options, a check is a TCP connection attempt to the server’s port rather than an ICMP ping, repeated every 2 seconds by default. If web-02 stops responding, HAProxy marks it down after the configured number of consecutive failed checks (3 by default) and stops sending traffic there until it passes checks again.

This setup removes the single point of failure at the web tier. With one server and no load balancer, all traffic stops when web-01 goes down. With a pool behind HAProxy, new requests go to web-02 and web-03 once health checks mark web-01 as down. It also absorbs traffic spikes. If your site suddenly gets 10,000 concurrent users, their requests are spread across the pool, so each of the three servers handles roughly a third of the load instead of one server taking all of it.

The balance directive is your primary lever. roundrobin is simple and even. leastconn is often better for long-lived connections such as WebSockets or database connections; it sends each new connection to the server with the fewest active connections, preventing one server from accumulating a disproportionate load. source is useful when you need sticky sessions, where a user’s requests always go to the same server (ip_hash is the equivalent directive in nginx, not an HAProxy keyword). This is crucial for applications that store session state locally on the server. HAProxy hashes the client’s IP address to pick the backend server, so a given client keeps landing on the same server as long as the set of available servers does not change.
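As a rough sketch of how the other two algorithms look in configuration, here are two variations on the backend above. The backend names ws_back and sticky_back are made up for illustration, and hash-type consistent is an optional extra that the text above does not require.

```haproxy
# Hypothetical backend for long-lived connections (name is an assumption)
backend ws_back
    mode http
    # Each new connection goes to the server with the fewest active ones
    balance leastconn
    server web-01 192.168.1.10:80 check
    server web-02 192.168.1.11:80 check
    server web-03 192.168.1.12:80 check

# Hypothetical backend that pins each client IP to one server
backend sticky_back
    mode http
    # Hash the client's source IP to choose the server
    balance source
    # Optional: consistent hashing limits how many clients are remapped
    # when a server is added to or removed from the pool
    hash-type consistent
    server web-01 192.168.1.10:80 check
    server web-02 192.168.1.11:80 check
    server web-03 192.168.1.12:80 check
```

One caveat with source hashing: many users behind the same NAT or corporate proxy share one IP address, so they all land on the same server and the load can end up uneven.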

HAProxy’s health checks can go well beyond a TCP connection test. The check keyword can be augmented: for HTTP backends, adding option httpchk GET /health to the backend tells HAProxy to make an actual HTTP GET request to /health on each server and to expect a 2xx or 3xx response. This is far more robust than a TCP-level check, as it verifies that your application code is actually running and responding. You can also tune the check interval and thresholds on each server line, for example inter 2000 rise 2 fall 3, meaning HAProxy checks every 2 seconds (2000 ms), considers a server healthy after 2 consecutive successful checks, and marks it unhealthy after 3 consecutive failed checks.
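Put together, an HTTP health check on the example backend might look like the sketch below. The /health path comes from the paragraph above and is assumed to exist in your application; the http-check expect line is an optional tightening, not something the defaults require.

```haproxy
backend http_back
    mode http
    balance roundrobin
    # Send "GET /health" as the health check instead of a bare TCP connect
    option httpchk GET /health
    # Optional: require exactly 200 rather than accepting any 2xx/3xx
    http-check expect status 200
    # Check every 2000 ms; 2 passes mark a server up, 3 failures mark it down
    server web-01 192.168.1.10:80 check inter 2000 rise 2 fall 3
    server web-02 192.168.1.11:80 check inter 2000 rise 2 fall 3
    server web-03 192.168.1.12:80 check inter 2000 rise 2 fall 3
```

With these values, a server that stops answering is taken out of rotation after roughly three check intervals, so on the order of 6 seconds. Lowering inter or fall shortens that window at the cost of more check traffic and a higher chance of flapping on a brief hiccup.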

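When configuring a load balancer for a large pool, you’ll often want default settings shared by a group of servers. In HAProxy, the default-server directive within a backend does this: it defines common attributes such as check inter 1000 rise 2 fall 3 once, and every server line after it inherits them. The related server-template directive generates a numbered set of server entries from a single line, taking a name prefix, a count or range, and an address (typically a DNS name that HAProxy resolves at runtime). It does not do arithmetic on IP addresses, so fixed addresses such as 192.168.1.10 through 192.168.1.12 are still listed as individual server lines. This is a minor convenience but scales well for hundreds of servers.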
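A minimal sketch of both directives follows. The first backend reuses the three servers from the earlier example. The second assumes a DNS name (web.example.internal) and a resolver address (192.168.1.2) that are purely hypothetical, as are the names internal_dns and http_back_dns.

```haproxy
backend http_back
    mode http
    balance roundrobin
    # Shared settings, inherited by every server line that follows
    default-server check inter 1000 rise 2 fall 3
    server web-01 192.168.1.10:80
    server web-02 192.168.1.11:80
    server web-03 192.168.1.12:80

# Hypothetical resolver section; the nameserver address is an assumption
resolvers internal_dns
    nameserver dns1 192.168.1.2:53

# Hypothetical DNS-backed pool; the hostname is an assumption
backend http_back_dns
    mode http
    balance roundrobin
    # Creates ten server slots (web1 .. web10) that HAProxy fills from the
    # addresses the hostname resolves to; unused slots stay down
    server-template web 1-10 web.example.internal:80 check resolvers internal_dns init-addr none
```

The template form pays off when servers come and go behind a DNS name, since the pool can grow or shrink up to the slot count without editing the configuration.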

The most surprising thing is how often load balancers themselves become the bottleneck or a single point of failure if not configured for high availability. Running a single HAProxy instance is common, but for critical services you’ll run two (or more) in an active/passive or active/active setup, often using a tool like Keepalived, which uses VRRP to manage a virtual IP address that floats between the HAProxy instances. If one load balancer fails, the virtual IP moves to a surviving instance and traffic continues to flow after a brief failover.

Understanding how session persistence interacts with different load balancing algorithms is key to troubleshooting user experience issues.