Weighted Round Robin (WRR) isn’t just about distributing traffic evenly; it’s a way to send more traffic to the nodes that can handle it, in proportion to a weight you assign to each node based on its capacity.
Imagine you have a load balancer with three backend servers: server-a, server-b, and server-c.
Here’s how a simple Round Robin would treat them:
Request 1 -> server-a
Request 2 -> server-b
Request 3 -> server-c
Request 4 -> server-a
…and so on, cycling through each server sequentially.
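As a minimal sketch of the plain Round Robin rotation above (the `round_robin` helper is a hypothetical name used only for illustration), Python's `itertools.cycle` is enough:

```python
from itertools import cycle, islice

servers = ["server-a", "server-b", "server-c"]

def round_robin(backends):
    """Return an endless iterator that visits each backend in order."""
    return cycle(backends)

# Route the first four requests, as in the example above.
first_four = list(islice(round_robin(servers), 4))
for number, server in enumerate(first_four, start=1):
    print(f"Request {number} -> {server}")
```

Every server gets the same share here, regardless of how much load it can actually handle.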
Now, let’s say server-a is a beast, capable of handling twice the load of server-b, and server-c is a bit slower, handling only half of server-b’s capacity. WRR lets us express this.
If server-a has a weight of 2, server-b has a weight of 1, and server-c has a weight of 0.5, the traffic distribution looks like this:
Cycle 1:
- Request 1 -> server-a (weight 2)
- Request 2 -> server-a (weight 2, second turn)
- Request 3 -> server-b (weight 1)
- Request 4 -> server-c (weight 0.5)

Cycle 2:
- Request 5 -> server-a (weight 2)
- Request 6 -> server-a (weight 2, second turn)
- Request 7 -> server-b (weight 1)
- server-c is skipped this cycle (weight 0.5 means one turn every two cycles)

The pattern then repeats, with Request 8 starting the next cycle.
Over the long run, server-a gets two requests for every one of server-b, and server-b gets two requests for every one of server-c. The total "weight units" are 2 + 1 + 0.5 = 3.5, so server-a receives 2 / 3.5 ≈ 57.1% of the traffic, server-b receives 1 / 3.5 ≈ 28.6%, and server-c receives 0.5 / 3.5 ≈ 14.3%.
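The share arithmetic can be checked with a few lines of Python. This is only a sketch; `traffic_shares` is a hypothetical helper, and exact fractions are used to avoid rounding noise:

```python
from fractions import Fraction

# Weights from the example: 2, 1, and 0.5.
weights = {
    "server-a": Fraction(2),
    "server-b": Fraction(1),
    "server-c": Fraction(1, 2),
}

def traffic_shares(weights):
    """Return each server's long-run share: its weight over the total weight."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

shares = traffic_shares(weights)
for name, share in shares.items():
    print(f"{name}: {share} of traffic ({float(share):.1%})")
```

The shares come out as 4/7, 2/7, and 1/7, which is the same 4:2:1 ratio expressed as fractions of the total.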
This is configured in many load balancers. For example, in Nginx, you’d define it in your nginx.conf like this:
http {
    upstream backend_servers {
        server 192.168.1.10 weight=4; # Faster server (server-a)
        server 192.168.1.11 weight=2; # Standard server (server-b)
        server 192.168.1.12 weight=1; # Slower server (server-c)
    }
    server {
        listen 80;
        location / {
            proxy_pass http://backend_servers;
        }
    }
}
Nginx accepts only positive integer weights, so the 2 : 1 : 0.5 ratio is scaled by 2 to 4 : 2 : 1; a value such as weight=0.5 would be rejected as an invalid parameter. Here, 192.168.1.10 (our server-a) is configured with weight=4, 192.168.1.11 (our server-b) with weight=2, and 192.168.1.12 (our server-c) with weight=1.
The core problem WRR solves is efficiently utilizing a heterogeneous fleet of servers. Without it, every server receives the same share of traffic, so the slower machines become overloaded while the faster ones sit underutilized, leading to poor overall performance and user experience. WRR lets you scale capacity by adding more powerful machines without having to match the capacity of every other server.
A weight of 0.5 does not mean the server receives half a request. Instead, the algorithm keeps a counter for each server and uses those counters to decide whose turn it is. On each selection, the counters are updated according to the weights, and a server is picked only once it has accumulated enough credit relative to the others. A server with weight 0.5 therefore gets one turn for every two turns of a server with weight 1. The actual implementation varies, but the principle is that a server with weight W is selected approximately W times for every time a server with weight 1 is selected.
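One way to realize such per-server counters is the scheme often called smooth weighted round robin. The sketch below is illustrative only, not any particular load balancer's code; the class name `SmoothWeightedRoundRobin` and its methods are assumptions made for this example. Note that it interleaves servers rather than sending consecutive requests to the heaviest one, so the order differs from the cycles shown earlier while the proportions stay the same.

```python
from collections import Counter
from fractions import Fraction

class SmoothWeightedRoundRobin:
    """Illustrative selector: each server accumulates credit equal to its weight."""

    def __init__(self, weights):
        # Exact fractions keep weights such as 0.5 free of rounding drift.
        self.weights = {name: Fraction(str(w)) for name, w in weights.items()}
        self.total = sum(self.weights.values())
        self.current = {name: Fraction(0) for name in self.weights}

    def next(self):
        # 1. Every server earns credit in proportion to its weight.
        for name, weight in self.weights.items():
            self.current[name] += weight
        # 2. The server with the most credit wins (ties go to the first listed).
        chosen = max(self.current, key=self.current.get)
        # 3. The winner pays back the total weight, pushing it down the queue.
        self.current[chosen] -= self.total
        return chosen

balancer = SmoothWeightedRoundRobin({"server-a": 2, "server-b": 1, "server-c": 0.5})
picks = [balancer.next() for _ in range(7)]
print(picks)
print(Counter(picks))
```

Over seven selections this picks server-a four times, server-b twice, and server-c once, after which all counters return to zero and the pattern repeats.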
The trickiest part of implementing WRR is often handling weights that are not integers. Some systems accept only integer weights, in which case you scale them yourself; others may multiply every weight by a common factor that clears the fractions, or use a floating-point representation with a proportional counter. For instance, weights of 2, 1, and 0.5 can be treated as 4, 2, and 1 (multiplying by 2 to get integers), and traffic is then distributed in a 4:2:1 ratio. Either way, fractional weights are handled proportionally over time.
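The scaling step can be sketched as follows. The helper `to_integer_weights` is a hypothetical name, and the sketch assumes all weights are positive and given as numbers whose decimal form is exact (such as 0.5):

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def to_integer_weights(weights):
    """Scale weights to the smallest integers that preserve their ratio."""
    fracs = {name: Fraction(str(w)) for name, w in weights.items()}
    # Multiplying by the LCM of the denominators clears every fraction.
    scale = reduce(lcm, (f.denominator for f in fracs.values()))
    scaled = {name: int(f * scale) for name, f in fracs.items()}
    # Dividing by the common GCD keeps the integers as small as possible.
    divisor = reduce(gcd, scaled.values())
    return {name: value // divisor for name, value in scaled.items()}

integer_weights = to_integer_weights({"server-a": 2, "server-b": 1, "server-c": 0.5})
print(integer_weights)
```

For the example weights this yields 4, 2, and 1, which can be used directly wherever only integer weights are accepted.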
The next concept to explore is how WRR behaves when servers fail or are taken offline.