Client-side load balancing is often more efficient than server-side load balancing because the client picks a backend instance itself, so each request goes straight to the service instead of passing through an intermediary load balancer first.

Let’s see it in action with a simple example. Imagine we have three instances of a user-service running, each on a different port:

user-service-1:8080
user-service-2:8081
user-service-3:8082

With client-side load balancing, our client application (e.g., an API gateway or another microservice) will maintain a list of these user-service instances. When the client needs to call the user-service, it picks one from its list.

Here’s a conceptual Python snippet demonstrating this:

import random
import requests

class ServiceDiscovery:
    def __init__(self):
        self.instances = {
            "user-service": [
                "http://localhost:8080",
                "http://localhost:8081",
                "http://localhost:8082",
            ]
        }

    def get_instance(self, service_name):
        # Random selection; round-robin (cycling through the list) is a common alternative
        available_instances = self.instances.get(service_name, [])
        if not available_instances:
            raise LookupError(f"No instances found for {service_name}")
        return random.choice(available_instances)

class ApiClient:
    def __init__(self, service_discovery):
        self.service_discovery = service_discovery

    def get_user_data(self, user_id):
        user_service_url = self.service_discovery.get_instance("user-service")
        try:
            # A timeout keeps one unresponsive instance from blocking the caller indefinitely
            response = requests.get(f"{user_service_url}/users/{user_id}", timeout=5)
            response.raise_for_status() # Raise an exception for bad status codes
            return response.json()
        except requests.exceptions.RequestException as e:
            print(f"Error calling user-service: {e}")
            # In a real system, you'd implement retry logic or mark the instance as unhealthy
            return None

# --- Usage ---
discovery = ServiceDiscovery()
client = ApiClient(discovery)

user_info = client.get_user_data(123)
if user_info:
    print(user_info)

In this setup, the ApiClient doesn’t talk to a central load balancer. It is the load balancer, in a distributed sense. It knows about the user-service instances directly and chooses one.
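
The snippet above picks instances at random. If you would rather cycle through the list in order, a round-robin variant might look like the following minimal sketch. The class name RoundRobinBalancer and its next_instance method are hypothetical names chosen for illustration, not part of any library.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hypothetical helper: hands out instances in a fixed rotating order."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._cycle = itertools.cycle(list(instances))
        # The lock keeps the rotation consistent when several threads share one balancer.
        self._lock = threading.Lock()

    def next_instance(self):
        with self._lock:
            return next(self._cycle)


balancer = RoundRobinBalancer([
    "http://localhost:8080",
    "http://localhost:8081",
    "http://localhost:8082",
])
# Four picks walk the list once and then wrap around to the first instance.
picks = [balancer.next_instance() for _ in range(4)]
print(picks)
```

Round-robin spreads requests evenly within a single client, but each client rotates independently, so the distribution across the whole fleet is only approximately even.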

This approach addresses two problems: added latency and a single point of failure. With server-side load balancing, every request from a client first hits a dedicated load balancer (such as Nginx, HAProxy, or a cloud provider’s LB), which then forwards the request to one of the backend service instances. That adds an extra network hop to every request, and unless the load balancer itself is deployed redundantly, its failure makes every instance behind it unreachable. Client-side load balancing removes this dedicated hop.

Internally, the client-side load balancer needs a way to discover the available service instances. This is often handled by a service registry (like Consul, etcd, or even Kubernetes’ DNS). The client periodically queries the registry for healthy instances of a service.
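
One way to keep the instance list fresh without querying the registry on every request is a small time-based cache. The sketch below is illustrative only: CachingDiscovery is a hypothetical class, and fetch_instances stands in for whatever call your registry client actually provides (it is not a real Consul, etcd, or Kubernetes API). It refreshes lazily on lookup rather than on a background timer, and falls back to the last known list if the registry cannot be reached.

```python
import time


class CachingDiscovery:
    """Hypothetical wrapper that caches registry lookups for ttl_seconds."""

    def __init__(self, fetch_instances, ttl_seconds=10.0, clock=time.monotonic):
        self._fetch = fetch_instances  # assumed callable: service name -> iterable of URLs
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # service name -> (fetched_at, instances)

    def get_instances(self, service_name):
        now = self._clock()
        cached = self._cache.get(service_name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        try:
            instances = list(self._fetch(service_name))
        except Exception:
            # Registry unreachable: serve the stale list if we have one.
            if cached is not None:
                return cached[1]
            raise
        self._cache[service_name] = (now, instances)
        return instances


def fake_registry(service_name):
    # Stand-in for a real registry query, used only for this example.
    return ["http://localhost:8080", "http://localhost:8081"]


discovery = CachingDiscovery(fake_registry, ttl_seconds=10.0)
print(discovery.get_instances("user-service"))
```

The TTL is a trade-off: a short value notices new and removed instances sooner, while a long value puts less load on the registry.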

The exact levers you control are primarily the load balancing algorithm (round-robin, least connections, random, weighted, etc.) and the health checking mechanisms. The client library or framework you use will dictate what’s available. For instance, Spring Cloud Netflix’s Ribbon (though now largely superseded by Spring Cloud LoadBalancer) offered several strategies.
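
To make two of these algorithms concrete, here is a rough sketch of weighted random selection and least connections. The function pick_weighted and the class LeastConnectionsBalancer are hypothetical names for illustration, not the API of Ribbon or Spring Cloud LoadBalancer, and the sketch is not thread-safe.

```python
import random


def pick_weighted(weights):
    """Pick a URL at random, in proportion to its weight.

    weights maps instance URL -> non-negative number (at least one must be > 0).
    """
    urls = list(weights)
    return random.choices(urls, weights=[weights[u] for u in urls], k=1)[0]


class LeastConnectionsBalancer:
    """Hypothetical balancer that tracks in-flight requests per instance."""

    def __init__(self, instances):
        self._in_flight = {url: 0 for url in instances}

    def acquire(self):
        # Choose the instance with the fewest requests currently in flight.
        url = min(self._in_flight, key=self._in_flight.get)
        self._in_flight[url] += 1
        return url

    def release(self, url):
        # Call this when the request finishes, whether it succeeded or failed.
        self._in_flight[url] -= 1


lb = LeastConnectionsBalancer(["http://localhost:8080", "http://localhost:8081"])
first = lb.acquire()   # both idle, so the first instance in the list is chosen
second = lb.acquire()  # the first is now busy, so the other instance is chosen
lb.release(first)
print(first, second)
```

Note that a client-side least-connections count only reflects this client’s own requests; it knows nothing about the load other clients are placing on the same instance.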

The key component you’re managing is the discovery mechanism. How does the client know which instances are available? This could be a simple hardcoded list (bad for dynamic environments), DNS-based discovery (like Kubernetes services), or a dedicated service registry. The health checks are crucial; if an instance fails, the client must quickly remove it from its list of available targets.
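
As a sketch of the “remove it quickly” idea, the client can track failures itself and temporarily eject an instance after a failed call. Everything here is an assumption made for illustration: the HealthAwareBalancer class, the 30-second cooldown, and the choice to fall back to the full list when every instance has been ejected.

```python
import random
import time


class HealthAwareBalancer:
    """Hypothetical balancer that skips instances that failed recently."""

    def __init__(self, instances, cooldown_seconds=30.0, clock=time.monotonic):
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._ejected_until = {}  # instance URL -> time it may be tried again

    def healthy_instances(self):
        now = self._clock()
        return [
            url for url in self._instances
            if url not in self._ejected_until or self._ejected_until[url] <= now
        ]

    def pick(self):
        candidates = self.healthy_instances()
        if not candidates:
            # Everything is ejected: try the full list rather than refuse all traffic.
            candidates = self._instances
        return random.choice(candidates)

    def report_failure(self, url):
        self._ejected_until[url] = self._clock() + self._cooldown

    def report_success(self, url):
        self._ejected_until.pop(url, None)


balancer = HealthAwareBalancer(["http://localhost:8080", "http://localhost:8081"])
balancer.report_failure("http://localhost:8080")
print(balancer.healthy_instances())
```

In the ApiClient shown earlier, report_failure would be called from the except branch and report_success after a successful response. This is passive health checking based on real traffic; active probing of a health endpoint is a separate technique.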

One of the most subtle benefits is that client-side load balancing can be more responsive to network conditions between services. If a particular user-service instance is experiencing high latency to the client making the request, a sophisticated client-side load balancer could detect this and temporarily reduce traffic to that instance, even if the instance itself reports being healthy to a central registry. This requires more advanced client libraries that can monitor actual request times.
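
A minimal sketch of this idea, under stated assumptions: the client keeps an exponentially weighted moving average (EWMA) of the response time it observes for each instance and picks instances with probability inversely proportional to that average. The class name LatencyAwareBalancer, the smoothing factor alpha=0.3, and the initial estimate of 0.05 seconds are all illustrative choices, not values from any particular library.

```python
import random


class LatencyAwareBalancer:
    """Hypothetical balancer that sends less traffic to slow instances."""

    def __init__(self, instances, alpha=0.3, initial_seconds=0.05):
        self._alpha = alpha
        # Every instance starts from the same assumed latency estimate.
        self._ewma = {url: initial_seconds for url in instances}

    def record(self, url, seconds):
        # Blend the new sample into the running average.
        old = self._ewma[url]
        self._ewma[url] = self._alpha * seconds + (1 - self._alpha) * old

    def weights(self):
        # Lower observed latency -> higher weight. The floor avoids division by zero.
        return {url: 1.0 / max(latency, 1e-6) for url, latency in self._ewma.items()}

    def pick(self):
        weights = self.weights()
        urls = list(weights)
        return random.choices(urls, weights=[weights[u] for u in urls], k=1)[0]


lb = LatencyAwareBalancer(["http://localhost:8080", "http://localhost:8081"])
for _ in range(20):
    lb.record("http://localhost:8080", 0.01)  # consistently fast
    lb.record("http://localhost:8081", 1.0)   # consistently slow
print(lb.weights())
```

Because selection is weighted rather than strictly “fastest wins”, the slow instance still receives occasional requests, which lets its average recover once conditions improve. In practice the sample passed to record would come from timing each real request, for example with time.perf_counter().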

The next concept you’ll likely grapple with is how to manage the state of these discovered instances across a large fleet of clients, especially when dealing with frequent deployments and failures.
