Load Balance gRPC Traffic with Client-Side and Proxy Strategies (2026)

gRPC traffic balancing isn’t about picking the "best" server; it’s about ensuring no single server gets overwhelmed, even if it’s the "best" at handling a specific type of request.

Let’s see how this plays out. Imagine a microservice called UserAuth. It handles login and registration. A load balancer sits in front of multiple instances of UserAuth.

// Server-side gRPC implementation (simplified)
type server struct {
    pb.UnimplementedUserAuthServer
}

func (s *server) Login(ctx context.Context, req *pb.LoginRequest) (*pb.LoginResponse, error) {
    // Simulate work
    time.Sleep(100 * time.Millisecond)
    log.Printf("Handling login for user: %s", req.GetUsername())
    return &pb.LoginResponse{Success: true}, nil
}

func main() {
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }
    s := grpc.NewServer()
    pb.RegisterUserAuthServer(s, &server{})
    log.Printf("server listening at %v", lis.Addr())
    if err := s.Serve(lis); err != nil {
        log.Fatalf("failed to serve: %v", err)
    }
}

A client wanting to log in would typically do something like this:

// Client-side gRPC (simplified)
// Besides the usual imports, this needs "sync" and
// "google.golang.org/grpc/credentials/insecure".
func main() {
    // No load balancing here yet, just a single target.
    // grpc.WithInsecure is deprecated; plaintext is requested via insecure credentials.
    conn, err := grpc.Dial("localhost:50051", grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatalf("did not connect: %v", err)
    }
    defer conn.Close()
    c := pb.NewUserAuthClient(conn)

    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func(idx int) {
            defer wg.Done()
            _, err := c.Login(context.Background(), &pb.LoginRequest{Username: fmt.Sprintf("user%d", idx)})
            if err != nil {
                log.Printf("Login failed for user%d: %v", idx, err)
            } else {
                log.Printf("Login successful for user%d", idx)
            }
        }(i)
    }
    wg.Wait() // Block until every RPC has returned instead of sleeping
}

Without any load balancing, all 100 requests hit the same UserAuth instance: the client dials a single address and multiplexes every call over one long-lived HTTP/2 connection. If that instance is also handling registration, or if the login process is computationally intensive, it becomes a bottleneck.

The core problem load balancing solves is distributing incoming requests across multiple identical service instances to prevent any single instance from becoming overloaded and degrading performance or availability.

There are two primary strategies:

Client-Side Load Balancing

In this model, the client itself is aware of all available backend servers and decides which one to send the request to. This requires the client to have a list of available endpoints and a strategy for picking one. gRPC has built-in support for this.

How it works:

Service Discovery: The client needs to know where the backend servers are. This is typically done via a service registry (like etcd, Consul, or Kubernetes DNS).
Resolver: A gRPC component that watches the service registry for updates to the list of available backend addresses.
Balancer: A gRPC component that takes the list of addresses from the resolver and applies a picking strategy (e.g., pick first, round robin, or weighted round robin).
Connection Management: The client establishes a connection (a "subchannel" in gRPC terms) to each backend server it may use, managed by the balancer.
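The resolver-to-balancer handoff described above can be sketched without any gRPC dependency. This is only an illustration: roundRobinPicker, UpdateAddrs, and Pick are hypothetical names, not grpc-go APIs (the real library expresses the same idea through its resolver.Builder and balancer.Picker interfaces), and the addresses are made up.

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobinPicker is a hypothetical stand-in for a gRPC balancer: it keeps
// the address list most recently pushed by a resolver and cycles through it.
type roundRobinPicker struct {
	mu    sync.Mutex
	addrs []string
	next  int
}

// UpdateAddrs plays the role of a resolver update: the set of backends
// changed, so the picker replaces its list and restarts the rotation.
func (p *roundRobinPicker) UpdateAddrs(addrs []string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.addrs = append([]string(nil), addrs...) // copy so the caller cannot mutate our state
	p.next = 0
}

// Pick returns the backend for the next RPC, or false if no backend is known yet.
func (p *roundRobinPicker) Pick() (string, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.addrs) == 0 {
		return "", false
	}
	addr := p.addrs[p.next]
	p.next = (p.next + 1) % len(p.addrs)
	return addr, true
}

func main() {
	p := &roundRobinPicker{}

	// First "resolver" result: three backends.
	p.UpdateAddrs([]string{"10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"})
	for i := 0; i < 4; i++ {
		addr, _ := p.Pick()
		fmt.Println("RPC", i, "->", addr) // cycles .1, .2, .3, then .1 again
	}

	// Later "resolver" result: the first backend disappeared.
	p.UpdateAddrs([]string{"10.0.0.2:50051", "10.0.0.3:50051"})
	addr, _ := p.Pick()
	fmt.Println("after update ->", addr) // rotation restarts at 10.0.0.2:50051
}
```

The point of the split is that discovery (who exists) and picking (who gets this RPC) evolve independently: the resolver can push a new list at any time without the call sites changing.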

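Example Configuration (Conceptual, then xDS):

gRPC offers two routes to client-side load balancing. The simple one pairs the built-in DNS resolver with the round_robin policy. The more capable one uses xDS, the family of discovery-service APIs (LDS, RDS, CDS, EDS; the "x" is a placeholder for the resource type) that originated in Envoy and is served by control planes such as Istio. To make the mechanics visible first, here is what round-robin selection looks like when done by hand over a fixed address list: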

// Conceptual client-side balancing, done by hand for illustration only.
// This is NOT how you'd typically do it: a real client lets gRPC's resolver
// and balancer do this inside a single ClientConn.
// For demonstration, imagine a custom resolver provides these addresses.
var backendAddrs = []string{"localhost:50051", "localhost:50052", "localhost:50053"}

var (
    mu              sync.Mutex
    nextServerIndex int
)

// getNextServerAddr returns the next address in round-robin order.
// addrs must be non-empty.
func getNextServerAddr(addrs []string) string {
    mu.Lock()
    defer mu.Unlock()
    addr := addrs[nextServerIndex%len(addrs)]
    nextServerIndex = (nextServerIndex + 1) % len(addrs)
    return addr
}

// Inside the client loop. Dial each backend once and reuse the connection;
// dialing per request would pay for a new HTTP/2 handshake every time.
// serverAddr := getNextServerAddr(backendAddrs)
// conn, err := grpc.Dial(serverAddr, grpc.WithTransportCredentials(insecure.NewCredentials()))
// ...

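In a real xDS setup, the client doesn’t hardcode addresses. It connects to an xDS control plane (for example Istio's istiod, or a custom server built on go-control-plane), which tells it about the available backend addresses and how to balance. Note that Envoy is itself an xDS client, not the server. The grpc.Dial call would look more like: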

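// Using xds for client-side load balancing.
// Requires a blank import to register the xds resolver and balancers:
//     import _ "google.golang.org/grpc/xds"
// and a bootstrap file, located via the GRPC_XDS_BOOTSTRAP environment
// variable, that tells the client where the control plane is.
// The actual target name will be a service name, not a direct IP:port.
// e.g., "userservice.default.svc.cluster.local"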
conn, err := grpc.Dial("xds:///userservice", grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
    log.Fatalf("did not connect: %v", err)
}
defer conn.Close()
c := pb.NewUserAuthClient(conn)
// ... rest of client logic

Why it works: The client distributes requests across multiple connections it maintains to different backend instances. It can implement sophisticated strategies (like picking the server with the fewest active requests) if the resolver and balancer are configured correctly.
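To make the "fewest active requests" idea concrete, here is a minimal sketch of such a picker. It is not grpc-go's implementation: leastRequestPicker and its methods are hypothetical names, and it scans every backend on each pick, which is fine for a handful of instances but not how production balancers usually do it.

```go
package main

import (
	"fmt"
	"sync"
)

// leastRequestPicker (hypothetical) tracks in-flight RPCs per backend and
// always picks the backend with the fewest. Ties go to the earliest address.
type leastRequestPicker struct {
	mu       sync.Mutex
	addrs    []string
	inFlight map[string]int
}

func newLeastRequestPicker(addrs []string) *leastRequestPicker {
	return &leastRequestPicker{
		addrs:    append([]string(nil), addrs...),
		inFlight: make(map[string]int, len(addrs)),
	}
}

// Pick returns the chosen backend and a done function that the caller must
// invoke when the RPC finishes, so the in-flight count drops again.
func (p *leastRequestPicker) Pick() (string, func()) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.addrs) == 0 {
		return "", func() {}
	}
	best := p.addrs[0]
	for _, a := range p.addrs[1:] {
		if p.inFlight[a] < p.inFlight[best] {
			best = a
		}
	}
	p.inFlight[best]++

	var once sync.Once // guard against done being called twice
	return best, func() {
		once.Do(func() {
			p.mu.Lock()
			defer p.mu.Unlock()
			p.inFlight[best]--
		})
	}
}

func main() {
	p := newLeastRequestPicker([]string{"userauth-1", "userauth-2"})

	slow, _ := p.Pick()       // a long-running RPC that never finishes here
	fast, fastDone := p.Pick() // goes to the other, idle backend
	fastDone()                 // the short RPC completes
	next, _ := p.Pick()        // the busy backend is avoided again

	fmt.Println(slow, fast, next) // userauth-1 userauth-2 userauth-2
}
```

Compared with round robin, this keeps a backend that is stuck on slow requests from receiving its "fair share" of new ones.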

Proxy-Based Load Balancing

Here, a dedicated proxy server (like Nginx, HAProxy, or Envoy) sits between the client and the backend servers. The client connects only to the proxy, and the proxy forwards the request to one of the backend servers based on its own configuration.

How it works:

Client to Proxy: The client is configured with the address of the proxy.
Proxy’s Backend Pool: The proxy is configured with a list of backend server addresses.
Proxy’s Balancing Algorithm: The proxy uses a configured algorithm (round robin, least connections, IP hash, etc.) to select a backend server.
Proxy to Backend: The proxy establishes a connection to the chosen backend server and forwards the request.
Response Path: The response travels back through the proxy to the client.
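The five steps above can be exercised end to end with Go's standard library. This sketch is deliberately not gRPC-capable: it proxies plain HTTP/1.1 between throwaway test servers, and newBackend, newRoundRobinProxy, and runDemo are hypothetical helper names. It only shows the shape of the flow, including the fact that the proxy chooses a backend per request even though the client talks to a single address.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

// newBackend starts a throwaway HTTP server that answers with its own name.
func newBackend(name string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, name)
	}))
}

// newRoundRobinProxy returns a handler that forwards each incoming request
// to the next backend in the pool (steps 2 to 4 in the list above).
func newRoundRobinProxy(backends []*url.URL) http.Handler {
	var counter atomic.Uint64
	return &httputil.ReverseProxy{
		Rewrite: func(pr *httputil.ProxyRequest) {
			i := counter.Add(1) - 1
			pr.SetURL(backends[i%uint64(len(backends))])
		},
	}
}

// runDemo sends n sequential requests through the proxy and reports which
// backend served each one.
func runDemo(n int) ([]string, error) {
	a, b := newBackend("userauth-1"), newBackend("userauth-2")
	defer a.Close()
	defer b.Close()

	var pool []*url.URL
	for _, s := range []*httptest.Server{a, b} {
		u, err := url.Parse(s.URL)
		if err != nil {
			return nil, err
		}
		pool = append(pool, u)
	}

	proxy := httptest.NewServer(newRoundRobinProxy(pool))
	defer proxy.Close()

	var served []string
	for i := 0; i < n; i++ {
		resp, err := http.Get(proxy.URL) // step 1: the client only knows the proxy
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(resp.Body) // step 5: response comes back via the proxy
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		served = append(served, string(body))
	}
	return served, nil
}

func main() {
	served, err := runDemo(4)
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println(served) // [userauth-1 userauth-2 userauth-1 userauth-2]
}
```

A production gRPC proxy additionally has to speak HTTP/2 on both sides and forward trailers, which is what a purpose-built proxy such as Envoy provides out of the box.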

Example Configuration (Envoy Proxy):

Let’s say you have two UserAuth instances running on localhost:50051 and localhost:50052. You can configure an Envoy proxy to balance traffic to them.

envoy.yaml (simplified):

admin:
  access_log_path: /tmp/envoy.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 } # Client connects here
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route:
                  cluster: userauth_service # Points to the cluster defined below
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: userauth_service
    connect_timeout: 0.25s
    type: STATIC # Endpoints below are literal IPs; LOGICAL_DNS allows only one endpoint
    lb_policy: ROUND_ROBIN # Or LEAST_REQUEST, RING_HASH, etc.
    # For simple local testing, static endpoints are fine.
    # In production, this would often be dynamic via DNS or xds.
    load_assignment:
      cluster_name: userauth_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 50051 # First UserAuth instance
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 50052 # Second UserAuth instance
    # For gRPC, you need to tell Envoy to treat it as gRPC
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {} # gRPC uses HTTP/2

The client would then connect to localhost:8080 (Envoy’s listening port) instead of the individual UserAuth instances.

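// Client connecting to the proxy (Envoy)
// Besides the usual imports, this needs "sync" and
// "google.golang.org/grpc/credentials/insecure".
func main() {
    // Client connects to the proxy address, not to a UserAuth instance.
    // grpc.WithInsecure is deprecated; plaintext is requested via insecure credentials.
    conn, err := grpc.Dial("localhost:8080", grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatalf("did not connect: %v", err)
    }
    defer conn.Close()
    c := pb.NewUserAuthClient(conn)

    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func(idx int) {
            defer wg.Done()
            _, err := c.Login(context.Background(), &pb.LoginRequest{Username: fmt.Sprintf("user%d", idx)})
            if err != nil {
                log.Printf("Login failed for user%d: %v", idx, err)
            } else {
                log.Printf("Login successful for user%d", idx)
            }
        }(i)
    }
    wg.Wait() // Block until every RPC has returned instead of sleeping
}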

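Why it works: The proxy acts as a central point of control. It handles the complexity of discovering backend health, applying load balancing policies, and even TLS termination, offloading this from the client. Because Envoy operates at layer 7 and understands HTTP/2, it balances individual RPCs rather than whole connections. That distinction matters for gRPC: a connection-level (layer 4) balancer would pin every call from a client's single long-lived connection to one backend.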

The most surprising thing about gRPC load balancing is that the grpc.Dial call itself doesn’t look like it’s doing any balancing. The magic happens in the underlying infrastructure: either the client’s internal resolver/balancer components (often orchestrated by xds) or an external proxy that the client is directed to.

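The lb_policy in Envoy (or similar in other proxies) is the core of its decision-making. ROUND_ROBIN is simple and effective for uniformly distributed request loads. LEAST_REQUEST is better when requests have wildly varying processing times: by default Envoy samples two random hosts and sends the request to the one with fewer active requests, which steers traffic away from busy servers without scanning the whole pool. RING_HASH suits stateful services or caching, because requests for the same key (e.g., a user ID) keep going to the same backend as long as the set of backends is stable. It only takes effect when the route defines a hash_policy that tells Envoy which header or other attribute to hash.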
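The key-affinity property of ring hashing is easier to see in code. The following is a minimal consistent-hash ring, not Envoy's actual implementation: hashRing, newHashRing, and Pick are hypothetical names, FNV-1a is an arbitrary choice of hash function, and 100 virtual nodes per backend is an assumed value.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// vnode is one virtual point on the ring that belongs to a backend.
type vnode struct {
	hash    uint32
	backend string
}

// hashRing (hypothetical) maps keys to backends via consistent hashing.
type hashRing struct {
	nodes []vnode // sorted by hash
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newHashRing places `replicas` virtual nodes per backend on the ring.
// More replicas smooth out the distribution between backends.
func newHashRing(backends []string, replicas int) *hashRing {
	r := &hashRing{}
	for _, b := range backends {
		for i := 0; i < replicas; i++ {
			r.nodes = append(r.nodes, vnode{hash: hashKey(fmt.Sprintf("%s#%d", b, i)), backend: b})
		}
	}
	sort.Slice(r.nodes, func(i, j int) bool {
		if r.nodes[i].hash != r.nodes[j].hash {
			return r.nodes[i].hash < r.nodes[j].hash
		}
		return r.nodes[i].backend < r.nodes[j].backend // deterministic tie-break
	})
	return r
}

// Pick returns the backend owning the first virtual node at or after the
// key's hash, wrapping around to the start of the ring if necessary.
func (r *hashRing) Pick(key string) string {
	if len(r.nodes) == 0 {
		return ""
	}
	h := hashKey(key)
	i := sort.Search(len(r.nodes), func(i int) bool { return r.nodes[i].hash >= h })
	if i == len(r.nodes) {
		i = 0
	}
	return r.nodes[i].backend
}

func main() {
	full := newHashRing([]string{"userauth-1", "userauth-2", "userauth-3"}, 100)
	reduced := newHashRing([]string{"userauth-1", "userauth-3"}, 100) // userauth-2 removed

	for _, user := range []string{"alice", "bob", "carol", "dave"} {
		// Only keys that lived on the removed backend should change owner.
		fmt.Printf("%-6s full=%s reduced=%s\n", user, full.Pick(user), reduced.Pick(user))
	}
}
```

The design choice worth noticing is that removing a backend only remaps the keys that backend owned; with a plain `hash(key) % N` scheme, nearly every key would move.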

When using client-side load balancing with xDS, the client dynamically receives updates about backend endpoints and their weights from the control plane. It then uses this information to establish and manage connections, typically one per backend endpoint. Without xDS, the gRPC library’s built-in policies still apply: pick_first (the default) sends every RPC to the first address that connects, while round_robin rotates across all resolved addresses. Both are simple to enable, but they don’t offer the dynamic discovery and advanced policies of a full xDS setup.
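The built-in policies are selected through a gRPC service config, which is a small JSON document. The sketch below only builds and validates that JSON with the standard library so it can run anywhere; serviceConfigFor is a hypothetical helper, and the comment shows how the result would typically be handed to grpc-go, assuming a DNS name (here a made-up one) that resolves to every backend.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceConfigFor (hypothetical helper) returns a gRPC service config that
// selects one of the built-in load balancing policies.
func serviceConfigFor(policy string) (string, error) {
	switch policy {
	case "pick_first", "round_robin":
		// Supported by this sketch; both policies take an empty config object.
	default:
		return "", fmt.Errorf("unsupported policy %q", policy)
	}
	cfg := map[string]any{
		"loadBalancingConfig": []map[string]any{
			{policy: map[string]any{}},
		},
	}
	b, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	cfg, err := serviceConfigFor("round_robin")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg) // {"loadBalancingConfig":[{"round_robin":{}}]}

	// With grpc-go available, the config would be passed at dial time, e.g.:
	//
	//   conn, err := grpc.Dial(
	//       "dns:///userauth.example.internal:50051", // assumed name with one record per backend
	//       grpc.WithDefaultServiceConfig(cfg),
	//       grpc.WithTransportCredentials(insecure.NewCredentials()),
	//   )
	//
	// The dns scheme makes the resolver return all addresses; round_robin
	// then opens a connection to each and rotates RPCs across them.
}
```

On Kubernetes this pattern generally needs a headless Service, since a regular ClusterIP Service exposes a single virtual IP and leaves the client with nothing to rotate across.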

The next problem you’ll likely encounter is handling graceful connection draining and preventing requests from being sent to unhealthy backend instances.