The Linkerd proxy’s resource requests and limits are easy to get wrong, and the default settings are often a poor fit for real-world traffic patterns, leading to either wasted resources or degraded performance.

Imagine a single pod running Linkerd. Inside that pod, you have your application container and the linkerd-proxy container. The linkerd-proxy is the busy little bee intercepting all inbound and outbound traffic for your application. It’s doing a lot: TLS termination, routing, metrics collection, retries, circuit breaking. All this work requires CPU and memory.

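Here’s a simplified view of a Linkerd-enabled pod’s resource definition in Kubernetes. The linkerd-proxy container is normally added by Linkerd’s proxy injector rather than written by hand; it is shown inline here so both containers can be compared side by side: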

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
      - name: my-app
        image: my-app:latest
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
      - name: linkerd-proxy
        image: public.ecr.aws/linkerd/proxy:2.14.1
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "200m"
            memory: "100Mi"

The problem arises because the default resource requests and limits for the linkerd-proxy are often too low, especially for applications with moderate to high traffic or complex request patterns. With requests set too low, the Kubernetes scheduler can place the pod on a node that lacks the spare capacity the proxy actually needs; with limits set too low, the proxy is throttled or killed exactly when it needs CPU and memory most.
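Because the proxy container is injected when the pod is created, its resources are usually set through annotations on the pod template rather than by editing the container spec directly. A minimal sketch, assuming the config.linkerd.io/proxy-* annotations and reusing the proxy values from the listing above; the app: my-app label is a hypothetical placeholder:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app            # hypothetical label, for illustration only
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        linkerd.io/inject: enabled
        # Proxy resources: the injector applies these to the linkerd-proxy container.
        config.linkerd.io/proxy-cpu-request: "100m"
        config.linkerd.io/proxy-cpu-limit: "200m"
        config.linkerd.io/proxy-memory-request: "50Mi"
        config.linkerd.io/proxy-memory-limit: "100Mi"
    spec:
      containers:
      - name: my-app
        image: my-app:latest
```

Expressed this way, each proxy fix in the list that follows is a change to the matching annotation value. Since injection happens at pod creation, new values take effect only once the pods are re-created, for example by a rollout.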

Common Causes and Fixes

  1. Under-provisioned CPU Requests for the Proxy:

    • Diagnosis: Monitor the linkerd-proxy container’s CPU usage with kubectl top pod <pod-name> --containers. Look for consistently high CPU usage, especially during traffic spikes. You might also see increased latency in your application’s response times, which can be a symptom of the proxy being starved of CPU on a contended node. Note that CPU starvation never shows up as OOMKilled in kubectl describe pod <pod-name>; that status points to a memory problem (see items 3 and 4).
    • Fix: Increase the requests.cpu for the linkerd-proxy container. For a moderately busy service, start with 200m.
      resources:
        requests:
          cpu: "200m" # Increased from 100m
          memory: "50Mi"
        limits:
          cpu: "200m" # Unchanged here; raised in the next step
          memory: "100Mi"
      
    • Why it works: The scheduler places pods based on their CPU requests, so a higher request keeps the pod off nodes whose CPU is already fully committed. The request also sets the container’s share of CPU when the node is contended, so the proxy is guaranteed at least that much CPU under load.
  2. Under-provisioned CPU Limits for the Proxy:

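    • Diagnosis: Similar to CPU requests, monitor CPU usage. If you see CPU usage hitting the limit (e.g., the limits.cpu is 200m and kubectl top pod <pod-name> --containers shows sustained usage near 200m), the proxy will be throttled. This leads to increased latency and potentially dropped requests. Throttling does not terminate the container, so kubectl describe pod <pod-name> will not show a reason for it; if you scrape cAdvisor metrics, a rising container_cpu_cfs_throttled_periods_total for the linkerd-proxy container is a more direct signal.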
    • Fix: Increase the limits.cpu for the linkerd-proxy container. For a moderately busy service, 400m is a common starting point.
      resources:
        requests:
          cpu: "200m"
          memory: "50Mi"
        limits:
          cpu: "400m" # Increased from 200m
          memory: "100Mi"
      
    • Why it works: The CPU limit prevents the linkerd-proxy from consuming more than a specified amount of CPU. Increasing this limit allows the proxy to burst and handle transient traffic spikes without being throttled, ensuring better performance.
  3. Under-provisioned Memory Requests for the Proxy:

    • Diagnosis: Monitor the linkerd-proxy container’s memory usage with kubectl top pod <pod-name> --containers. If usage sits consistently above requests.memory, the pod is a likely candidate for eviction when the node comes under memory pressure. Check kubectl describe pod <pod-name> for a status of Evicted, and kubectl describe node <node-name> for the MemoryPressure condition.
    • Fix: Increase the requests.memory for the linkerd-proxy container. For a moderately busy service, 100Mi or 150Mi is often appropriate.
      resources:
        requests:
          cpu: "200m"
          memory: "100Mi" # Increased from 50Mi
        limits:
          cpu: "400m"
          memory: "100Mi" # Unchanged here; raised in the next step
      
    • Why it works: A higher memory request ensures the Kubernetes scheduler reserves enough memory for the proxy, preventing it from being placed on a node that might run out of memory. It also reduces the chance of the pod being evicted by the kubelet when the node is under memory pressure, because pods using more memory than they requested are evicted first.
  4. Under-provisioned Memory Limits for the Proxy:

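    • Diagnosis: Observe the linkerd-proxy memory usage. If it hits the memory limit, the container is killed by the kernel’s OOM killer and then restarted, and kubectl describe pod <pod-name> shows reason: OOMKilled in the linkerd-proxy container’s last state. This is a hard stop.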
    • Fix: Increase the limits.memory for the linkerd-proxy container. A common starting point for moderate traffic is 200Mi.
      resources:
        requests:
          cpu: "200m"
          memory: "100Mi"
        limits:
          cpu: "400m"
          memory: "200Mi" # Increased from 100Mi
      
    • Why it works: The memory limit defines the maximum amount of memory the linkerd-proxy can consume. Increasing this limit provides headroom for the proxy to handle its internal data structures, connection pooling, and other memory-intensive operations without being killed.
  5. Application Container Resource Starvation:

    • Diagnosis: Sometimes the problem isn’t the proxy itself but a starved application container. If your application container has low requests.cpu and requests.memory, it might not get enough resources, causing it to slow down or fail. The proxy then sees this slowness and might appear to be the bottleneck. Check kubectl top pod <pod-name> --containers for the application container’s CPU and memory usage.
    • Fix: Increase the requests.cpu and requests.memory for your application container. The exact values depend heavily on your application.
      spec:
        containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              cpu: "500m" # Increased
              memory: "512Mi" # Increased
            limits:
              cpu: "1"
              memory: "1Gi"
      
    • Why it works: This ensures your application gets the resources it needs to run efficiently. A healthy application means the proxy has less work to do in terms of retries and error handling, indirectly improving overall perceived performance.
  6. Linkerd Control Plane Resource Issues:

    • Diagnosis: If all your Linkerd-enabled pods are showing resource pressure or latency, the issue might be with the Linkerd control plane itself (destination, identity, proxy-injector). Check the logs and resource usage of the linkerd-destination, linkerd-identity, and linkerd-proxy-injector pods in the linkerd namespace.
    • Fix: Scale up the Linkerd control plane pods or increase their resource requests/limits. This is typically done by modifying the Linkerd installation configuration or the Helm values if installed via Helm.
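      # Example: Scaling up the destination controller replica count
      kubectl scale deployment linkerd-destination -n linkerd --replicas=3
      
      A manual scale is overwritten by the next upgrade. To persist it on an existing installation, use linkerd upgrade with --set (controllerReplicas applies to every control plane component):
      linkerd upgrade --set controllerReplicas=3 | kubectl apply -f -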
      
    • Why it works: The control plane is responsible for distributing routing information and managing the overall Linkerd mesh. If it’s overloaded, it can’t effectively serve the data plane proxies, leading to issues across the mesh.
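For Helm-based installs, the same change can live in a values file instead of a one-off command. A minimal sketch, assuming the linkerd-control-plane chart’s key names (controllerReplicas, destinationResources, identityResources), which should be checked against the values.yaml of your chart version; the file name is hypothetical and the numbers are illustrative rather than recommendations:

```yaml
# values-override.yaml (hypothetical file name)
# Replica count shared by the control plane components.
controllerReplicas: 3

# Resources for the destination component, which serves routing data to proxies.
destinationResources:
  cpu:
    request: 100m
  memory:
    request: 50Mi
    limit: 250Mi

# Resources for the identity component, which issues proxy certificates.
identityResources:
  cpu:
    request: 100m
  memory:
    request: 10Mi
    limit: 250Mi
```

A file like this would be passed with -f when upgrading the release. Keeping it in version control makes the control plane sizing reviewable alongside the workload manifests.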

After adjusting these resources, the next problems you are likely to encounter are errors from the Linkerd policy controller if you haven’t configured it, or certificate rotation issues if your linkerd-identity service is also under-resourced.
