The Linkerd debug container is your go-to diagnostic tool when you’re scratching your head about why traffic isn’t flowing correctly through your service mesh.

Let’s see it in action. Imagine you have a frontend service that can’t reach its backend service. You’ve checked your Kubernetes NetworkPolicies, and they look fine. You’ve verified the frontend pod has the correct annotations for Linkerd. Yet, requests from frontend to backend are failing with a 503 Service Unavailable error.

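First, you need to enable the debug container. It’s not injected by default. You opt in per workload by adding the config.linkerd.io/enable-debug-sidecar annotation to the pod template (not to the Deployment’s own metadata), or by passing --enable-debug-sidecar to linkerd inject:

spec:
  template:
    metadata:
      annotations:
        config.linkerd.io/enable-debug-sidecar: "true"

The annotation is read when a pod is created, so existing pods must be recreated (for example with kubectl rollout restart) before the change takes effect. The Linkerd control plane itself does not need a restart.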

Once enabled, Linkerd’s proxy injector adds the debug container to newly created application pods. You can verify this by describing one of them, say frontend-abcde-fghij:

kubectl describe pod frontend-abcde-fghij

You’ll see an extra container named linkerd-debug listed alongside your application container and linkerd-proxy.
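If you would rather script that check than read the describe output, here is a minimal sketch. The helper name has_debug_container is hypothetical; it only inspects a list of container names, which you would obtain from kubectl with the jsonpath query shown in the comment.

```shell
# has_debug_container is a hypothetical helper, not part of kubectl or linkerd.
# $1: space-separated container names, e.g. the output of
#   kubectl get pod frontend-abcde-fghij -o jsonpath='{.spec.containers[*].name}'
# Returns 0 if linkerd-debug is one of the names, 1 otherwise.
has_debug_container() {
  case " $1 " in
    *" linkerd-debug "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Padding the list with spaces on both sides makes the match exact, so a container called linkerd-debugger would not count.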

Now, let’s diagnose that frontend to backend connectivity issue. You’ll exec into the linkerd-debug container within the frontend pod:

kubectl exec -it frontend-abcde-fghij -c linkerd-debug -- bash

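Inside the debug container, you have a small set of network tools. The Linkerd documentation lists tshark, tcpdump, lsof, and iproute2; if a command used below (nslookup, nc, curl) is missing from your image version, run it from the application container instead. The first thing to check is whether the frontend pod can resolve the backend service name at all. The proxy listens on a dedicated port for outgoing traffic (4140 by default) and routes based on where the application tries to connect, so DNS resolution has to work first.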

# Inside the debug container
nslookup backend.default.svc.cluster.local

If this fails, the problem is cluster DNS (CoreDNS or the pod’s DNS configuration), not Linkerd. If it succeeds, you’ll see the ClusterIP for the backend service.

Next, let’s test direct connectivity to the backend service’s ClusterIP on its service port (e.g., 8080 for HTTP).

# Inside the debug container
nc -zv <backend-cluster-ip> 8080

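If nc reports "succeeded!", a TCP connection was established. Treat that as a weak signal, though: in a meshed pod, iptables rules redirect outbound connections from every container to the local proxy, so the handshake may have completed with linkerd-proxy rather than with the backend. If it fails outright, you’re looking at a problem below the mesh: a CNI, kube-proxy, or Kubernetes NetworkPolicy issue, or a proxy that isn’t accepting connections.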

Assuming nc succeeds, the next step is a real HTTP request, and curl and tcpdump help here. The proxy intercepts outbound traffic transparently, so you address the backend normally and never target port 4140 yourself; the redirect to the proxy’s outbound port happens inside the pod. Start with a single backend pod.

First, find the backend pod’s IP:

# Back in your host terminal
kubectl get pod -l app=backend -o wide

Let’s say the backend pod IP is 10.1.2.3. Now, back in the linkerd-debug container:

# Inside the debug container
curl -v http://10.1.2.3:8080/health

This curl command is important. It bypasses the Service and its ClusterIP by addressing one backend pod directly. It still leaves the pod through the frontend’s outbound proxy, because the redirect applies to every container in the pod. If this fails, the problem is likely on the backend pod itself (e.g., its application isn’t listening correctly, or a NetworkPolicy is blocking it on the backend pod’s side).

The crucial test is to curl the backend by its service name. This reproduces what the frontend application does: the request is redirected to the Linkerd proxy in the frontend pod, which then attempts to route it to a backend endpoint.

# Inside the debug container
curl -v http://backend.default.svc.cluster.local:8080/health

If this returns a 503 while the direct pod-IP request succeeded, the problem is almost certainly in the Linkerd proxy’s configuration or in its ability to discover and reach the backend’s proxy.
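To keep the outcomes of this test straight, the decision can be written down as a small function. This is only a sketch: next_step is a hypothetical helper, and the messages paraphrase the reasoning in this walkthrough rather than anything Linkerd prints. It takes the status code that curl reports through its -w '%{http_code}' option, which is 000 when no HTTP response was received.

```shell
# next_step is a hypothetical helper mapping a curl status code to the next check.
# $1: HTTP status as printed by curl -w '%{http_code}' ("000" = no HTTP response).
next_step() {
  case "$1" in
    000)     echo "no HTTP response: recheck DNS and TCP connectivity" ;;
    2??|3??) echo "request path works from this pod" ;;
    503)     echo "inspect linkerd-proxy logs and backend endpoints" ;;
    *)       echo "backend answered with $1: inspect the application" ;;
  esac
}

# Inside the debug container (assumes curl is available there):
#   code=$(curl -s -o /dev/null -w '%{http_code}' http://backend.default.svc.cluster.local:8080/health)
#   next_step "$code"
```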

If this still fails, you can capture traffic from the debug container to see exactly what’s happening.

# Inside the debug container
tcpdump -i any port 8080 -w /tmp/backend.pcap

Then, from another terminal on your host, exec into the debug container again and run the curl command:

# In a new host terminal, exec into debug container
kubectl exec -it frontend-abcde-fghij -c linkerd-debug -- bash

# Inside the debug container
curl http://backend.default.svc.cluster.local:8080
exit

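Now stop tcpdump in the first terminal with Ctrl+C so the capture file is flushed, then copy the pcap file off the pod (kubectl cp relies on tar being present in the container):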

# Back on your host terminal
kubectl cp frontend-abcde-fghij:/tmp/backend.pcap ./backend.pcap -c linkerd-debug

You can then analyze backend.pcap with Wireshark. Look for SYN packets being sent to the backend pod’s IP on port 8080, and see if you get SYN-ACKs back. If you see SYN packets but no SYN-ACKs, the backend pod is not responding, or a firewall is blocking it. If you see no SYN packets at all, the Linkerd proxy on the frontend pod isn’t even attempting to send the traffic.
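If you prefer the command line to the Wireshark UI, the same SYN / SYN-ACK reasoning can be sketched as a shell function. The name diagnose_handshake is hypothetical. The commented tshark commands use standard display-filter fields (tcp.flags.syn, tcp.flags.ack, tcp.dstport, tcp.srcport) and assume the capture file and port from the example above.

```shell
# diagnose_handshake is a hypothetical helper encoding the reasoning above.
# $1: number of SYN packets sent to the backend port
# $2: number of SYN-ACK packets received from the backend port
diagnose_handshake() {
  if [ "$1" -eq 0 ]; then
    echo "no SYN sent: the frontend proxy never attempted the connection"
  elif [ "$2" -eq 0 ]; then
    echo "SYN without SYN-ACK: backend not responding or traffic is filtered"
  else
    echo "handshake completed: look above TCP (TLS, HTTP, proxy logs)"
  fi
}

# Counting the packets in the capture, on a machine with tshark installed:
#   syn=$(tshark -r backend.pcap -Y 'tcp.dstport == 8080 && tcp.flags.syn == 1 && tcp.flags.ack == 0' | wc -l | tr -d ' ')
#   synack=$(tshark -r backend.pcap -Y 'tcp.srcport == 8080 && tcp.flags.syn == 1 && tcp.flags.ack == 1' | wc -l | tr -d ' ')
#   diagnose_handshake "$syn" "$synack"
```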

Common causes for 503 errors in Linkerd:

  1. Backend Pod Not Ready/Running: The most frequent culprit. The backend service might have a ClusterIP, but no healthy pods backing it.

    • Diagnosis: kubectl get pods -l app=backend -o wide and kubectl describe pod <backend-pod-name>. Look for Running status and no failing readiness probes.
    • Fix: Resolve issues with the backend application (e.g., fix application errors, increase resource limits, correct readiness probe).
    • Why it works: Linkerd only routes traffic to healthy pods. If there are no healthy pods, it returns a 503.
  2. Incorrect identity trust domain (identityTrustDomain): If the trust domain is misconfigured, or differs between installations that need to talk to each other, proxies won’t be able to establish mTLS connections.

    • Diagnosis: helm list does not show chart values, so use helm get values linkerd -n linkerd --all and check identityTrustDomain. Ensure it matches across all Linkerd installations if you have multiple.
    • Fix: helm upgrade linkerd linkerd/linkerd2 --namespace linkerd --set identityTrustDomain=your-domain.link (replace your-domain.link with your actual trust domain; on Linkerd 2.12 and later the chart is linkerd/linkerd-control-plane). Then restart the Linkerd controllers and the meshed workloads so the proxies pick up new identities.
    • Why it works: mTLS is fundamental to Linkerd’s operation. Proxies use the trust domain to verify each other’s identities.
  3. Network Policy Blocking Proxy-to-Proxy Communication: Kubernetes NetworkPolicies might be too restrictive, preventing Linkerd proxies from communicating with each other.

    • Diagnosis: kubectl get networkpolicy -n <your-namespace> and examine policies affecting your frontend and backend pods.
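    • Fix: Add or modify NetworkPolicies to allow ingress to the backend pods on port 8080 (or your application’s port) from the frontend pods, selected by pod labels or namespace. The proxies run as sidecars inside the application pods, so proxy-to-proxy traffic originates from the frontend pod itself, not from the linkerd namespace. If the port is marked opaque, also allow the proxy’s inbound port (4143 by default). Meshed pods additionally need egress to the control plane in the linkerd namespace for service discovery and identity.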
    • Why it works: Linkerd proxies need to establish connections to other proxies. NetworkPolicies can inadvertently block this.
  4. Port Excluded From the Proxy (skip-outbound-ports or skip-inbound-ports): The proxy handles all TCP ports by default. If you’ve customized Linkerd’s port configuration, you might have excluded the port your backend service is listening on, so that traffic bypasses the mesh.

    • Diagnosis: Check the application pod’s config.linkerd.io/skip-outbound-ports and config.linkerd.io/skip-inbound-ports annotations, and the matching Helm values proxyInit.ignoreOutboundPorts and proxyInit.ignoreInboundPorts.
    • Fix: Remove the backend service port (e.g., 8080) from those lists in your Linkerd values.yaml or pod annotations, then recreate the affected pods so the iptables rules are regenerated.
    • Why it works: The Linkerd proxy can only encrypt and route traffic it intercepts. Traffic on a skipped port carries no mesh identity, so a backend that requires meshed clients will reject it.
  5. Linkerd Proxy CrashLoopBackOff: The Linkerd proxy sidecar itself might be failing.

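    • Diagnosis: kubectl logs <frontend-pod-name> -c linkerd-proxy (add --previous to read the logs of a crashed instance) and kubectl describe pod <frontend-pod-name>. Look for errors in the proxy logs, an OOMKilled last state, or a growing restart count.
    • Fix: Increase CPU/memory limits for the linkerd-proxy container in your Linkerd installation’s values.yaml or via pod annotations (config.linkerd.io/proxy-cpu-limit, config.linkerd.io/proxy-memory-limit). For example: helm upgrade linkerd linkerd/linkerd2 --namespace linkerd --set proxy.resources.cpu.limit=500m --set proxy.resources.memory.limit=256Mi (illustrative values; size them for your workload).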
    • Why it works: Insufficient resources can cause the proxy to crash, leading to traffic disruptions.
  6. Application Binding to Incorrect Interface: The backend application might be binding to localhost instead of 0.0.0.0 or the pod’s IP.

    • Diagnosis: kubectl exec -it <backend-pod-name> -- netstat -tulnp. Verify the application port is listening on 0.0.0.0:<port>.
    • Fix: Reconfigure your backend application to listen on 0.0.0.0 or the pod’s IP address.
    • Why it works: When Linkerd routes traffic to the backend pod, it arrives on the pod’s IP. If the application only listens on localhost, it won’t receive the traffic.
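The first cause in the list, a Service with no healthy pods behind it, is quick to rule out by counting ready endpoints. The sketch below assumes the Service is named backend, as in the examples above; count_ready is a hypothetical helper that only counts the IPs it is given. The jsonpath in the comment reads addresses, which holds ready endpoints only (unready ones appear under notReadyAddresses).

```shell
# count_ready is a hypothetical helper.
# $1: space-separated ready endpoint IPs, e.g. the output of
#   kubectl get endpoints backend -o jsonpath='{.subsets[*].addresses[*].ip}'
# Prints how many IPs were given; 0 means nothing is ready to receive traffic.
count_ready() {
  set -- $1   # unquoted on purpose: split the list on whitespace
  echo "$#"
}
```

A result of 0 matches the 503 case described in cause 1: the Service exists, but Linkerd has no endpoint to route to.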

After fixing these, the next error you’ll likely encounter is related to TLS handshake errors if mTLS is misconfigured or if you haven’t fully rolled out certificates.
