The Linkerd debug container is your go-to diagnostic tool when you’re scratching your head about why traffic isn’t flowing correctly through your service mesh.
Let’s see it in action. Imagine you have a frontend service that can’t reach its backend service. You’ve checked your Kubernetes NetworkPolicies, and they look fine. You’ve verified the frontend pod has the correct annotations for Linkerd. Yet, requests from frontend to backend are failing with a 503 Service Unavailable error.
First, you need to enable the debug container. It’s not injected by default, and it’s turned on per workload rather than through a control-plane Helm value: add the config.linkerd.io/enable-debug-sidecar annotation to the pod template of the workload you want to inspect (or pass --enable-debug-sidecar to linkerd inject):
spec:
  template:
    metadata:
      annotations:
        config.linkerd.io/enable-debug-sidecar: "true"
Changing the pod template rolls the workload, and the proxy injector adds the debug container to the new pods. Annotating a pod that is already running has no effect, because injection only happens at pod creation.
You can verify the result by describing one of the new pods, say frontend-abcde-fghij:
kubectl describe pod frontend-abcde-fghij
You’ll see an extra container named linkerd-debug listed alongside your application container and linkerd-proxy.
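If you would rather script this check than read the describe output, the pod’s container names can be queried directly. This is a minimal sketch: the helper names has_container and pod_has_debug_sidecar are hypothetical, and it assumes kubectl is already pointed at the right cluster.

```shell
# has_container NAMES WANTED
# Succeeds if WANTED appears in the space-separated list NAMES.
has_container() {
  for _name in $1; do
    if [ "$_name" = "$2" ]; then
      return 0
    fi
  done
  return 1
}

# pod_has_debug_sidecar POD [NAMESPACE]
# Asks the API server for the pod's container names (needs kubectl and
# cluster access) and checks for the linkerd-debug sidecar.
pod_has_debug_sidecar() {
  _names=$(kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.spec.containers[*].name}') || return 2
  has_container "$_names" linkerd-debug
}
```

For example, pod_has_debug_sidecar frontend-abcde-fghij && echo "debug sidecar present".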
Now, let’s diagnose that frontend to backend connectivity issue. You’ll exec into the linkerd-debug container within the frontend pod:
kubectl exec -it frontend-abcde-fghij -c linkerd-debug -- bash
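Inside the debug container you have a set of network tools and, more importantly, the pod’s own network namespace, so every command sees what the frontend application sees. (The image is documented to ship tshark, tcpdump, lsof, and iproute2; the steps below also assume nslookup, nc, and curl are available.) The first thing to check is whether the frontend pod can resolve the backend service name at all. DNS resolution is critical because the application resolves the name itself, before the Linkerd proxy (which listens on port 4140 for outbound traffic) ever sees the connection.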
# Inside the debug container
nslookup backend.default.svc.cluster.local
If this fails, the problem is cluster DNS or a missing or misnamed Service, not Linkerd. If it succeeds, you’ll see the ClusterIP for the backend service.
Next, let’s test direct connectivity to the backend service’s ClusterIP on its service port (e.g., 8080 for HTTP).
# Inside the debug container
nc -zv <backend-cluster-ip> 8080
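If nc reports "succeeded!", a TCP handshake completed. Treat that as a weak signal in a meshed pod, though: outbound TCP from every container except linkerd-proxy is redirected to the proxy’s outbound port, so the handshake may have been answered by the local proxy rather than by the backend. If nc fails, check that linkerd-proxy is running, then look for a CNI, kube-proxy, or Kubernetes NetworkPolicy issue.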
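The DNS and TCP checks form a small decision tree, which can be sketched as a script to run inside the debug container. The function names are hypothetical, the sketch assumes nslookup and nc are present in the image, and the messages only paraphrase the guidance in this section.

```shell
# triage DNS_STATUS TCP_STATUS
# Maps the exit statuses of the two checks to the layer to inspect next.
triage() {
  if [ "$1" -ne 0 ]; then
    echo "dns: name did not resolve; check cluster DNS and the Service"
  elif [ "$2" -ne 0 ]; then
    echo "tcp: connect failed; check CNI, kube-proxy, NetworkPolicy"
  else
    echo "ok: name resolves and TCP connects; test HTTP next"
  fi
}

# check_backend HOST PORT
# Runs the real checks and hands their results to triage.
check_backend() {
  if nslookup "$1" >/dev/null 2>&1; then _dns=0; else _dns=1; fi
  # -z: connect without sending data, -w 3: give up after three seconds
  if nc -z -w 3 "$1" "$2" >/dev/null 2>&1; then _tcp=0; else _tcp=1; fi
  triage "$_dns" "$_tcp"
}
```

For example, check_backend backend.default.svc.cluster.local 8080.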
Assuming nc succeeds, move up a layer and send real HTTP requests. The debug container has curl and tcpdump to help. There is no need to address the proxy’s outbound port (4140) yourself: the proxy intercepts outbound traffic transparently, so you target the backend on its normal application port. Start with a single backend pod, addressed by IP, to take the Service out of the picture.
First, find the backend pod’s IP:
# Back in your host terminal
kubectl get pod -l app=backend -o wide
Let’s say the backend pod IP is 10.1.2.3. Now, back in the linkerd-debug container:
# Inside the debug container
curl -v http://10.1.2.3:8080/health
This curl command is important. It targets one backend pod directly by IP, bypassing the Service and its load balancing. (Like all outbound TCP from the pod, it still leaves through the frontend’s Linkerd proxy.) If this fails, the problem is likely on the backend pod itself (e.g., its application isn’t listening correctly, or a NetworkPolicy is blocking ingress to the backend pod).
The crucial test is to curl the backend by its Service name on the application port. This is the same request the frontend application makes, and the Linkerd proxy on the frontend pod intercepts it and tries to route it.
# Inside the debug container
curl -v http://backend.default.svc.cluster.local:8080/health
This request goes to the Linkerd proxy on the frontend pod, which then attempts to route it to the backend service. If the pod-IP request succeeded but this one returns a 503, the problem is almost certainly within the Linkerd proxy’s configuration or service discovery, or its ability to communicate with the backend proxy.
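Comparing the HTTP status of the pod-IP request with that of the Service-name request narrows things down quickly. The sketch below assumes curl is available in the debug container; the function names and the wording of the verdicts are illustrative and not part of Linkerd.

```shell
# http_code URL
# Prints only the HTTP status code; curl prints 000 if no response arrived.
http_code() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# compare_paths POD_CODE SERVICE_CODE
# Interprets the status codes from the pod-IP and Service-name requests.
compare_paths() {
  case "$1:$2" in
    2??:2??) echo "both paths healthy" ;;
    2??:*)   echo "pod reachable, service path failing: inspect the proxy" ;;
    *)       echo "pod itself failing: inspect the backend application" ;;
  esac
}
```

A typical call, using the example addresses from this walkthrough, would be compare_paths "$(http_code http://10.1.2.3:8080/health)" "$(http_code http://backend.default.svc.cluster.local:8080/health)".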
If this still fails, you can capture traffic from the debug container to see exactly what’s happening.
# Inside the debug container
tcpdump -i any port 8080 -w /tmp/backend.pcap
Then, from another terminal on your host, exec into the debug container again and run the curl command:
# In a new host terminal, exec into debug container
kubectl exec -it frontend-abcde-fghij -c linkerd-debug -- bash
# Inside the debug container
curl http://backend.default.svc.cluster.local:8080
exit
Now, stop tcpdump with Ctrl-C in the first terminal so the capture is flushed to disk, then copy the pcap file off the pod:
# Back on your host terminal
kubectl cp frontend-abcde-fghij:/tmp/backend.pcap ./backend.pcap -c linkerd-debug
You can then analyze backend.pcap with Wireshark. Look for SYN packets being sent to the backend pod’s IP on port 8080, and see if you get SYN-ACKs back. If you see SYN packets but no SYN-ACKs, the backend pod is not responding, or a firewall is blocking it. If you see no SYN packets at all, the Linkerd proxy on the frontend pod isn’t even attempting to send the traffic.
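If Wireshark is not at hand, the same SYN versus SYN-ACK comparison can be made with tcpdump reading the saved file. This is a rough sketch: the function names are hypothetical, and it assumes tcpdump is installed on the machine the pcap was copied to.

```shell
# count_packets FILE FILTER
# Counts packets in a capture file that match a pcap filter expression.
count_packets() {
  tcpdump -nn -r "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# handshake_verdict SYN_COUNT SYNACK_COUNT
# Applies the reasoning above to the two packet counts.
handshake_verdict() {
  if [ "$1" -eq 0 ]; then
    echo "no SYNs: the frontend proxy never attempted the connection"
  elif [ "$2" -eq 0 ]; then
    echo "SYNs without SYN-ACKs: backend not answering or traffic blocked"
  else
    echo "handshake completes: look above TCP"
  fi
}

# analyze_pcap FILE [PORT]
# Counts bare SYNs and SYN-ACKs on PORT (default 8080) and prints a verdict.
analyze_pcap() {
  _port=${2:-8080}
  _syn=$(count_packets "$1" \
    "port $_port and tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn")
  _synack=$(count_packets "$1" \
    "port $_port and tcp[tcpflags] & (tcp-syn|tcp-ack) == (tcp-syn|tcp-ack)")
  handshake_verdict "$_syn" "$_synack"
}
```

For example, analyze_pcap ./backend.pcap 8080. Masking with (tcp-syn|tcp-ack) rather than comparing the whole flags byte keeps the match working when other bits, such as ECN flags, are set on the SYN.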
Common causes for 503 errors in Linkerd:
- Backend Pod Not Ready/Running: The most frequent culprit. The backend service might have a ClusterIP, but no healthy pods backing it.
  - Diagnosis: kubectl get pods -l app=backend -o wide and kubectl describe pod <backend-pod-name>. Look for Running status and no failing readiness probes.
  - Fix: Resolve issues with the backend application (e.g., fix application errors, increase resource limits, correct readiness probe).
  - Why it works: Linkerd only routes traffic to healthy pods. If there are no healthy pods, it returns a 503.
- Incorrect identityTrustDomain: If your cluster spans multiple trust domains, or if the trust domain is misconfigured, proxies won’t be able to establish mTLS connections.
  - Diagnosis: Run helm get values linkerd -n linkerd --all and check identityTrustDomain (helm list shows releases, not their values). Ensure it matches across all Linkerd installations if you have multiple.
  - Fix: helm upgrade linkerd linkerd/linkerd2 --namespace linkerd --reuse-values --set identityTrustDomain=your-domain.link (replace your-domain.link with your actual trust domain). Restart Linkerd controllers.
  - Why it works: mTLS is fundamental to Linkerd’s operation. Proxies use the trust domain to verify each other’s identities.
- Network Policy Blocking Proxy-to-Proxy Communication: Kubernetes NetworkPolicies might be too restrictive, preventing Linkerd proxies from communicating with each other.
  - Diagnosis: kubectl get networkpolicy -n <your-namespace> and examine policies affecting your frontend and backend pods.
  - Fix: Add or modify NetworkPolicies to allow ingress to the backend pods on port 8080 (or your application’s port) from the frontend pods. The proxies run as sidecars inside the application pods, not in the linkerd namespace, so proxy-to-proxy traffic arrives from the frontend pod’s own IP; select on the frontend’s pod labels (and its namespace, if it differs). Also make sure meshed pods are allowed egress to the control plane in the linkerd namespace.
  - Why it works: Linkerd proxies need to establish connections to other proxies. NetworkPolicies can inadvertently block this.
- Backend Port Skipped by the Proxy: If you’ve customized Linkerd’s port configuration, you might have excluded the port your backend service is listening on from the proxy’s processing.
  - Diagnosis: Check your application pod’s annotations (kubectl get pod <pod-name> -o yaml) for config.linkerd.io/skip-outbound-ports and config.linkerd.io/skip-inbound-ports, and check proxyInit.ignoreOutboundPorts and proxyInit.ignoreInboundPorts in your Linkerd values.yaml.
  - Fix: Ensure the backend service port (e.g., 8080) does not appear in any of those lists, then restart the affected pods so they are re-injected.
  - Why it works: The Linkerd proxy needs to be configured to intercept and handle traffic on the ports your services use.
- Linkerd Proxy CrashLoopBackOff: The Linkerd proxy sidecar itself might be failing.
  - Diagnosis: kubectl logs <frontend-pod-name> -c linkerd-proxy (add --previous to read the crashed instance) and kubectl describe pod <frontend-pod-name>. Look for errors in the proxy logs or high CPU/memory.
  - Fix: Increase the CPU/memory resources for the linkerd-proxy container in your Linkerd installation’s values.yaml or via pod annotations, for example: helm upgrade linkerd linkerd/linkerd2 --namespace linkerd --reuse-values --set proxy.resources.cpu.request=100m --set proxy.resources.memory.request=128Mi (the matching limits are proxy.resources.cpu.limit and proxy.resources.memory.limit).
  - Why it works: Insufficient resources can cause the proxy to crash, leading to traffic disruptions.
- Application Binding to Incorrect Interface: The backend application might be binding to localhost instead of 0.0.0.0 or the pod’s IP.
  - Diagnosis: kubectl exec -it <backend-pod-name> -- netstat -tulnp. Verify the application port is listening on 0.0.0.0:<port>.
  - Fix: Reconfigure your backend application to listen on 0.0.0.0 or the pod’s IP address.
  - Why it works: When Linkerd routes traffic to the backend pod, it arrives on the pod’s IP. If the application only listens on localhost, it won’t receive the traffic.
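Two of the checks in this list lend themselves to scripting: counting backend pods that are not fully ready, and spotting a listener bound only to loopback. The sketch below uses hypothetical function names and parses the default column layout of kubectl get pods --no-headers and netstat -tln; that layout is an assumption and may differ in your environment.

```shell
# count_not_ready
# Reads `kubectl get pods --no-headers` output on stdin and prints how many
# pods are not fully ready (READY column x/y with x != y, or not Running).
count_not_ready() {
  awk '{ split($2, r, "/")
         if (r[1] != r[2] || $3 != "Running") n++ }
       END { print n + 0 }'
}

# loopback_only PORT
# Reads `netstat -tln` output on stdin. Succeeds only if PORT is bound on
# 127.0.0.1 or ::1 and on no other address.
loopback_only() {
  awk -v port="$1" '
    $6 == "LISTEN" {
      addr = $4
      n = split(addr, parts, ":")
      if (parts[n] != port) next
      # Everything before the final ":port" is the bind address.
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host == "127.0.0.1" || host == "::1") lo = 1; else other = 1
    }
    END { exit (lo && !other) ? 0 : 1 }'
}
```

Typical use would be kubectl get pods -l app=backend --no-headers | count_not_ready, and kubectl exec <backend-pod-name> -- netstat -tln | loopback_only 8080 && echo "bound to loopback only".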
After fixing these, the next problem you’re likely to hit is TLS handshake errors, which appear when mTLS is misconfigured or certificates haven’t been fully rolled out.