kubectl debug is your panic button for when a running pod has decided to go rogue.
The core issue kubectl debug tackles is that traditional debugging often means modifying the running container (installing tools, changing configurations), which is a big no-no in Kubernetes. You can’t just SSH into a pod’s container and start installing strace or tcpdump, especially if the container image is minimal or immutable. kubectl debug sidesteps this by running the debugging tools in a separate container, either an ephemeral container attached to the running pod or, with --copy-to, a container in a copy of the pod. Either way, your original, potentially fragile, application container is left untouched.
Common Causes and Fixes for Pod Issues
Let’s assume you’re seeing a pod stuck in CrashLoopBackOff or not responding to network requests.
- Application Crashed on Startup/Runtime: The most frequent culprit. Your application process exited because of an unhandled exception, configuration error, or resource exhaustion.
  - Diagnosis: kubectl logs <pod-name> -c <container-name> shows the application’s stdout/stderr. If the container has already restarted, add --previous to read the logs of the terminated instance. kubectl describe pod <pod-name> is your next stop: look at the Last State and Reason fields for the container.
  - Fix (if logs are available): Analyze the logs for stack traces or error messages. Common fixes involve correcting configuration files, environment variables (inspect the current values with kubectl exec <pod-name> -- env), or application code. kubectl cp <local-config> <pod-name>:/path/to/config can patch a file in a running container for a quick test, but the change typically does not survive a container restart, so make the lasting fix in the pod spec, ConfigMap, or image. Restarting the pod might be necessary after a fix: kubectl delete pod <pod-name> (a controller such as a Deployment then creates a replacement).
  - Why it works: The application’s own diagnostic output points directly at the cause of the failure.
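The diagnosis steps for a crashed container can be bundled into one small shell function. This is only a sketch: crash_report is a made-up helper name, and it assumes kubectl is already pointed at the right cluster and namespace.

```shell
# Sketch: gather evidence from a crashed container.
# crash_report is a hypothetical helper name, not part of kubectl.
crash_report() {
  pod="$1"
  container="$2"
  # --previous reads the logs of the last terminated instance of the container.
  kubectl logs "$pod" -c "$container" --previous --tail=50
  # Print why that instance terminated (for example Error or OOMKilled).
  kubectl get pod "$pod" \
    -o jsonpath="{.status.containerStatuses[?(@.name==\"$container\")].lastState.terminated.reason}"
}
```

Call it as crash_report <pod-name> <container-name>. If the container has never restarted, there is no previous instance and the logs command reports an error instead.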
- Configuration File Errors: Your application relies on a config file that’s missing, malformed, or has incorrect values.
  - Diagnosis: If the pod is running but misbehaving, kubectl exec <pod-name> -c <container-name> -- cat /path/to/config/file reveals the file’s contents. Look for syntax errors, missing keys, or invalid values.
  - Fix: For a quick test, use kubectl cp <local-config-file> <pod-name>:/path/to/config/file to replace the incorrect file. If the file is managed by a ConfigMap, update the ConfigMap and then delete the pod to force a recreation with the new config.
  - Why it works: Ensures the application reads the correct instructions for operation.
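For the ConfigMap case, one common way to update the ConfigMap from a local file and then recreate the pods looks roughly like this. It is a sketch under assumptions: reload_config and its arguments are hypothetical, and it assumes the pods belong to a Deployment, so it uses a rollout restart in place of deleting pods by hand.

```shell
# Sketch: replace a ConfigMap from a local file, then restart the workload.
# reload_config and its argument names are hypothetical; a Deployment is assumed.
reload_config() {
  configmap="$1"
  file="$2"
  deployment="$3"
  # Render the ConfigMap locally and apply it, which updates it in place.
  kubectl create configmap "$configmap" --from-file="$file" \
    --dry-run=client -o yaml | kubectl apply -f -
  # Recreate the pods so they start with the new configuration.
  kubectl rollout restart deployment "$deployment"
}
```

Example: reload_config app-config ./app.yaml my-app, where all three names are placeholders.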
- Resource Limits Too Low (OOMKilled): A container used more memory than its resources.limits.memory allowed, so the kernel’s OOM killer terminated it (Out-Of-Memory kill).
  - Diagnosis: kubectl describe pod <pod-name> shows OOMKilled as the Reason in the Last State or State of the container.
  - Fix: Increase the memory limit in the pod’s spec. For example, change resources.limits.memory: "256Mi" to resources.limits.memory: "512Mi". Then delete and recreate the pod.
  - Why it works: Gives the application more headroom before the kernel’s OOM killer intervenes.
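If the pod is owned by a Deployment, you can raise the limit on the Deployment and let it recreate the pods for you. A minimal sketch, assuming a Deployment; raise_memory_limit is a hypothetical helper name:

```shell
# Sketch: raise a container's memory limit on a Deployment and wait for the rollout.
# raise_memory_limit is a hypothetical helper; the workload is assumed to be a Deployment.
raise_memory_limit() {
  deployment="$1"
  container="$2"
  limit="$3"
  # Changing the pod template triggers a rolling replacement of the pods.
  kubectl set resources deployment "$deployment" -c "$container" --limits="memory=$limit"
  # Block until the new pods are rolled out (or the rollout fails).
  kubectl rollout status deployment "$deployment"
}
```

Example: raise_memory_limit my-app my-app 512Mi. Doubling the limit only buys headroom; if memory keeps growing, the OOM kill just comes later.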
- Network Policy Blocking Traffic: A NetworkPolicy is preventing the pod from receiving or sending traffic it needs.
  - Diagnosis: If a pod can’t reach other services or isn’t reachable, check kubectl get networkpolicy -n <namespace>. Look for policies that might be too restrictive. You can test connectivity using kubectl exec <pod-name> -- curl <service-ip>:<port> or kubectl exec <pod-name> -- ping <other-pod-ip>, provided the image includes those tools.
  - Fix: Modify the NetworkPolicy to allow the necessary ingress or egress. For instance, if your app needs to talk to a database on port 5432, ensure your policy’s egress section includes a rule for ports: [{ protocol: TCP, port: 5432 }] and the correct to selectors.
  - Why it works: Explicitly permits the communication flow that was previously blocked.
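Here is what such an egress rule could look like as a full manifest, wrapped in a shell function so it can be piped straight to kubectl. Everything specific in it is an assumption for illustration: the policy name allow-db-egress and the labels app: my-app and app: postgres are invented, not taken from a real cluster.

```shell
# Sketch: print a NetworkPolicy that lets app pods reach a database on TCP 5432.
# The policy name and both label selectors are hypothetical placeholders.
egress_policy_manifest() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-db-egress
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
EOF
}
```

Apply it with egress_policy_manifest | kubectl apply -n <namespace> -f -. Once a pod is selected by an egress policy, anything not listed is blocked, DNS lookups included, so real policies usually need a rule for port 53 as well.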
- Container Image Issues (Corrupted, Wrong Tag): The container image itself might be faulty, or you might be pulling the wrong version.
  - Diagnosis: Check kubectl describe pod <pod-name> for image pull errors or image ID mismatches. kubectl describe node <node-name> might show disk pressure if images are filling up storage.
  - Fix: Ensure the image: field in your pod spec points to the correct, existing image tag. If the image is custom, rebuild it and push it to the registry. Delete the pod so its replacement starts from the corrected image: kubectl delete pod <pod-name>. If you reuse a tag, the node only re-pulls it when imagePullPolicy is Always or no cached copy exists.
  - Why it works: Guarantees that a known-good, correct version of the application code and dependencies is loaded.
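To spot an image mismatch without reading through the whole describe output, you can print just the image fields from the pod status. A small sketch; pod_images is a hypothetical helper name:

```shell
# Sketch: list each container's name, requested image, and resolved image ID.
# pod_images is a hypothetical helper name.
pod_images() {
  pod="$1"
  kubectl get pod "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.image}{"\t"}{.imageID}{"\n"}{end}'
}
```

The image ID column shows what the runtime actually resolved, which is how you catch a tag that points at a different build than you expected.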
- Readiness/Liveness Probe Failures: The application starts but never becomes ready (failing readiness probe), or it keeps getting restarted after starting (failing liveness probe).
  - Diagnosis: kubectl describe pod <pod-name> shows Readiness probe failed or Liveness probe failed in the Events section. With a failing liveness probe, the pod might be cycling through ContainerCreating -> Running -> CrashLoopBackOff.
  - Fix: Adjust the probe settings (e.g., initialDelaySeconds, periodSeconds, timeoutSeconds) in the pod spec to give your application more time to start or respond. If the probe is a command, kubectl exec <pod-name> -- <probe-command> can help diagnose why it’s failing.
  - Why it works: Allows more time for the application to initialize or ensures the probe accurately reflects the application’s health.
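Probe failures are recorded as events with the reason Unhealthy, so you can filter for them directly instead of scanning the describe output. A sketch; probe_events is a hypothetical helper name:

```shell
# Sketch: show only the probe-failure events recorded for one pod.
# probe_events is a hypothetical helper name.
probe_events() {
  pod="$1"
  namespace="$2"
  # The kubelet records failed probes as events with reason=Unhealthy.
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.name=$pod,reason=Unhealthy"
}
```

Example: probe_events <pod-name> <namespace>. Events expire after a while, so an empty result on an old pod does not prove the probes are healthy.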
- Volume Mounting Problems: A persistent volume claim (PVC) is not bound, the underlying storage is unavailable, or the mount path is incorrect.
  - Diagnosis: kubectl describe pod <pod-name> shows events related to volume mounting failures. Check kubectl get pvc -n <namespace> to ensure the PVC is Bound. kubectl get pv can show the status of the persistent volume.
  - Fix: Ensure the PVC name and namespace are correct. If the storage class is wrong, correct it in the PVC. If the underlying storage is down, that’s an infrastructure issue. Delete and recreate the pod to retry mounting.
  - Why it works: Ensures critical application data can be accessed and persisted.
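The PVC check is easy to script if you want a yes/no answer, for example before recreating a pod. A minimal sketch; pvc_is_bound is a hypothetical helper name:

```shell
# Sketch: succeed only if the named PVC is in the Bound phase.
# pvc_is_bound is a hypothetical helper name.
pvc_is_bound() {
  pvc="$1"
  namespace="$2"
  phase="$(kubectl get pvc "$pvc" -n "$namespace" -o jsonpath='{.status.phase}')"
  [ "$phase" = "Bound" ]
}
```

Use it in a condition, such as pvc_is_bound <pvc-name> <namespace> || echo "PVC not bound yet".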
The next error you’ll likely hit after fixing these is a CrashLoopBackOff on a different pod, indicating a systemic configuration issue or a dependency that’s now failing due to the first pod’s fix.
Kubernetes kubectl debug Explained
The most surprising true thing about kubectl debug is that it doesn’t fix your running pod. Used with --copy-to, it creates a new, modified copy of the pod to help you diagnose the original.
Let’s say you have a pod named my-app-7b9d4f8c9-xyz12 that’s exhibiting strange behavior – maybe it’s not processing messages, or its logs are empty when you expect them to be full. You suspect an issue with installed tools or library versions inside the container.
Here’s how kubectl debug can help. Instead of trying to kubectl exec into the pod and install strace (which might fail if the image is minimal or you lack permissions), you can create a debug version:
kubectl debug my-app-7b9d4f8c9-xyz12 -it --copy-to=my-app-debug --image=ubuntu:22.04 --share-processes
Let’s break this down:
- kubectl debug my-app-7b9d4f8c9-xyz12: This is the command, targeting your problematic pod.
- -it: This makes the session interactive and allocates a TTY, so you can type commands.
- --copy-to=my-app-debug: This tells kubectl debug to create a new pod named my-app-debug. Your original pod remains untouched.
- --image=ubuntu:22.04: This is the crucial part. It adds a new debug container running ubuntu:22.04 to the copied pod, next to a copy of your application container. A general-purpose image like this lets you install whatever debugging tools you need (strace, tcpdump, lsof, etc.) without touching the application image. You could also use an image like nicolaka/netshoot, which is purpose-built for network debugging and ships with many of these tools.
- --share-processes: This turns on process namespace sharing in the copied pod, so the debug container can see the processes of the copied my-app container. Containers in a pod already share a network namespace, so the debug container sees the same network interfaces too.
Two flags are deliberately left out. --container names the debug container; with --copy-to, passing the name of an existing container (such as --container=my-app) makes kubectl debug modify that container in the copy, here by swapping its image for ubuntu:22.04, instead of adding a separate debug container. --target belongs to ephemeral containers in a live pod and takes a container name, not a pod name; kubectl debug rejects it in combination with --copy-to.
Once you run this command, you’ll be dropped into a shell within the ubuntu:22.04 container. Because of the --share-processes flag, you’ll see the processes of the copied my-app container running.
# Inside the debug pod's shell
root@my-app-debug:/# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 12345 6789 pts/0 Ss+ 10:00 0:00 /pause
appuser 8 0.5 2.3 98765 45678 pts/0 Sl+ 10:01 0:15 /app/my-app-binary
root 15 0.0 0.0 11111 2222 pts/1 Ss 10:02 0:00 bash
root 22 0.0 0.0 10000 3000 pts/1 R+ 10:03 0:00 ps aux
You can now use the tools in the debug container to inspect your my-app process. The stock ubuntu:22.04 image is minimal, so expect to install strace, tcpdump, or lsof with apt-get first.
- Inspect processes: ps aux, top
- Trace system calls: strace -p 8 (where 8 is the PID of your my-app binary)
- Capture network traffic: tcpdump -i eth0
- Check file descriptors: lsof -p 8
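Rather than reading the PID off the ps output each time, you can look it up by name inside the debug container. A sketch, assuming pgrep and strace are installed there; trace_app is a hypothetical helper name. Attaching to another container’s process may also need ptrace permissions, which depends on your cluster’s security settings.

```shell
# Sketch for use inside the debug container: find a process by name and trace it.
# trace_app is a hypothetical helper; it assumes pgrep and strace are installed.
trace_app() {
  pattern="$1"
  # -o picks the oldest matching process, -f matches against the full command line.
  pid="$(pgrep -o -f "$pattern")" || {
    echo "no process matching $pattern" >&2
    return 1
  }
  # -f also follows threads and child processes of the traced process.
  strace -f -p "$pid"
}
```

Example: trace_app my-app-binary. Press Ctrl-C to detach; the traced process keeps running.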
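The key here is that the original pod is left untouched. It continues to run (or crash, as it was doing) while you investigate the copy, which runs the same application container spec and so should reproduce the behavior. Keep in mind that the copy is a separate pod with its own IP, and kubectl debug drops the original’s labels from it unless you pass --keep-labels, so Services do not route traffic to it. Once you’re done, you can delete the debug pod: kubectl delete pod my-app-debug.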
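The real power here is namespace sharing. Containers in the same pod always share a network namespace, but each container gets its own process namespace by default. Passing --share-processes together with --copy-to turns on process namespace sharing in the copied pod, so the debug container acts like a sidecar that can see into the application container’s world. The --target flag does the equivalent for an ephemeral container attached to the live pod: it takes a container name and cannot be combined with --copy-to. Either way, this is invaluable for debugging runtime issues that are hard to reproduce outside the live environment.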
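If the debug shell shows only its own processes, it is worth checking whether process sharing is actually enabled on the debug pod. Copies created with --copy-to and --share-processes have shareProcessNamespace set in their spec. A sketch; shares_processes is a hypothetical helper name:

```shell
# Sketch: succeed only if the pod has process namespace sharing enabled.
# shares_processes is a hypothetical helper name.
shares_processes() {
  pod="$1"
  value="$(kubectl get pod "$pod" -o jsonpath='{.spec.shareProcessNamespace}')"
  [ "$value" = "true" ]
}
```

Example: shares_processes my-app-debug && echo "process namespace is shared".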
A common pattern is to use a debug image that contains a comprehensive set of tools, like nicolaka/netshoot or a custom-built image with your specific debugging needs. The --copy-to flag ensures you don’t accidentally delete or modify the original pod, providing a safe sandbox for exploration.
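That pattern fits in a few lines of shell: create the copy with a tool-rich image, drop into it, and clean up when the session ends. This is a sketch under assumptions: debug_copy is a hypothetical helper name, the -debug suffix is just a naming choice, and it relies on the --share-processes flag so the debug container can see the application’s processes.

```shell
# Sketch: debug a copy of a pod with a tool-rich image, then remove the copy.
# debug_copy is a hypothetical helper; the "-debug" suffix is an arbitrary convention.
debug_copy() {
  pod="$1"
  image="${2:-nicolaka/netshoot}"
  copy="${pod}-debug"
  # Interactive session in a new debug container inside a copy of the pod.
  kubectl debug "$pod" -it --copy-to="$copy" --image="$image" --share-processes
  # Runs after the shell exits; --wait=false returns without waiting for termination.
  kubectl delete pod "$copy" --wait=false
}
```

Example: debug_copy my-app-7b9d4f8c9-xyz12, or pass a second argument to use your own debug image.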
The next concept you’ll want to explore is how to use kubectl debug to create ephemeral debug containers within an existing running pod, rather than creating an entirely new, copied pod.
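As a preview, the ephemeral-container form drops --copy-to and uses --target with the name of a container in the running pod. A sketch; debug_in_place and list_debug_containers are hypothetical helper names, busybox:1.36 is just an example image, and whether you actually see the target’s processes depends on container runtime support.

```shell
# Sketch: attach an ephemeral debug container to a running pod.
# Both helper names are hypothetical; busybox:1.36 is only an example image.
debug_in_place() {
  pod="$1"
  container="$2"
  image="${3:-busybox:1.36}"
  # --target names the container whose process namespace the debug container joins.
  kubectl debug "$pod" -it --image="$image" --target="$container"
}

# List the ephemeral containers that have been added to a pod so far.
list_debug_containers() {
  kubectl get pod "$1" -o jsonpath='{.spec.ephemeralContainers[*].name}'
}
```

Example: debug_in_place my-app-7b9d4f8c9-xyz12 my-app. Ephemeral containers cannot be removed from a pod once added; they go away when the pod is deleted.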