kubectl debug is your panic button for when a running pod has decided to go rogue.

The core issue kubectl debug tackles is that traditional debugging of a running application often means modifying the running container (installing tools, changing configuration), which is a big no-no in Kubernetes. There is no SSH into a pod’s container, and even with kubectl exec you can’t just start installing strace or tcpdump if the image is minimal (no package manager, sometimes not even a shell) or its filesystem is read-only. kubectl debug sidesteps this by running a separate debug container with its own image and tools, either as an ephemeral container inside the running pod or inside a copy of the pod, leaving your original, potentially fragile, application container untouched.

Common Causes and Fixes for Pod Issues

Let’s assume you’re seeing a pod stuck in CrashLoopBackOff or not responding to network requests.
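To find the pods in that state in the first place, a small filter over kubectl get pods output is enough. This is a sketch: the function name crashlooping_pods is hypothetical, and it assumes the default kubectl get pods columns (NAME, READY, STATUS, RESTARTS, AGE), so it won’t work unchanged with -A, which adds a NAMESPACE column in front.

```shell
#!/bin/sh
# Hypothetical helper: print the names of pods whose STATUS column reads
# CrashLoopBackOff. Relies on the default column order of
# `kubectl get pods` (NAME READY STATUS RESTARTS AGE).
crashlooping_pods() {
  # Optional namespace; without it kubectl uses the current context's namespace.
  if [ -n "${1:-}" ]; then
    kubectl get pods --no-headers -n "$1"
  else
    kubectl get pods --no-headers
  fi | awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Usage (the namespace name is only an example):
#   crashlooping_pods
#   crashlooping_pods payments
```

Pods whose init containers are crashing show a status such as Init:CrashLoopBackOff instead, which this exact-match filter skips.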

  1. Application Crashed on Startup/Runtime: The most frequent culprit. Your application process exited because of an unhandled exception, configuration error, or resource exhaustion.

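    • Diagnosis: kubectl logs <pod-name> -c <container-name> shows the application’s stdout/stderr. In CrashLoopBackOff the container may have just restarted with little output, so add --previous (-p) to read the logs of the last terminated instance. If those are gone too, kubectl describe pod <pod-name> is your next stop: check the Last State, Reason and Exit Code fields for the container.
    • Fix (if logs are available): Analyze the logs for stack traces or error messages. Common fixes involve correcting configuration files, environment variables (kubectl exec <pod-name> -- env shows what the running container actually received), or application code. kubectl cp <local-config> <pod-name>:/path/to/config is fine for a quick experiment, but it needs tar in the image and the copied file is lost as soon as the container restarts, so make lasting fixes in the image, ConfigMap or pod template. To pick up such a fix, delete the pod with kubectl delete pod <pod-name>; its controller (e.g. a Deployment) creates a replacement, whereas a bare pod is simply gone.
    • Why it works: The application’s own diagnostic output usually names the failure directly, so you fix the cause instead of guessing.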
  2. Configuration File Errors: Your application relies on a config file that’s missing, malformed, or has incorrect values.

    • Diagnosis: If the pod is running but misbehaving, kubectl exec <pod-name> -c <container-name> -- cat /path/to/config/file can reveal its contents. Look for syntax errors, missing keys, or invalid values.
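    • Fix: If the file comes from a ConfigMap, update the ConfigMap and restart the workload, e.g. kubectl rollout restart deployment/<deployment-name>. ConfigMap volumes are mounted read-only, so kubectl cp into them fails, and environment variables sourced from a ConfigMap are only read when the container starts. If the file is part of the image, kubectl cp <local-config-file> <pod-name>:/path/to/config/file works for a quick test, but the change is lost when the container restarts.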
    • Why it works: Ensures the application reads the correct instructions for operation.
  3. Resource Limits Too Low (OOMKilled): A container used more memory than its resources.limits.memory allowed, so the kernel killed the process (Out-Of-Memory Kill) and Kubernetes reports the container as OOMKilled.

    • Diagnosis: kubectl describe pod <pod-name> will show OOMKilled in the Last State or State of the container.
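    • Fix: Increase the memory limit in the workload’s pod template, e.g. change resources.limits.memory: "256Mi" to resources.limits.memory: "512Mi". The resources of an existing pod generally can’t be edited in place, so change the Deployment (or other controller) and let it roll out new pods, for example with kubectl set resources deployment/<deployment-name> -c <container-name> --limits=memory=512Mi. If memory use keeps climbing regardless of the limit, suspect a leak rather than an undersized limit.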
    • Why it works: Gives the application more headroom before the kernel’s OOM killer intervenes.
  4. Network Policy Blocking Traffic: A NetworkPolicy is preventing the pod from receiving or sending traffic it needs.

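    • Diagnosis: If a pod can’t reach other services or isn’t reachable, list the policies with kubectl get networkpolicy -n <namespace> and look for ones that select your pod and are too restrictive. (NetworkPolicies only take effect if your network plugin enforces them.) Test connectivity over TCP, e.g. kubectl exec <pod-name> -- curl <service-ip>:<port>, rather than relying on kubectl exec <pod-name> -- ping <other-pod-ip>: NetworkPolicy rules cover TCP, UDP and SCTP, and whether ICMP gets through depends on the plugin. If the image has no curl, this is exactly where kubectl debug helps.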
    • Fix: Modify the NetworkPolicy to allow the necessary ingress or egress. For instance, if your app needs to talk to a database on port 5432, ensure your policy’s egress section includes a rule for ports: [{ protocol: TCP, port: 5432 }] and the correct to selectors.
    • Why it works: Explicitly permits the communication flow that was previously blocked.
  5. Container Image Issues (Corrupted, Wrong Tag): The container image itself might be faulty, or you might be pulling the wrong version.

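    • Diagnosis: Check kubectl describe pod <pod-name> for ErrImagePull or ImagePullBackOff and the registry’s error message in the Events section, and compare the Image and Image ID fields with what you meant to deploy. kubectl describe node <node-name> shows a DiskPressure condition if images are filling up the node’s storage.
    • Fix: Ensure the image: field in your pod spec points to an existing image and tag. If the image is custom, rebuild it and push it to the registry, ideally under a new tag. Deleting the pod (kubectl delete pod <pod-name>) only re-pulls a rebuilt image that reuses the same tag if imagePullPolicy is Always; with IfNotPresent, the node reuses its cached copy.
    • Why it works: Ensures the node runs the exact version of the application code and dependencies you intended.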
  6. Readiness/Liveness Probe Failures: The application starts but never becomes ready (failing readiness probe), or keeps getting restarted (failing liveness probe).

    • Diagnosis: kubectl describe pod <pod-name> will show Readiness probe failed or Liveness probe failed in the Events section. A failing readiness probe only takes the pod out of its Services’ endpoints (READY shows 0/1); a failing liveness probe makes the kubelet restart the container, and repeated restarts end in CrashLoopBackOff.
    • Fix: Adjust the probe settings (e.g., initialDelaySeconds, periodSeconds, timeoutSeconds, failureThreshold) in the pod spec, or add a startupProbe, to give a slow-starting application more time. If the probe is a command, kubectl exec <pod-name> -- <probe-command> can help diagnose why it’s failing; for an HTTP probe, request the same path and port from inside the pod.
    • Why it works: Allows more time for the application to initialize or ensures the probe accurately reflects the application’s health.
  7. Volume Mounting Problems: A persistent volume claim (PVC) is not bound, the underlying storage is unavailable, or the mount path is incorrect.

    • Diagnosis: kubectl describe pod <pod-name> will show events related to volume mounting failures. Check kubectl get pvc -n <namespace> to ensure the PVC is Bound. kubectl get pv can show the status of the persistent volume.
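    • Fix: Make sure the claimName in the pod spec matches an existing PVC in the pod’s own namespace; a pod can only use PVCs from its namespace. A PVC’s storageClassName can’t be changed after creation, so a wrong storage class means creating a new PVC. If the underlying storage is down, that’s an infrastructure issue. The kubelet retries the mount on its own; deleting the pod (so its controller recreates it) forces a fresh attempt.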
    • Why it works: Ensures critical application data can be accessed and persisted.
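The first checks from causes 1, 3 and 6 can be bundled into one helper. This is a minimal sketch assuming kubectl already points at the right cluster and namespace; triage_container is a hypothetical name, while the kubectl subcommands and flags it calls are standard.

```shell
#!/bin/sh
# Hypothetical triage helper for a crashing container (causes 1, 3 and 6).
# Usage: triage_container <pod-name> <container-name>
triage_container() {
  pod="$1"
  container="$2"

  # Why did the previous instance stop? Prints e.g. OOMKilled or Error.
  echo "== last termination reason =="
  kubectl get pod "$pod" \
    -o "jsonpath={.status.containerStatuses[?(@.name==\"$container\")].lastState.terminated.reason}"
  echo

  # Logs of the previous (crashed) instance rather than the current restart.
  echo "== previous logs (last 50 lines) =="
  kubectl logs "$pod" -c "$container" --previous --tail=50 \
    || echo "(no previous instance to read logs from)"

  # Probe failures, image pull errors and mount errors appear as events.
  echo "== events =="
  kubectl get events --field-selector "involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}
```

Events expire after a while (typically an hour), so run this soon after the failure.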

Fixing one pod often surfaces the next problem. A CrashLoopBackOff in a different pod right afterwards can point to a systemic configuration issue, or to a dependency that starts failing now that the first pod is healthy again and sending it traffic.
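For cause 4, here is a sketch of an egress policy for the database example. It assumes the application pods carry the label app=my-app and the database pods app=postgres, both in the same namespace; the labels, the policy name and the helper db_egress_policy are illustrative. It prints the manifest so you can review it before applying it.

```shell
#!/bin/sh
# Hypothetical helper: print a NetworkPolicy that lets pods labelled
# app=<app> open TCP connections to pods labelled app=<db> on <port>.
# Labels and names are assumptions; adapt them to your cluster.
db_egress_policy() {
  app="$1"; db="$2"; port="$3"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ${app}-to-${db}
spec:
  podSelector:
    matchLabels:
      app: ${app}
  policyTypes:
    - Egress
  egress:
    # Database pods in the same namespace, on the database port only.
    - to:
        - podSelector:
            matchLabels:
              app: ${db}
      ports:
        - protocol: TCP
          port: ${port}
    # Once egress is restricted, DNS must be allowed explicitly or
    # service names stop resolving.
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}

# Usage:
#   db_egress_policy my-app postgres 5432 | kubectl apply -n <namespace> -f -
```

Policies are additive, but if no egress policy selected these pods before, applying this one restricts their outbound traffic to just the database and DNS.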


Kubernetes kubectl debug Explained

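The most surprising thing about kubectl debug is that it doesn’t fix or modify your application container at all. With --copy-to, it creates a new, modified copy of the pod that you can take apart while the original keeps running; without --copy-to, it attaches a temporary (ephemeral) debug container to the running pod itself.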

Let’s say you have a pod named my-app-7b9d4f8c9-xyz12 that’s exhibiting strange behavior – maybe it’s not processing messages, or its logs are empty when you expect them to be full. You suspect an issue with installed tools or library versions inside the container.

Here’s how kubectl debug can help. Instead of trying to kubectl exec into the pod and install strace (which might fail if the image is minimal or you lack permissions), you can create a debug version:

kubectl debug my-app-7b9d4f8c9-xyz12 -it --copy-to=my-app-debug --container=debugger --image=ubuntu:22.04 --share-processes

Let’s break this down:

  • kubectl debug my-app-7b9d4f8c9-xyz12: This is the command, targeting your problematic pod.
  • -it: This makes the session interactive and allocates a TTY, so you can type commands.
  • --copy-to=my-app-debug: This tells kubectl debug to create a new pod named my-app-debug from the original pod’s spec, with a debug container added. Your original pod remains untouched.
  • --container=debugger: This names the new debug container. It does not select a container in the original pod; if you omit it, kubectl generates a name for you.
  • --image=ubuntu:22.04: This is the image for the debug container. The copy still runs your original containers; the debug container is added next to them. Note that the stock ubuntu:22.04 image ships very few diagnostic tools: strace, tcpdump and lsof (and possibly ps) have to be installed with apt-get inside the debug container, which needs outbound network access. You could also use an image like nicolaka/netshoot, which is purpose-built for network debugging and ships tools such as tcpdump and curl.
  • --share-processes: This puts all containers in the copy into a single process namespace, so the debug container can "see" the copy’s my-app process. It is already the default with --copy-to and is spelled out here for clarity. Containers in a pod always share the network namespace, so tcpdump in the debug container sees the copy’s network traffic. (--target is a different flag: it takes a container name, not a pod name, and is used with ephemeral containers rather than with --copy-to.)

Once you run this command, you’ll be dropped into a shell within the ubuntu:22.04 container. Because the containers in the copy share a process namespace, you’ll see the copy’s my-app process running alongside your shell.

# Inside the debug pod's shell
root@my-app-debug:/# apt-get update -qq && apt-get install -y -qq procps strace tcpdump lsof
root@my-app-debug:/# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  12345  6789 ?        Ss   10:00   0:00 /pause
appuser      8  0.5  2.3  98765 45678 ?        Ssl  10:01   0:15 /app/my-app-binary
root        15  0.0  0.0  11111  2222 pts/0    Ss   10:02   0:00 bash
root        22  0.0  0.0  10000  3000 pts/0    R+   10:03   0:00 ps aux

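Once the tools are installed, you can use them to inspect the copy’s my-app process:

  • Inspect processes: ps aux, top
  • Trace system calls: strace -p 8 (where 8 is the PID of your my-app binary). Attaching to a process that runs as another user requires the SYS_PTRACE capability, which containers usually lack by default; if strace reports "Operation not permitted", recreate the copy with a debugging profile such as --profile=general, if your kubectl version supports it.
  • Capture network traffic: tcpdump -i eth0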
  • Check file descriptors: lsof -p 8
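Both strace -p and lsof -p need the PID of the app process. If the debug image has no ps or pgrep, which is common in minimal images, you can read /proc directly; with a shared process namespace it lists every process in the pod. find_pids below is a hypothetical helper written for this sketch, not a standard tool.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of processes whose command line
# contains the given string, reading /proc directly instead of using ps.
# In a container with a shared process namespace, /proc lists every
# process in the pod. The match is a plain substring match.
find_pids() {
  pattern="$1"
  for dir in /proc/[0-9]*; do
    # cmdline holds the arguments separated by NUL bytes; turn them into
    # spaces. A process can exit mid-scan, so skip unreadable entries.
    cmd="$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline")" || continue
    case "$cmd" in
      *"$pattern"*) echo "${dir#/proc/}" ;;
    esac
  done
}

# Usage inside the debug container:
#   strace -p "$(find_pids /app/my-app-binary | head -n 1)"
```

Because it is a substring match, check the result when several processes have similar arguments.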

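The key here is that the original pod is left untouched: it continues to run (or crash, as it was doing) while you investigate the copy. Keep in mind that you are looking at a fresh instance of your app in a new pod with its own IP address, so problems that depend on the original’s accumulated state or incoming traffic may not reproduce. Once you’re done, delete the copy: kubectl delete pod my-app-debug.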

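Without --copy-to, kubectl debug works on the running pod itself: it adds an ephemeral container to the existing pod. An ephemeral container always shares the pod’s network namespace, and --target=<container-name> additionally places it in the process namespace of that specific container (if the container runtime supports it). In effect you get a temporary "sidecar" that can see into the live pod’s world, which is invaluable for runtime issues that are hard to reproduce outside the live environment, because you are looking at the actual failing process rather than a fresh copy.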
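A minimal sketch of the in-place variant: debug_in_place is a hypothetical wrapper, while kubectl debug, --image and --target are real. It assumes the cluster supports ephemeral containers (stable since Kubernetes 1.25) and defaults to the nicolaka/netshoot image.

```shell
#!/bin/sh
# Hypothetical wrapper around kubectl debug's ephemeral-container mode.
# Usage: debug_in_place <pod-name> <target-container> [image]
debug_in_place() {
  pod="$1"
  target="$2"                        # a container name in the pod, not a pod name
  image="${3:-nicolaka/netshoot}"    # debug image; the default is an assumption
  # Without --copy-to, kubectl adds an ephemeral container to the running pod.
  # --target joins the process namespace of the named container.
  kubectl debug -it "$pod" --image="$image" --target="$target"
}

# Usage:
#   debug_in_place my-app-7b9d4f8c9-xyz12 my-app
```

Unlike a copy, an ephemeral container can’t be removed from the pod once added; it stops when you exit the session and disappears only when the pod itself is replaced.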

A common pattern is to use a debug image that contains a comprehensive set of tools, like nicolaka/netshoot or a custom-built image with your specific debugging needs. The --copy-to flag ensures you don’t accidentally delete or modify the original pod, providing a safe sandbox for exploration.
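The copy-based workflow can be wrapped so the copy is always cleaned up afterwards. debug_copy is a hypothetical helper; it assumes the name <pod>-debug is free and defaults to nicolaka/netshoot. --same-node, a real kubectl debug flag, schedules the copy on the original’s node, which helps when a problem is node-specific.

```shell
#!/bin/sh
# Hypothetical wrapper: debug a copy of a pod, then delete the copy.
# Usage: debug_copy <pod-name> [image]
debug_copy() {
  pod="$1"
  image="${2:-nicolaka/netshoot}"
  copy="${pod}-debug"
  # --share-processes is the default with --copy-to; spelled out for clarity.
  # --same-node schedules the copy on the same node as the original.
  kubectl debug "$pod" -it --copy-to="$copy" --container=debugger \
    --image="$image" --share-processes --same-node
  # Runs once the interactive session ends, whatever its exit status.
  kubectl delete pod "$copy" --wait=false
}
```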

The next concept you’ll want to explore is how to use kubectl debug to create ephemeral debug containers within an existing running pod, rather than creating an entirely new, copied pod.
