Kubernetes is not scheduling new pods onto node worker-01 because the node’s container runtime has stopped responding to the kubelet’s health checks, which leaves the node marked NotReady.

The most common culprit is the container runtime itself crashing or becoming unresponsive. This can happen due to resource exhaustion on the node, configuration issues, or bugs within the runtime.

Cause 1: Container Runtime Service Not Running

  • Diagnosis: SSH into the affected node (worker-01) and check the status of the container runtime service. For containerd, this is sudo systemctl status containerd. For Docker, it’s sudo systemctl status docker.
  • Fix: If the service is not active, start it: sudo systemctl start containerd or sudo systemctl start docker. Then enable it to start on boot: sudo systemctl enable containerd or sudo systemctl enable docker.
  • Why it works: The container runtime is the core component responsible for pulling container images and running containers. If its service isn’t running, Kubernetes cannot launch any workloads.
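
The check and the fix above can be combined into one small helper. This is only a sketch: ensure_runtime is a hypothetical function name, and it assumes a systemd-based node where the service unit is called containerd or docker.

```shell
# ensure_runtime SERVICE  (hypothetical helper; SERVICE is "containerd" or "docker")
# Starts and enables the runtime service only if it is not already active.
ensure_runtime() {
    svc="$1"
    if systemctl is-active --quiet "$svc"; then
        echo "$svc is already active"
    else
        echo "$svc is not active; starting it"
        # enable --now starts the unit and enables it at boot in one step
        sudo systemctl enable --now "$svc"
    fi
}
```

On the node, run ensure_runtime containerd (or ensure_runtime docker). If the service keeps stopping after this, the causes below are more likely than a simple missed start.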

Cause 2: Resource Exhaustion on the Node

  • Diagnosis: On worker-01, check memory and CPU usage. Use free -h for memory and top or htop for CPU. Look for processes consuming excessive resources, especially the container runtime itself or applications running within containers.
  • Fix: If resource exhaustion is identified, either reduce the resource consumption of existing pods (by setting appropriate resource requests and limits) or scale up the node’s resources (add more RAM or CPU). If a specific pod is the culprit, you can remove it: kubectl delete pod <pod-name> -n <namespace>. A pod managed by a Deployment or another controller will be recreated, so scale the owning workload down if the problem returns.
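  • Why it works: When a node runs out of memory, the Linux Out-Of-Memory (OOM) killer might terminate critical processes, including the container runtime, to free memory. CPU starvation does not trigger the OOM killer, but it can slow the runtime enough that it misses health checks.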
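
For a quick number to go with free -h, the sketch below computes how much memory is still available as a percentage. mem_available_pct is a hypothetical helper name; it assumes the standard MemTotal and MemAvailable fields of /proc/meminfo.

```shell
# mem_available_pct  (hypothetical helper)
# Reads /proc/meminfo-style text on stdin and prints MemAvailable
# as a whole-number percentage of MemTotal.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Usage on the node:
#   mem_available_pct < /proc/meminfo
# To look for past OOM kills in the kernel log:
#   sudo dmesg -T | grep -i -E 'out of memory|oom-kill'
```

A low percentage together with OOM messages that name the runtime process is a strong hint that memory pressure, rather than a runtime bug, caused the outage.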

Cause 3: Corrupted Container Runtime State

  • Diagnosis: Check the container runtime’s logs for errors related to corrupted images, volumes, or network configurations. On systemd-based nodes both runtimes usually log to the journal: sudo journalctl -u containerd or sudo journalctl -u docker. Some setups write to files under /var/log instead. Look for messages like "failed to find image" or "container with id … not found."
  • Fix: A common fix is to prune unused data. For containerd: sudo crictl rmi --prune, which removes images that no container is using. For Docker: sudo docker system prune -a. Both delete every unused image, so the node will have to pull those images again when they are next needed. After pruning, restart the runtime service.
  • Why it works: Corrupted or orphaned data can prevent the runtime from initializing correctly or accessing necessary components, leading to failures. Pruning removes this problematic data.
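
When the log is long, it helps to filter it down to errors first. The sketch below assumes the runtime logs in the usual key=value style with a level field (for example level=error), and that the logs are in the systemd journal; runtime_errors is a hypothetical helper name.

```shell
# runtime_errors  (hypothetical helper)
# Keeps only log lines on stdin whose level field is error or fatal.
runtime_errors() {
    grep -E 'level=(error|fatal)'
}

# Usage on the node (assumes the runtime logs to the journal):
#   sudo journalctl -u containerd --since "1 hour ago" --no-pager | runtime_errors
#   sudo journalctl -u docker     --since "1 hour ago" --no-pager | runtime_errors
```

If the filtered output is empty, widen the time window before concluding that the runtime state is healthy.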

Cause 4: Network Issues Preventing Image Pulls

  • Diagnosis: From worker-01, try to manually pull a container image from the registry Kubernetes uses. For example, sudo crictl pull docker.io/library/nginx:latest (if using containerd) or sudo docker pull nginx:latest (if using Docker). Check DNS resolution and firewall rules.
  • Fix: Ensure the node has proper network connectivity to the container registry. Verify that DNS resolves correctly (nslookup registry.example.com) and that no firewall blocks outbound traffic on port 443 (HTTPS) or on any other port the registry requires.
  • Why it works: The container runtime needs to pull images from a registry. If it can’t reach the registry due to network misconfiguration or firewall rules, it can lead to startup failures for new pods.
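
The DNS and HTTPS checks can be scripted roughly as follows. This is a sketch under assumptions: check_registry is a hypothetical helper, registry.example.com is the same placeholder host used above, and the registry is assumed to expose the standard /v2/ endpoint, which normally answers 200 or 401 when it is reachable.

```shell
# check_registry HOST  (hypothetical helper)
# Step 1: resolve HOST. Step 2: request https://HOST/v2/ and inspect the status.
check_registry() {
    host="$1"
    if ! getent hosts "$host" >/dev/null; then
        echo "DNS: cannot resolve $host"
        return 1
    fi
    # -s silent, -o discard the body, -m 10 second timeout, -w print the HTTP status
    code="$(curl -s -o /dev/null -m 10 -w '%{http_code}' "https://$host/v2/")"
    case "$code" in
        200|401) echo "HTTPS: $host reachable (HTTP $code)" ;;
        *)       echo "HTTPS: $host not reachable as a registry (HTTP $code)"
                 return 1 ;;
    esac
}
```

Run check_registry registry.example.com from worker-01. A status of 000 means curl never got an HTTP response, which usually points at a firewall, proxy, or routing problem rather than at the registry itself.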

Cause 5: Outdated or Incompatible Container Runtime Version

  • Diagnosis: Check the installed version of containerd (sudo containerd --version) or Docker (sudo docker version) on worker-01. Compare this to the Kubernetes version you are running and the documented compatibility matrix for your Kubernetes distribution.
  • Fix: Upgrade or downgrade the container runtime to a version known to be compatible with your Kubernetes version. Follow the specific upgrade/downgrade procedures for your OS and runtime. For example, on Ubuntu/Debian with containerd: sudo apt update && sudo apt install containerd.io=<specific_version>.
  • Why it works: The kubelet talks to the runtime through the Container Runtime Interface (CRI). A runtime version that is too new or too old may not implement the CRI version the kubelet expects, which leads to API incompatibilities and failures.
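
Comparing the installed version against a required minimum is easy to get wrong by eye (1.10 is newer than 1.9). The sketch below does it with GNU sort -V. Both function names are hypothetical, the parser assumes the version is the third field of containerd --version output, and the 1.6.0 in the usage lines is only a placeholder: take the real minimum from your distribution’s compatibility matrix.

```shell
# parse_containerd_version  (hypothetical helper)
# Reads `containerd --version` output on stdin and prints the bare version.
# Assumes the version is the third field, with an optional leading "v".
parse_containerd_version() {
    awk '{ print $3 }' | sed 's/^v//'
}

# version_ge A B  (hypothetical helper)
# Succeeds when version A >= version B. Relies on GNU sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on the node (1.6.0 is a placeholder minimum):
#   installed="$(containerd --version | parse_containerd_version)"
#   version_ge "$installed" 1.6.0 && echo "ok" || echo "too old"
```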

Cause 6: Runtime Configuration Errors

  • Diagnosis: Inspect the runtime’s configuration file. For containerd, this is typically /etc/containerd/config.toml. For Docker, /etc/docker/daemon.json. Look for incorrect or malformed settings, especially around storage drivers, network configurations, or plugin settings.
  • Fix: Correct any invalid configuration parameters. For example, if the kubelet uses the systemd cgroup driver, containerd’s runc options should set SystemdCgroup = true so that both sides agree. After making changes, restart the runtime service.
  • Why it works: A misconfigured runtime cannot start or operate correctly, leading to its unresponsiveness and inability to manage containers.
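
One specific misconfiguration is worth ruling out early. It is an assumption about your setup, not something every node has: some containerd packages ship a config.toml whose disabled_plugins list includes the CRI plugin, and with CRI disabled the kubelet cannot talk to containerd at all. The sketch below checks for that; cri_disabled is a hypothetical helper name.

```shell
# cri_disabled FILE  (hypothetical helper)
# Succeeds if the disabled_plugins line in FILE lists the CRI plugin.
cri_disabled() {
    grep -E '^[[:space:]]*disabled_plugins[[:space:]]*=.*cri"' "$1" >/dev/null
}

# Usage on the node:
#   cri_disabled /etc/containerd/config.toml && echo "CRI plugin is disabled"
# To confirm that the file parses at all, ask containerd to print its
# effective configuration; a syntax error makes this command fail:
#   sudo containerd config dump >/dev/null && echo "config parses"
```

If the check succeeds, remove the CRI entry from disabled_plugins and restart containerd.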

After resolving these, the next error you’ll likely encounter is an ImagePullBackOff or ErrImagePull on pods that were scheduled earlier but couldn’t start because of the runtime issue, since they may still be waiting to retry.
