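Your pods are being evicted or killed because their nodes are running out of memory. Two mechanisms are involved: the kubelet evicts pods when available node memory falls below its eviction thresholds, and if memory runs out faster than the kubelet can react, the kernel's Out-Of-Memory (OOM) killer terminates processes to free memory. On a Kubernetes node, those processes are usually the containers in your pods.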

Common Causes and Fixes

  1. Under-provisioned Nodes: The most frequent culprit is simply not having enough RAM on your worker nodes to handle the workload.

    • Diagnosis: Check your node utilization.
      kubectl top nodes
      
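      Look for nodes consistently showing high memory usage (e.g., 90%+). Then list the pods on such a node with kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name> and compare them against kubectl top pods --all-namespaces to see which pods account for the usage.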
    • Fix: Increase the memory available to your nodes. This can mean:
      • Adding more RAM to existing nodes (if hardware allows).
      • Replacing nodes with larger instances.
      • Adding more nodes to the cluster to distribute the load.
    • Why it works: More memory on the node (or the same load spread over more nodes) keeps usage further from the point where the kubelet starts evicting pods or the OOM killer has to act.
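
The node check above can be scripted. The following is a minimal sketch, assuming the usual five-column layout of kubectl top nodes; the sample output and the function name nodes_over_threshold are made up for illustration, and in practice you would feed in the real command output instead.

```python
# Hypothetical sample of `kubectl top nodes` output (names and values invented).
SAMPLE_OUTPUT = """\
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-a   250m         12%    7300Mi          93%
node-b   180m         9%     3100Mi          41%
node-c   900m         45%    14800Mi         90%
"""


def nodes_over_threshold(top_output, threshold_pct=90):
    """Return (node, memory_pct) pairs at or above threshold_pct, highest first."""
    flagged = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Rows without metrics do not carry a percentage; skip them.
        if len(fields) < 5 or not fields[4].endswith("%"):
            continue
        name, mem_pct = fields[0], int(fields[4].rstrip("%"))
        if mem_pct >= threshold_pct:
            flagged.append((name, mem_pct))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, pct in nodes_over_threshold(SAMPLE_OUTPUT):
        print(f"{name}: {pct}% memory used")
```

A single reading only shows the current state, so it is worth running a check like this repeatedly before concluding that a node is under-provisioned.
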
  2. Memory Leaks in Pods: A single pod might be consuming more and more memory over time without releasing it, eventually exhausting node resources.

    • Diagnosis:
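      • Identify pods with consistently increasing memory usage over extended periods. kubectl top only shows a point-in-time snapshot, so run it repeatedly or use your monitoring system's history.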
      kubectl top pods --all-namespaces --sort-by=memory
      
      • Use tools like kubectl exec <pod-name> -- pmap -x <pid> (where <pid> is the process ID inside the pod) or application-specific profiling tools to find the leak.
    • Fix: Address the memory leak in the application code. This is highly application-specific but generally involves fixing incorrect memory allocation/deallocation patterns or resource handling.
    • Why it works: Fixing the leak prevents the pod from continuously gobbling up memory, thus reducing the overall memory pressure on the node.
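
To tell steady growth apart from normal fluctuation, you can apply a simple trend test to memory readings collected over time. This is a rough heuristic sketch, not a leak detector: the function names, the thresholds (1 MiB per sample, 80% rising steps), and the sample readings are all assumptions chosen for illustration.

```python
def growth_per_sample(samples):
    """Least-squares slope of the readings, in the readings' unit per interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def looks_like_leak(samples, min_slope=1.0, min_rising_fraction=0.8):
    """True if usage trends upward and most consecutive readings increase."""
    rising = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    rising_fraction = rising / (len(samples) - 1)
    return (growth_per_sample(samples) >= min_slope
            and rising_fraction >= min_rising_fraction)


if __name__ == "__main__":
    # Invented readings in MiB, e.g. one per hour from `kubectl top pods`.
    leaking = [200, 212, 225, 240, 251, 266, 280]
    steady = [200, 230, 205, 228, 210, 226, 207]
    print("leaking pod flagged:", looks_like_leak(leaking))
    print("steady pod flagged:", looks_like_leak(steady))
```

A flagged pod is only a candidate: caches that warm up slowly produce the same pattern, so confirm with an application-level profiler.
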
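  3. Missing or Misconfigured Pod Memory Requests/Limits: A container can never use more than its limits.memory, but it can use far more than its requests.memory. Pods with no limit, or with a limit far above the request, can therefore consume much more memory than the scheduler accounted for when placing them, leading to unexpected exhaustion.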

    • Diagnosis: Compare pod requests.memory and limits.memory against actual usage from kubectl top pods. If limits.memory is set too high or not at all, a pod can consume all available node memory.
      kubectl get pod <pod-name> -o yaml
      
      Look at spec.containers[].resources.requests.memory and spec.containers[].resources.limits.memory.
    • Fix: Set appropriate limits.memory for your pods. A good starting point is to set limits.memory equal to requests.memory if your application is stable, or slightly higher (e.g., 1.5x-2x) if you know it has spiky usage.
      resources:
        requests:
          memory: "256Mi"
        limits:
          memory: "512Mi"
      
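    • Why it works: The scheduler uses requests.memory to decide which node has room for a pod, and limits.memory is enforced as a hard cap on each container. A container that exceeds its limit is OOM-killed (status OOMKilled) and restarted instead of starving other pods or the node itself. Under node memory pressure, the kubelet evicts pods that are using more than their requests first.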
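
The comparison of actual usage against requests and limits can be sketched as follows. This is a simplified illustration: parse_memory handles only the common suffixes, and the classify function, its labels, and the 90% "near limit" cutoff are assumptions rather than anything Kubernetes defines.

```python
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity):
    """Convert a quantity such as '256Mi' or '1G' to bytes (common suffixes only)."""
    # Check two-letter suffixes first so 'Mi' is not mistaken for a bare number.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(quantity)  # no suffix: plain byte count


def classify(usage, request, limit=None):
    """Label a container's memory usage relative to its request and limit."""
    if limit is None:
        return "no-limit"  # usage is bounded only by what the node has
    used = parse_memory(usage)
    if used >= 0.9 * parse_memory(limit):
        return "near-limit"  # likely to be OOM-killed on the next spike
    if used > parse_memory(request):
        return "above-request"  # using more than the scheduler accounted for
    return "within-request"


if __name__ == "__main__":
    # Uses the 256Mi request / 512Mi limit from the example above.
    for usage in ("200Mi", "300Mi", "480Mi"):
        print(usage, "->", classify(usage, "256Mi", "512Mi"))
    print("300Mi with no limit ->", classify("300Mi", "256Mi"))
```

Pods that stay in the "above-request" state for long periods are good candidates for a higher request, while "near-limit" pods need either a higher limit or a fix in the application.
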
  4. Kubelet Memory Usage: The Kubelet itself, responsible for managing pods on a node, can consume significant memory, especially on nodes with many pods or custom configurations.

    • Diagnosis: Check Kubelet’s memory usage directly on the node (e.g., via top or htop on the node).
    • Fix:
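      • Ensure the Kubelet has enough memory set aside. The Kubelet normally runs as a host service (for example, a systemd unit) rather than as a pod, so memory is reserved for it and for system daemons through the Kubelet's kubeReserved and systemReserved settings, not through pod resource fields.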
      • Consider reducing the --max-pods setting for Kubelet if you have extremely high pod density on a node, or if your node is memory-constrained.
      • Ensure your node OS is not running out of memory before Kubernetes even gets a chance to manage it.
    • Why it works: By ensuring the Kubelet has enough memory, you prevent it from being a victim of the OOM killer, which could lead to node instability and pod evictions.
  5. System Daemons and Other Processes: Non-Kubernetes processes on the node (like Docker/containerd, systemd, monitoring agents, or even other applications not managed by Kubernetes) can consume memory.

    • Diagnosis: SSH into the affected node and use top, htop, or free -h to identify which processes are consuming the most memory.
    • Fix:
      • Optimize or reduce the resource footprint of these non-Kubernetes processes.
      • If possible, run these agents as Kubernetes pods with resource limits, so their usage is accounted for and managed by the scheduler.
      • Ensure your node image is lean and only includes necessary system packages.
    • Why it works: Reducing the memory footprint of non-Kubernetes processes frees up more memory for Kubernetes to manage pods effectively.
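
When a node has many processes, grouping them by command name makes the biggest consumers easier to spot than scanning top. The sketch below assumes output in the shape produced by ps -eo rss,comm on Linux (resident set size in KiB, then the command name); the sample values and the function name top_consumers are invented for illustration.

```python
from collections import Counter

# Hypothetical sample of `ps -eo rss,comm` output (values invented).
SAMPLE_PS_OUTPUT = """\
   RSS COMMAND
412340 containerd
298112 kubelet
150020 node_exporter
 90112 fluent-bit
 45000 systemd-journal
 88000 fluent-bit
"""


def top_consumers(ps_output, n=3):
    """Sum RSS (KiB) per command name and return the n largest totals."""
    totals = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # ignore blank or malformed lines
        totals[parts[1].strip()] += int(parts[0])
    return totals.most_common(n)


if __name__ == "__main__":
    for command, rss_kib in top_consumers(SAMPLE_PS_OUTPUT):
        print(f"{command}: {rss_kib / 1024:.0f} MiB resident")
```

RSS counts shared pages once per process, so the totals overstate usage for programs that fork many workers. Treat them as a ranking, not an exact accounting.
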
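  6. Eviction Thresholds Too High/Low: The Kubelet has eviction thresholds that, when crossed, trigger pod evictions before the OOM killer is invoked. If they are set too high, pods are evicted prematurely while plenty of memory is still free; if they are set too low, the OOM killer may act before the Kubelet gets a chance to evict anything.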

    • Diagnosis: Check the Kubelet configuration on the node. Look for --eviction-hard and --eviction-soft parameters.
      # On the node, check Kubelet config file, e.g., /var/lib/kubelet/config.yaml
      # Or check systemd unit file for Kubelet flags
      
      Common settings are memory.available<100Mi or memory.available<10%.
    • Fix: Adjust the eviction thresholds. The threshold is the amount of free memory below which the Kubelet starts evicting, so raising it (e.g., from memory.available<100Mi to memory.available<500Mi) makes evictions start earlier and leaves more headroom before the kernel OOM killer acts, while lowering it delays evictions. If your nodes are consistently hitting these thresholds regardless of the setting, that indicates an underlying resource shortage as described in point 1.
      # Example in Kubelet config.yaml
      evictionHard:
        memory.available: "500Mi"
      evictionSoft:
        memory.available: "1Gi"
      evictionSoftGracePeriod:
        memory.available: "5m"
      
    • Why it works: By adjusting the thresholds, you change the point at which Kubernetes proactively evicts pods to reclaim memory, potentially preventing the more drastic OOM killer action.
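
To see how a given pair of thresholds behaves, you can evaluate them against a node's capacity and currently available memory. The sketch below is a simplified model, assuming thresholds are written either as a quantity ("500Mi") or as a percentage of capacity ("10%"); the function names and the returned labels are made up for illustration, and the Kubelet derives memory.available from its own statistics rather than from tools like free.

```python
MIB = 1024**2
GIB = 1024**3


def threshold_bytes(threshold, capacity_bytes):
    """Resolve a threshold such as '500Mi', '1Gi' or '10%' to bytes."""
    if threshold.endswith("%"):
        return int(capacity_bytes * float(threshold[:-1]) / 100)
    for suffix, factor in (("Ki", 1024), ("Mi", MIB), ("Gi", GIB)):
        if threshold.endswith(suffix):
            return int(float(threshold[:-2]) * factor)
    return int(threshold)  # no suffix: plain byte count


def eviction_state(available_bytes, capacity_bytes, hard="500Mi", soft="1Gi"):
    """Return which threshold, if any, the available memory has fallen below."""
    if available_bytes < threshold_bytes(hard, capacity_bytes):
        return "hard"  # eviction starts immediately
    if available_bytes < threshold_bytes(soft, capacity_bytes):
        return "soft"  # eviction starts if this persists past the grace period
    return "ok"


if __name__ == "__main__":
    capacity = 8 * GIB  # hypothetical 8 GiB node
    for available in (2 * GIB, 800 * MIB, 400 * MIB):
        state = eviction_state(available, capacity)
        print(f"{available // MIB} MiB available -> {state}")
```

Running candidate values through a model like this before changing the Kubelet configuration shows how much free memory a node must keep to stay out of each state.
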

After fixing these, you may still see CrashLoopBackOff on pods that were restarted after OOM kills, since they can have other issues that prevent them from starting up cleanly.
