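Your pods are stuck in Pending because the Kubernetes scheduler can’t find a suitable node to place them on. This isn’t always a simple resource shortage; it’s a mismatch between the pod’s requirements and what each available node can offer. The scheduler records its reason in the pod’s events, so start with kubectl describe pod <pod-name> and look for a FailedScheduling event such as "0/3 nodes are available: 3 Insufficient cpu."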

Here’s a breakdown of why this happens and how to fix it:

1. Insufficient CPU or Memory on Nodes

This is the most common culprit. Pods declare their resource needs (requests), and if no node has enough available CPU or memory to meet those requests, the pod stays pending.

  • Diagnosis:

    • Check pod resource requests: kubectl get pod <pod-name> -o yaml (look for spec.containers[*].resources.requests).
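    • Check node capacity and allocatable resources: kubectl describe nodes <node-name> (look for Capacity, Allocatable, and the Allocated resources section, which totals the requests of pods already on the node).
    • Don’t rely on kubectl top nodes here: it shows actual usage (and needs metrics-server), while the scheduler counts only requests. A node that looks idle in kubectl top can still be full for scheduling purposes.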
  • Fix:

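    • Option A (Scale Up Nodes): Add more nodes to your cluster or move to larger instance types. For example, suppose your nodes are t3.medium (2 vCPU, 4 GiB RAM), your pod requests 1.5 vCPU and 2 GiB RAM, and your 3 existing nodes are already heavily requested. Adding another t3.medium gives the pod room. Upgrading to t3.large (2 vCPU, 8 GiB RAM) only helps if memory is the bottleneck, because it has the same vCPU count. For a CPU shortage, pick a type with more vCPUs, such as t3.xlarge (4 vCPU, 16 GiB RAM).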
    • Option B (Reduce Pod Requests): If the pod’s requests are too high, adjust them in its YAML definition:
      resources:
        requests:
          cpu: "500m" # Reduced from 1.5 vCPU (1500m)
          memory: "1Gi" # Reduced from 2Gi
      
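      Lower requests let the scheduler place the pod on nodes that previously lacked room. Keep them close to what the application actually uses, because requests below real usage let the scheduler overpack nodes.
  • Why it works: The scheduler compares each pod’s requests against a node’s Allocatable resources minus the requests of pods already placed there; actual usage plays no part. Allocatable is Capacity minus what is reserved for the OS and Kubernetes system daemons (system-reserved and kube-reserved) and the kubelet’s hard eviction threshold. Increasing Allocatable resources (adding or resizing nodes) or decreasing pod requests creates a viable match.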
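
To make the fit arithmetic concrete, here is a minimal Python sketch of the check described above. It is a simplification, not the scheduler’s actual code. It assumes only CPU and memory matter, handles a subset of quantity suffixes, and ignores init containers, pod overhead, and extended resources. The function names and the 1930m/3Gi allocatable figures are hypothetical.

```python
# Simplified sketch of the scheduler's resource-fit check (not the real implementation).
# Assumption: only CPU and memory are checked; quantities use common suffixes only.

MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity to millicores: "500m" -> 500, "1.5" -> 1500."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity to bytes: "1Gi" -> 1073741824."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain bytes


def fits(pod_requests: dict, allocatable: dict, placed_requests: list) -> bool:
    """True if the pod's requests fit in allocatable minus requests already placed on the node."""
    free_cpu = parse_cpu(allocatable["cpu"]) - sum(parse_cpu(r["cpu"]) for r in placed_requests)
    free_mem = parse_memory(allocatable["memory"]) - sum(
        parse_memory(r["memory"]) for r in placed_requests
    )
    return (
        parse_cpu(pod_requests["cpu"]) <= free_cpu
        and parse_memory(pod_requests["memory"]) <= free_mem
    )


# Hypothetical node: 1930m CPU / 3Gi memory allocatable, one pod already requesting 500m / 1Gi.
node_allocatable = {"cpu": "1930m", "memory": "3Gi"}
already_placed = [{"cpu": "500m", "memory": "1Gi"}]
print(fits({"cpu": "1500m", "memory": "2Gi"}, node_allocatable, already_placed))  # False
print(fits({"cpu": "500m", "memory": "1Gi"}, node_allocatable, already_placed))   # True
```

On that hypothetical node only 1430m of CPU is left unrequested. The original 1500m/2Gi pod is therefore rejected, while the reduced 500m/1Gi version fits.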

2. Node Taints and Pod Tolerations Mismatch

Nodes can be "tainted" to repel certain pods, preventing them from being scheduled unless the pod explicitly "tolerates" that taint.

  • Diagnosis:

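    • Check node taints: kubectl describe nodes <node-name> | grep -A3 Taints (describe prints each additional taint on its own line below the first). Alternatively, list taints for all nodes with kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints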
    • Check pod tolerations: kubectl get pod <pod-name> -o yaml (look for spec.tolerations)
  • Fix:

    • Option A (Add Toleration to Pod): If the taint is intentional (e.g., a taint for control-plane nodes), add a matching toleration to your pod’s YAML:
      tolerations:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
      
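    • Option B (Remove Taint from Node): If the taint was added in error or is no longer needed, remove it by appending a hyphen: kubectl taint nodes <node-name> key[=value]:Effect-. For example, kubectl taint nodes control-plane-1 node-role.kubernetes.io/control-plane:NoSchedule- lets ordinary workloads run on a control-plane node, a common setup for single-node clusters.
  • Why it works: Taints repel pods, while tolerations allow pods to ignore specific taints. A node with a NoSchedule or NoExecute taint that the pod doesn’t tolerate is ruled out; PreferNoSchedule is only a soft preference. A toleration permits scheduling onto a tainted node but doesn’t attract the pod to it, so pair it with a nodeSelector or node affinity if the pod must run there.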
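
The matching rules can be sketched in a few lines of Python. This is a simplified model written from the documented behavior, not Kubernetes source. An empty toleration effect matches every effect, an empty key with operator Exists matches every key, and Equal (the default) also compares values. The class and function names are hypothetical.

```python
# Simplified sketch of taint/toleration matching, modeled on the documented rules.
from dataclasses import dataclass


@dataclass
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" (default) or "Exists"
    value: str = ""
    effect: str = ""  # empty means "any effect"


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator in ("", "Equal") and tol.value == taint.value


def passes_taint_filter(taints: list, tolerations: list) -> bool:
    """A node is ruled out if any NoSchedule/NoExecute taint is left untolerated.
    PreferNoSchedule taints are only a soft preference and never block scheduling."""
    for taint in taints:
        if taint.effect not in ("NoSchedule", "NoExecute"):
            continue
        if not any(tolerates(tol, taint) for tol in tolerations):
            return False
    return True


control_plane = [Taint("node-role.kubernetes.io/control-plane")]
cp_toleration = Toleration(
    key="node-role.kubernetes.io/control-plane", operator="Exists", effect="NoSchedule"
)
print(passes_taint_filter(control_plane, []))               # False
print(passes_taint_filter(control_plane, [cp_toleration]))  # True
```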

3. Node Selectors or Affinity/Anti-Affinity Rules

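Pods can specify preferences or requirements for where they run using nodeSelector and nodeAffinity, which are matched against node labels, and podAffinity/podAntiAffinity, which are matched against the labels of pods already running in a topology domain such as a node or zone. If no node satisfies the required rules, the pod remains pending.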

  • Diagnosis:

    • Check pod nodeSelector, affinity.nodeAffinity, affinity.podAffinity, affinity.podAntiAffinity: kubectl get pod <pod-name> -o yaml
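    • Check node labels: kubectl get nodes --show-labels or kubectl describe node <node-name> (look for Labels). For podAffinity/podAntiAffinity, also check the labels of the pods the rules target: kubectl get pods -A --show-labels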
  • Fix:

    • Option A (Label Nodes): Add labels to your nodes that match the pod’s nodeSelector or affinity rules. For example, if the pod requires disktype=ssd, label a node: kubectl label node <node-name> disktype=ssd.
    • Option B (Adjust Pod Rules): Modify the pod’s YAML to match existing node labels or relax the rules. For instance, change nodeSelector: {"environment": "production"} to nodeSelector: {"environment": "staging"} if you have staging nodes labeled.
    • Option C (Remove Rules): If the rules are too restrictive or no longer needed, remove them from the pod’s specification.
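  • Why it works: nodeSelector and the requiredDuringSchedulingIgnoredDuringExecution forms of affinity and anti-affinity are hard constraints. Scheduling fails if no node satisfies them, or if podAntiAffinity excludes every available node. The preferredDuringSchedulingIgnoredDuringExecution forms only influence node scoring and never keep a pod Pending on their own.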
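
Here is a hedged Python sketch of how the hard node rules combine, assuming the documented semantics. Every nodeSelector pair must match. Required nodeAffinity terms are ORed, the expressions inside a term are ANDed, and an empty term matches nothing. It ignores matchFields and pod (anti-)affinity, and the function names and the cpu-count label used in the checks are hypothetical.

```python
# Simplified sketch of nodeSelector and required nodeAffinity matching.
# Assumption: terms carry only matchExpressions (matchFields is ignored).


def matches_expression(labels: dict, expr: dict) -> bool:
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return labels.get(key) not in values  # a node without the label satisfies NotIn
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op in ("Gt", "Lt"):
        try:
            have, limit = int(labels[key]), int(values[0])
        except (KeyError, IndexError, ValueError):
            return False  # a missing or non-integer label never matches
        return have > limit if op == "Gt" else have < limit
    raise ValueError(f"unknown operator: {op}")


def node_matches(labels: dict, node_selector=None, required_terms=None) -> bool:
    # nodeSelector: every key/value pair must be present on the node.
    if node_selector and any(labels.get(k) != v for k, v in node_selector.items()):
        return False
    # Required node affinity: terms are ORed, expressions inside a term are ANDed,
    # and an empty term matches nothing.
    if required_terms:
        return any(
            len(term) > 0 and all(matches_expression(labels, e) for e in term)
            for term in required_terms
        )
    return True


node_labels = {"disktype": "ssd", "environment": "staging"}
print(node_matches(node_labels, node_selector={"disktype": "ssd"}))            # True
print(node_matches(node_labels, node_selector={"environment": "production"}))  # False
```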

4. Persistent Volume Claims (PVCs) Not Bound

If your pod requires a PersistentVolumeClaim and that PVC is not yet bound to a PersistentVolume (PV), the pod will remain pending. This often happens if dynamic provisioning fails or if you’re using static PVs and the requested PVC doesn’t match any available PV.

  • Diagnosis:

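    • Check PVC status: kubectl get pvc <pvc-name> (look for Bound status). A claim whose StorageClass uses volumeBindingMode: WaitForFirstConsumer stays Pending until a pod using it is scheduled, which is expected rather than an error.
    • Check events: kubectl describe pod <pod-name> shows a FailedScheduling message such as "pod has unbound immediate PersistentVolumeClaims". Binding and provisioning events such as FailedBinding or ProvisioningFailed are recorded on the claim, so check kubectl describe pvc <pvc-name> as well.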
    • Check PV status: kubectl get pv
  • Fix:

    • Option A (Ensure PV Availability):
      • Dynamic Provisioning: Verify your StorageClass is correctly configured and the underlying storage provider (e.g., AWS EBS, GCE PD, Ceph) is healthy. Check the logs of your provisioner pod (often in kube-system).
      • Static Provisioning: Ensure you have a PV defined that matches the PVC’s storageClassName, accessModes, and resources.requests.storage. If a matching PV exists but the claim still isn’t bound, you can bind it manually: kubectl patch pvc <pvc-name> -p '{"spec": {"volumeName": "<pv-name>"}}'.
    • Option B (Correct PVC Spec): Ensure the storageClassName and requested resources.requests.storage in the PVC definition are accurate.
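  • Why it works: With Immediate binding, a pod that needs persistent storage can’t be scheduled until its PVC is bound to a PV. With WaitForFirstConsumer, it can’t be scheduled until the scheduler can pick a node where a suitable volume can be provisioned. A bound PV with node affinity, such as a zonal cloud disk, also restricts the pod to nodes in that zone; the scheduler reports this as a volume node affinity conflict.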
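
For static provisioning, the matching can be sketched as below. This is a simplified Python model of the PV controller’s matching, not its actual code. It checks phase, storageClassName, access modes, and capacity, then picks the smallest PV that is large enough. It ignores label selectors, volumeMode, and node affinity. The "manual" class name and the PV names are hypothetical.

```python
# Simplified sketch of static PV-to-PVC matching (not the PV controller's real code).
from dataclasses import dataclass

GI = 1024 ** 3


@dataclass
class PV:
    name: str
    storage_class: str
    access_modes: set
    capacity_bytes: int
    phase: str = "Available"


@dataclass
class PVC:
    storage_class: str
    access_modes: set
    request_bytes: int


def find_matching_pv(claim: PVC, volumes: list):
    candidates = [
        pv for pv in volumes
        if pv.phase == "Available"
        and pv.storage_class == claim.storage_class
        and claim.access_modes <= pv.access_modes  # PV must offer every requested mode
        and pv.capacity_bytes >= claim.request_bytes
    ]
    # Prefer the smallest volume that is large enough, to avoid wasting capacity.
    return min(candidates, key=lambda pv: pv.capacity_bytes, default=None)


pvs = [
    PV("pv-small", "manual", {"ReadWriteOnce"}, 5 * GI),
    PV("pv-large", "manual", {"ReadWriteOnce", "ReadOnlyMany"}, 20 * GI),
]
match = find_matching_pv(PVC("manual", {"ReadWriteOnce"}, 10 * GI), pvs)
print(match.name if match else None)  # pv-large
```

A None result corresponds to a claim that stays Pending: no available PV offers the requested class, access modes, and size.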

5. PodDisruptionBudgets (PDBs) Indirectly Blocking Scheduling

PDBs are rarely the direct cause of a Pending pod. The scheduler does not filter nodes by PDBs. It consults them only during preemption, when it prefers to evict lower-priority pods whose removal wouldn’t violate a PDB, and it may still violate one if there’s no alternative. The usual impact is indirect. A PDB that allows zero disruptions, such as minAvailable equal to the replica count, blocks evictions. A kubectl drain during a node upgrade can then stall and leave the node cordoned, shrinking the capacity available to new pods.

  • Diagnosis:

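    • Check PDBs: kubectl get pdb --all-namespaces (an ALLOWED DISRUPTIONS value of 0 means evictions for those pods are currently blocked).
    • Look for nodes left cordoned by a stalled drain: kubectl get nodes (SchedulingDisabled in the STATUS column). A blocked drain reports errors such as "Cannot evict pod as it would violate the pod's disruption budget."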
  • Fix:

    • Review and potentially relax minAvailable or maxUnavailable settings in your PDBs if they are too restrictive.
    • Ensure your application has enough replicas that PDBs don’t block essential operations.
    • If a node was left cordoned and the drain is no longer needed, make it schedulable again: kubectl uncordon <node-name>.
  • Why it works: A PDB limits voluntary disruptions by requiring that at least minAvailable pods stay up, or that no more than maxUnavailable are down. When it allows zero disruptions, evictions fail and drains stall, and any capacity on a cordoned node stays off-limits to the scheduler.
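
The allowed-disruptions arithmetic can be sketched in Python to show why a tight PDB stalls drains. This is a simplified model that assumes integer minAvailable/maxUnavailable values and omits percentages and the controller’s handling of unmanaged pods. The function name is hypothetical.

```python
# Simplified PDB arithmetic for integer minAvailable / maxUnavailable (percentages omitted).


def disruptions_allowed(expected_pods: int, healthy_pods: int,
                        min_available=None, max_unavailable=None) -> int:
    """How many healthy pods may be voluntarily evicted right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    return max(0, healthy_pods - desired_healthy)


print(disruptions_allowed(3, 3, min_available=3))    # 0: every eviction is refused
print(disruptions_allowed(5, 5, max_unavailable=1))  # 1
```

A result of 0 is the case where kubectl drain keeps retrying and the cordoned node stays unavailable to the scheduler.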

6. Scheduler Issues or Network Problems

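Rarely, the Kubernetes scheduler itself might be unhealthy, or network problems might cut components off from the API server. If the scheduler can’t reach the API server, it can’t bind any pods. If a node’s kubelet can’t reach the API server, the node shows as NotReady and is tainted node.kubernetes.io/unreachable or node.kubernetes.io/not-ready, so the scheduler stops placing pods there.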

  • Diagnosis:

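    • Check scheduler pod status: kubectl get pods -n kube-system | grep kube-scheduler (self-managed clusters only; on managed services such as EKS, GKE, or AKS the provider runs the scheduler and it isn’t listed).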
    • Check scheduler logs: kubectl logs -n kube-system <kube-scheduler-pod-name>
    • Check network connectivity from nodes to the API server.
  • Fix:

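    • Restart the scheduler if it’s unhealthy. On kubeadm-style clusters, kube-scheduler is a static pod managed by the kubelet from a manifest in /etc/kubernetes/manifests, not by a Deployment or DaemonSet. Running kubectl delete pod therefore only removes the mirror pod, which the kubelet recreates without restarting the scheduler. To restart it, move the manifest out of that directory on the control-plane node and then move it back.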
    • Troubleshoot network connectivity issues using standard tools (ping, traceroute, curl to the API server endpoint).
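  • Why it works: The scheduler is a critical control plane component. If it’s down, unresponsive, or can’t communicate, no pods can be scheduled. A telltale sign is a Pending pod with no FailedScheduling event at all, since only a running scheduler records one.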
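
Many environments block ICMP, so ping can fail even when the network is fine. A TCP connection test to the API server port is a more direct check. Below is a minimal Python sketch that assumes the API server listens on 6443, the kubeadm default; managed services often expose 443. A successful connection proves only network reachability, not that TLS or authentication works. The function name is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 6443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

Run it from the affected node, taking the host and port from the server: field of your kubeconfig. For example, call can_reach("10.0.0.10"), where the address is a placeholder. A False result points at routing, firewall, or security-group rules rather than at the scheduler.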

Once the scheduling issues are fixed, the next errors you’re likely to see are ImagePullBackOff, if the kubelet can’t pull the container image, or CrashLoopBackOff, if the container starts but the application keeps exiting.
