Fix Kubernetes Rolling Update Stuck Mid-Deployment (2026)

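A rolling update gets stuck when the Deployment controller cannot get enough Pods in the new ReplicaSet to become ready, so it stops scaling the new ReplicaSet up and the old one down. The API server has accepted your change; the rollout is simply not making progress.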

This usually happens when the Pods in your new ReplicaSet are crashing, failing to pull their image, failing readiness probes, or cannot be scheduled. If the rollout makes no progress within the Deployment's progressDeadlineSeconds (600 seconds by default), the Deployment reports a Progressing condition with the reason ProgressDeadlineExceeded.
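
One way to confirm that a rollout has actually stalled, rather than just being slow, is to read the Deployment's Progressing condition. The sketch below assumes kubectl is already pointed at the right cluster; the function name rollout_is_stuck and the Deployment name my-app are hypothetical.

```shell
# Succeeds (exit 0) when the Deployment reports that its rollout stalled.
# $1 = Deployment name, $2 = namespace (defaults to "default").
rollout_is_stuck() {
  reason=$(kubectl get deployment "$1" -n "${2:-default}" \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}')
  [ "$reason" = "ProgressDeadlineExceeded" ]
}

# Example call; "my-app" is a placeholder name:
#   rollout_is_stuck my-app && echo "rollout exceeded its progress deadline"
```

If the function fails, the rollout may still be in progress, so it is worth checking again after the deadline has passed.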

Here’s how to dig in:

Check Pod Status:
- Diagnosis: kubectl get pods -l app=<your-app-label>
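- What to look for: Pods in CrashLoopBackOff, Error, or ImagePullBackOff states, and Pods that are Running but show fewer ready containers than expected in the READY column (for example 0/1).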
- Fix: If ImagePullBackOff, ensure your image name and tag are correct and that the cluster has access to the registry. If CrashLoopBackOff, check your application logs.
- Why it works: Pods must be running and passing their readiness probes to be considered healthy.
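
To avoid scanning a long Pod list by eye, you could filter it down to the Pods that are not fully ready. This is a minimal sketch that parses the default kubectl get pods table (NAME, READY, STATUS columns); the function name unhealthy_pods and the label app=my-app are placeholders.

```shell
# List Pods for a label selector that are not fully ready.
# $1 = label selector, e.g. app=my-app (placeholder label).
# Prints: NAME READY STATUS for every Pod that is neither Completed
# nor Running with all of its containers ready.
unhealthy_pods() {
  kubectl get pods -l "$1" --no-headers |
    awk '{
      split($2, r, "/")   # READY column looks like "1/2"
      if ($3 != "Completed" && ($3 != "Running" || r[1] != r[2]))
        print $1, $2, $3
    }'
}

# Example call:
#   unhealthy_pods app=my-app
```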
Examine Pod Logs:
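- Diagnosis: kubectl logs <pod-name> -c <container-name> (the -c flag is only needed if the Pod has multiple containers). Add --previous to see the output of the last crashed container instead of the one that just restarted.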
- What to look for: Application errors, configuration issues, or missing dependencies that cause the application to exit.
- Fix: Correct the application code or configuration based on the error messages. Redeploy.
- Why it works: Logs provide the direct evidence of why an application is failing to start or stay running.
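
For a Pod in CrashLoopBackOff, the interesting output usually belongs to the container instance that just died. A small sketch, assuming you want the previous instance's logs when they exist and the current ones otherwise; crash_logs is a hypothetical helper name.

```shell
# Show the tail of a container's logs, preferring the previous (crashed)
# instance and falling back to the current one if there is none.
# $1 = Pod name, $2 = container name, $3 = number of lines (default 50).
crash_logs() {
  kubectl logs "$1" -c "$2" --previous --tail="${3:-50}" 2>/dev/null ||
    kubectl logs "$1" -c "$2" --tail="${3:-50}"
}

# Example call; both names are placeholders:
#   crash_logs my-app-7d9c5b-abcde app
```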
Review Readiness and Liveness Probes:
- Diagnosis: kubectl describe pod <pod-name> and look for the "Events" section. Also, check your Deployment YAML for readinessProbe and livenessProbe configurations.
- What to look for: Repeated probe failures, incorrect probe endpoints (httpGet.path, exec.command, tcpSocket.port), or overly aggressive initialDelaySeconds, periodSeconds, or timeoutSeconds.
- Fix: Adjust probe parameters (e.g., increase initialDelaySeconds if your app takes time to start) or correct the probe endpoint.
- Why it works: Readiness probes tell Kubernetes when a Pod is ready to serve traffic. Liveness probes tell Kubernetes when to restart a Pod. If these are misconfigured, Kubernetes might think a perfectly healthy Pod is unhealthy, or vice-versa.
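
Probe failures are recorded as events with the reason Unhealthy, so you can pull them out without reading the whole describe output. The sketch below also shows one possible way to raise initialDelaySeconds in place; it assumes the readiness probe is defined on the first container of the Pod template, and both function names are hypothetical.

```shell
# Print the messages of probe-failure events recorded for one Pod.
# $1 = Pod name. Probe failures are recorded with reason "Unhealthy".
probe_failures() {
  kubectl get events \
    --field-selector "involvedObject.name=$1,reason=Unhealthy" \
    -o jsonpath='{range .items[*]}{.message}{"\n"}{end}'
}

# Give a slow-starting app more time before the first readiness check.
# $1 = Deployment name, $2 = delay in seconds.
# Assumes the first container already defines a readinessProbe.
set_readiness_delay() {
  kubectl patch deployment "$1" --type=json -p \
    "[{\"op\":\"add\",\"path\":\"/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds\",\"value\":$2}]"
}

# Example calls; the names and the 30-second value are placeholders:
#   probe_failures my-app-7d9c5b-abcde
#   set_readiness_delay my-app 30
```

Patching the Deployment changes the Pod template, which starts a new rollout, so keep the same change in your Deployment YAML to avoid losing it on the next apply.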
Check Resource Limits and Requests:
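- Diagnosis: kubectl describe pod <pod-name> and check the "Limits" and "Requests" fields listed under each container. Compare these to the "Allocatable" values shown by kubectl describe node <node-name>.
- What to look for: Pods stuck in Pending with FailedScheduling events (requests larger than any node can satisfy), or containers whose last state shows the reason OOMKilled (memory limit exceeded).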
- Fix: Increase resource requests and limits in your Deployment spec, or ensure your nodes have sufficient capacity.
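- Why it works: A container that exceeds its memory limit is killed (OOMKilled) and restarted, while a container that hits its CPU limit is throttled rather than killed, which can make it too slow to pass its probes. If a Pod requests more resources than any node has free, it stays Pending and is never scheduled.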
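
A container that was killed for exceeding its memory limit records OOMKilled as the reason in its last terminated state. The following is a rough sketch of how you might check for that; was_oom_killed is a hypothetical helper name.

```shell
# Succeeds when any container in the Pod was last terminated for
# exceeding its memory limit. $1 = Pod name.
was_oom_killed() {
  reasons=$(kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}')
  case " $reasons " in
    *" OOMKilled "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: raise the memory settings on a Deployment whose Pods keep
# getting OOM-killed. "my-app" and the sizes are placeholders, not
# recommendations:
#   kubectl set resources deployment my-app \
#     --requests=memory=256Mi --limits=memory=512Mi
```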
Inspect the ReplicaSet:
- Diagnosis: kubectl get replicaset -l app=<your-app-label> and kubectl describe replicaset <new-replicaset-name>
- What to look for: The desired number of Pods in the new ReplicaSet, and the number of available Pods. Also, check the "Events" on the ReplicaSet description for any scheduling failures.
- Fix: This is usually a symptom of the above issues; fixing the Pods will allow the ReplicaSet to scale up.
- Why it works: The ReplicaSet is responsible for ensuring the desired number of Pods are running. If Pods fail to become ready, the ReplicaSet cannot reach its target.
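
To see at a glance which ReplicaSet is falling short, you can compare desired and ready replica counts. This sketch assumes the custom-columns output shown in the code, where a missing readyReplicas field is printed as <none>; the function name lagging_replicasets is hypothetical.

```shell
# Print every ReplicaSet for a label selector whose ready count is below
# (or otherwise differs from) its desired count.
# $1 = label selector, e.g. app=my-app (placeholder label).
lagging_replicasets() {
  kubectl get replicaset -l "$1" --no-headers \
    -o custom-columns=NAME:.metadata.name,DESIRED:.spec.replicas,READY:.status.readyReplicas |
    awk '{
      ready = ($3 == "<none>") ? 0 : $3   # unset readyReplicas means zero
      if ($2 + 0 != ready + 0) print $1 ": " ready "/" $2 " ready"
    }'
}

# Example call:
#   lagging_replicasets app=my-app
```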
Examine Node Conditions:
- Diagnosis: kubectl get nodes and kubectl describe node <node-name>
- What to look for: Nodes in a NotReady state, or nodes with high resource utilization (CPU, memory, disk).
- Fix: Troubleshoot the unhealthy node (e.g., restart kubelet, check network connectivity, free up disk space).
- Why it works: The scheduler does not place new Pods on NotReady or cordoned nodes, so a shortage of healthy nodes leaves the new Pods Pending.
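
On a large cluster it can help to list only the nodes that are not plainly Ready. A minimal sketch that reads the STATUS column of kubectl get nodes; the function name unready_nodes is hypothetical.

```shell
# Print nodes whose STATUS is anything other than exactly "Ready".
# This includes NotReady nodes and cordoned ones, which show up as
# "Ready,SchedulingDisabled".
unready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1, $2 }'
}

# Example follow-up for one of the reported nodes (placeholder name):
#   kubectl describe node node-b
```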
Verify Service Selectors and Endpoints:
- Diagnosis: kubectl get svc <your-service-name> and kubectl describe svc <your-service-name>
- What to look for: Ensure the Service’s selector correctly matches the labels of your Pods. Check the Endpoints list in the service description to see if any Pod IPs are listed.
- Fix: Correct the Service’s selector to match your Pod labels.
- Why it works: The Service relies on its selector to find ready Pods and route traffic to them. A wrong selector does not stall the rollout itself, but the Service will have no endpoints for your new Pods, so the application looks down even when the Deployment reports success.
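
A quick way to check whether the Service currently routes to anything is to look for ready addresses behind it. This sketch uses the Endpoints object named after the Service; newer clusters may print a deprecation warning in favor of EndpointSlices, so treat it as an assumption about your cluster version. The function name service_has_endpoints is hypothetical.

```shell
# Succeeds when the Service has at least one ready Pod IP behind it.
# $1 = Service name, $2 = namespace (defaults to "default").
service_has_endpoints() {
  ips=$(kubectl get endpoints "$1" -n "${2:-default}" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  [ -n "$ips" ]
}

# If it fails, compare the selector with the labels on your Pods
# ("my-svc" and "app=my-app" are placeholders):
#   kubectl get svc my-svc -o jsonpath='{.spec.selector}'
#   kubectl get pods -l app=my-app --show-labels
```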
Check Admission Controllers:
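- Diagnosis: kubectl describe replicaset <new-replicaset-name> and look for FailedCreate events. Then review kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations, kubectl describe resourcequota, and the admission plugins enabled on your API server.
- What to look for: Admission controllers that are rejecting Pod creation (e.g., security policies, resource quotas). A rejected Pod is never created, so there are no Pod logs to read; the rejection message appears only in the ReplicaSet's events.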
- Fix: Adjust the configuration of the offending admission controller.
- Why it works: Admission controllers intercept requests to the Kubernetes API server and can mutate or validate them before they are persisted.
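
When an admission controller or quota rejects a Pod, the ReplicaSet controller records a FailedCreate event that carries the rejection message. A small sketch for listing those events in one namespace; failed_create_events is a hypothetical helper name.

```shell
# Print "<replicaset>: <message>" for every FailedCreate event on a
# ReplicaSet in the given namespace.
# $1 = namespace (defaults to "default").
failed_create_events() {
  kubectl get events -n "${1:-default}" \
    --field-selector involvedObject.kind=ReplicaSet,reason=FailedCreate \
    -o jsonpath='{range .items[*]}{.involvedObject.name}{": "}{.message}{"\n"}{end}'
}

# Example call:
#   failed_create_events default
```

The message usually names the webhook or quota that refused the request, which tells you which configuration to adjust.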

After resolving the underlying Pod issues, the new ReplicaSet should scale up to the desired replica count and the old one should scale down to zero. If you only relaxed the readiness probe and did not fix the root cause of an application crash, the next error you are likely to hit is a CrashLoopBackOff on the new Pods.
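
To confirm the fix, you can wait on the rollout instead of polling the Pod list by hand. A minimal sketch, assuming a 120-second default timeout is reasonable for your application; wait_for_rollout and my-app are hypothetical names.

```shell
# Block until the rollout finishes or the timeout expires.
# $1 = Deployment name, $2 = timeout (defaults to 120s).
wait_for_rollout() {
  if kubectl rollout status "deployment/$1" --timeout="${2:-120s}"; then
    echo "rollout complete"
  else
    echo "rollout still not healthy; re-check Pod status and logs" >&2
    return 1
  fi
}

# Example call:
#   wait_for_rollout my-app 300s
```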