The most surprising thing about Kubernetes Deployments is that rolling one back never restores the pods you had: those pods are gone for good, and what Kubernetes keeps is the recipe for recreating them.
Let’s see this in action. Imagine we have a simple Nginx deployment running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21.0 # Using a specific version
        ports:
        - containerPort: 80
We apply this: kubectl apply -f deployment.yaml. Kubernetes creates a ReplicaSet to ensure 3 pods with the app: nginx label are running the nginx:1.21.0 image.
Now, let’s say we want to update the image to nginx:1.22.0. We edit deployment.yaml, change the image tag, and run kubectl apply -f deployment.yaml again.
Kubernetes, by default, performs a "Rolling Update." It gradually replaces old pods with new ones. It creates a new ReplicaSet for nginx:1.22.0 and starts bringing up pods for it, while terminating pods from the nginx:1.21.0 ReplicaSet.
Here’s the key: the old ReplicaSet (for nginx:1.21.0) isn’t deleted when the rollout finishes. It’s scaled down to 0 replicas and kept, pod template included, up to the limit set by spec.revisionHistoryLimit. This is the foundation of rollback.
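One way to see the retained ReplicaSet for yourself is sketched below. It assumes kubectl is already pointed at the cluster running nginx-deployment; the function name watch_rollout is made up for this example.

```shell
# Hypothetical helper (not part of kubectl): wait for a rollout to finish,
# then list the ReplicaSets that carry the given app label.
# Usage: watch_rollout <deployment-name> <app-label-value>
watch_rollout() {
  # Blocks until the rollout completes; exits non-zero if it fails.
  kubectl rollout status "deployment/$1" || return 1
  # Both ReplicaSets should be listed; the old one is expected to show
  # 0 desired replicas rather than being gone.
  kubectl get replicasets -l "app=$2"
}
```

Calling watch_rollout nginx-deployment nginx after the second kubectl apply should list two ReplicaSets, one per image version.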
If something goes wrong with the nginx:1.22.0 deployment (e.g., pods crash, your application returns errors), you can tell Kubernetes to revert to the previous version. The command for this is kubectl rollout undo deployment/nginx-deployment.
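What happens under the hood? Kubernetes doesn’t repair or revert the running pods. Instead, the Deployment controller looks at the Deployment’s history, finds the previous ReplicaSet (the one managing nginx:1.21.0 pods), and copies its pod template back into the Deployment. It then performs an ordinary rolling update in the other direction: the old ReplicaSet is scaled back up to the desired replica count (3 in our example) while the new, problematic ReplicaSet (for nginx:1.22.0) is scaled down to 0. The old ReplicaSet becomes the active one again and is recorded under a new, higher revision number.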
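The undo step can be wrapped with a quick check of the result. This is a minimal sketch, assuming a single-container pod template like the one in the manifest; rollback_and_verify is a hypothetical name, not a kubectl feature.

```shell
# Hypothetical helper: undo the latest rollout, wait for it to settle,
# then print the image now set in the Deployment's pod template.
# Usage: rollback_and_verify <deployment-name>
rollback_and_verify() {
  kubectl rollout undo "deployment/$1" || return 1
  kubectl rollout status "deployment/$1" || return 1
  # Reads the first container's image; assumes one container per pod.
  kubectl get "deployment/$1" \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
  echo  # jsonpath output has no trailing newline
}
```

For the example in this article, rollback_and_verify nginx-deployment should end by printing the older image tag.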
The mental model to build here is that a Kubernetes Deployment is not just a configuration for pods. It’s a controller that manages ReplicaSets over time. Each time you kubectl apply a change that modifies the pod template (like the image tag), the Deployment controller creates a new ReplicaSet and gradually moves pods from the old ReplicaSet to the new one. The history of these ReplicaSets is what enables rollbacks.
The spec.revisionHistoryLimit field on the Deployment controls how many old ReplicaSets are kept. By default, it’s 10. If you set it to 0, all old ReplicaSets are cleaned up and you lose the ability to roll back. If you set it very high, you might consume more etcd storage than necessary.
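The limit can be set in the manifest under spec, or patched on a live Deployment. The sketch below takes the second route; set_history_limit is a hypothetical wrapper, and the value you pick is up to you.

```shell
# Hypothetical helper: set spec.revisionHistoryLimit on a live Deployment.
# Usage: set_history_limit <deployment-name> <limit>
set_history_limit() {
  # Reject anything that is not a non-negative integer before patching.
  case "$2" in
    ''|*[!0-9]*)
      echo "limit must be a non-negative integer" >&2
      return 1
      ;;
  esac
  # A merge patch only touches the field named here.
  kubectl patch "deployment/$1" --type=merge \
    -p "{\"spec\":{\"revisionHistoryLimit\":$2}}"
}
```

Because revisionHistoryLimit sits outside spec.template, changing it this way does not start a new rollout.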
When you perform a rollback, Kubernetes isn’t guessing which ReplicaSet to use. Every ReplicaSet owned by the Deployment carries a revision number, and you can list them with kubectl rollout history deployment/nginx-deployment. Each entry in the history corresponds to a specific ReplicaSet. With no extra arguments, the undo command tells the Deployment controller to activate the ReplicaSet associated with the previous revision.
If you want to roll back to a specific past version, not just the immediately previous one, you can use kubectl rollout undo deployment/nginx-deployment --to-revision=2 (where 2 is the revision number from kubectl rollout history). This directly tells Kubernetes to make the ReplicaSet for revision 2 the active one again.
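Before rolling back to a specific revision, it can be worth checking what that revision contains. A minimal sketch follows; the two function names are hypothetical, while the --revision and --to-revision flags are standard kubectl.

```shell
# Hypothetical helper: print the pod template recorded for one revision.
# Usage: show_revision <deployment-name> <revision>
show_revision() {
  kubectl rollout history "deployment/$1" --revision="$2"
}

# Hypothetical helper: roll back to a chosen revision.
# Usage: rollback_to <deployment-name> <revision>
rollback_to() {
  kubectl rollout undo "deployment/$1" --to-revision="$2"
}
```

For example, show_revision nginx-deployment 2 followed by rollback_to nginx-deployment 2 lets you confirm the image tag before committing to the rollback.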
A common pitfall is misunderstanding what triggers a new revision. Changing only the number of replicas (spec.replicas) does not create a new revision and therefore cannot be rolled back in the same way. Only changes to the pod template (spec.template) result in new ReplicaSets and are part of the rollback history.
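This is easy to check on a test cluster. The sketch below reads the Deployment’s revision annotation before and after a scale operation; the helper names are hypothetical, and it assumes the deployment.kubernetes.io/revision annotation that the Deployment controller maintains.

```shell
# Hypothetical helper: print the Deployment's current revision number,
# read from the deployment.kubernetes.io/revision annotation.
current_revision() {
  kubectl get "deployment/$1" \
    -o jsonpath='{.metadata.annotations.deployment\.kubernetes\.io/revision}'
}

# Hypothetical helper: scale the Deployment and report the revision
# before and after. Usage: scale_keeps_revision <deployment-name> <replicas>
scale_keeps_revision() {
  before="$(current_revision "$1")"
  kubectl scale "deployment/$1" --replicas="$2" >/dev/null || return 1
  after="$(current_revision "$1")"
  # The two numbers are expected to match, since spec.template is untouched.
  echo "revision before=${before} after=${after}"
}
```

Running the same comparison around a change to the pod template, such as a new image tag, should show the revision number increase instead.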
The next thing you’ll need to master is managing the speed of your rollouts and rollbacks.