Helm is a package manager for Kubernetes, and like any package manager, sometimes things go wrong. When a helm upgrade or helm install fails, you’re left with a broken deployment. The good news is Helm keeps a history of your releases, allowing you to roll back to a known good state.
Let’s say you just ran helm upgrade my-app ./charts/my-app -f values.yaml and it bombed out. Your application is now in a weird, partially deployed state.
The first thing you need is the exact name of your release. You probably know this, but if you’ve forgotten or are working on someone else’s cluster, helm list -a shows every release in the current namespace regardless of status, including failed ones and any uninstalled releases whose history was kept.
helm list -a
This will give you output like:
NAME         NAMESPACE    REVISION  DEPLOYED             STATUS    CHART
my-app       default      5         2023-10-27 10:10:00  failed    my-app-1.2.0
another-app  kube-system  1         2023-10-26 09:00:00  deployed  another-app-0.5.1
Notice the REVISION column. This is key. Each time you install, upgrade, or roll back a release, Helm increments this revision number. When an upgrade fails, the previous revision normally keeps its deployed status, because Helm only marks a revision as superseded once a later one succeeds.
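To look beyond the current namespace, or to narrow the list to releases that need attention, helm list accepts status and namespace filters. A minimal sketch, assuming Helm 3; the function name list_broken_releases is made up for illustration:

```shell
# Hypothetical helper: show releases whose latest revision failed or is still pending.
# With no argument it searches every namespace; with one argument, only that namespace.
list_broken_releases() {
  if [ -n "${1:-}" ]; then
    helm list --failed --pending --namespace "$1"
  else
    helm list --failed --pending --all-namespaces
  fi
}
```

Calling list_broken_releases staging would limit the search to a namespace named staging (an example name, not one from this walkthrough).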
To see the history of your specific release, use helm history:
helm history my-app
This will show you something like:
REVISION  DEPLOYED             STATUS      CHART
1         2023-10-25 08:00:00  superseded  my-app-1.1.0
2         2023-10-26 09:00:00  superseded  my-app-1.1.1
3         2023-10-27 10:00:00  superseded  my-app-1.2.0
4         2023-10-27 10:05:00  deployed    my-app-1.2.0
5         2023-10-27 10:10:00  failed      my-app-1.2.0
You want to roll back to the last revision that deployed successfully, which is normally the one immediately before the failure. A revision whose status is deployed or superseded went out cleanly at the time; superseded only means a later revision replaced it. Here revision 5 failed and revision 4 is the newest one that did not, so 4 is the target. If revision 4 turns out to have problems of its own (it uses the same 1.2.0 chart as the failed upgrade), go back further, for example to revision 2 on chart 1.1.1.
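Picking the target by eye is fine for one release, but the same rule can be scripted. The sketch below is one possible approach, not a Helm feature: last_good_revision is a hypothetical helper that reads helm history table output on standard input and prints the highest revision whose status is deployed or superseded.

```shell
# Hypothetical helper: print the highest revision with status deployed or superseded.
# Reads `helm history` table output on stdin; exits non-zero if no such revision exists.
last_good_revision() {
  awk '
    $1 ~ /^[0-9]+$/ {
      # The first status-like word on the row is the STATUS column; stopping
      # there avoids matching words that appear later in a description.
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^(deployed|superseded)$/) {
          if ($1 + 0 > good + 0) good = $1
          break
        }
        if ($i ~ /^(failed|uninstalled|uninstalling|unknown|pending-install|pending-upgrade|pending-rollback)$/) break
      }
    }
    END { if (good == "") exit 1; print good }
  '
}
```

Usage would be helm history my-app | last_good_revision. For the history shown above it prints 4. Treat the result as a suggestion and confirm it against the history before rolling back.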
To perform the rollback, you use the helm rollback command:
helm rollback my-app 4
Helm will now attempt to revert the resources managed by this release to the state recorded in revision 4. It patches the resources that revision 5 changed so they match revision 4’s manifests, creates any that are missing, and deletes resources that exist only in revision 5.
If the rollback itself fails, it’s usually because the resources Helm is trying to revert are in an unexpected state due to the initial failure. You might need to manually clean up some Kubernetes resources before retrying the helm rollback.
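One way to make a failed rollback easier to spot is to preview it first and then wait for the restored resources to become ready. The wrapper below is a sketch under assumptions: safe_rollback is a hypothetical name, the 5m default timeout is arbitrary, and the release is assumed to live in the namespace your Helm context already points at.

```shell
# Hypothetical wrapper: preview the rollback, then run it and wait for readiness.
# Arguments: release name, target revision, optional timeout (default 5m).
safe_rollback() {
  release="$1"
  revision="$2"
  timeout="${3:-5m}"

  # --dry-run simulates the rollback without changing the cluster.
  helm rollback "$release" "$revision" --dry-run || return 1

  # --wait blocks until the restored resources are ready or the timeout expires;
  # --cleanup-on-fail lets Helm delete resources newly created by this rollback if it fails.
  if ! helm rollback "$release" "$revision" --wait --timeout "$timeout" --cleanup-on-fail; then
    echo "rollback of $release to revision $revision failed" >&2
    helm status "$release" >&2
    return 1
  fi
}
```

For the example release this would be called as safe_rollback my-app 4. If it reports a failure, the status output is a starting point for deciding which resources need manual cleanup.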
After a successful rollback, if you run helm list -a again, you’ll see a new revision number for my-app, and its status will be deployed.
helm list -a
Output might look like:
NAME         NAMESPACE    REVISION  DEPLOYED             STATUS    CHART
my-app       default      6         2023-10-27 11:00:00  deployed  my-app-1.2.0
another-app  kube-system  1         2023-10-26 09:00:00  deployed  another-app-0.5.1
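Note that Helm doesn’t actually delete the failed revision (5 in this case), and it doesn’t reactivate revision 4 either. A rollback creates a new revision (6) that copies revision 4’s configuration and marks it as deployed. The history is preserved, up to the release’s history limit (10 revisions by default in Helm 3), allowing you to go back even further if needed.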
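Because the new revision is a copy of the rollback target, the manifests stored for revisions 4 and 6 should be identical. A small check along these lines can confirm it; same_manifest is a hypothetical helper name, and it relies only on helm get manifest with its --revision flag plus standard diff:

```shell
# Hypothetical check: compare the stored manifests of two revisions of a release.
# Returns 0 if identical, 1 if they differ (diff output shown), 2 on any error.
same_manifest() {
  release="$1"
  old=$(mktemp) || return 2
  new=$(mktemp) || { rm -f "$old"; return 2; }
  if helm get manifest "$release" --revision "$2" > "$old" &&
     helm get manifest "$release" --revision "$3" > "$new"; then
    diff "$old" "$new"
    rc=$?
  else
    rc=2
  fi
  rm -f "$old" "$new"
  return "$rc"
}
```

Running same_manifest my-app 4 6 after the rollback should print nothing and succeed, while same_manifest my-app 5 6 would show what the failed upgrade tried to change.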
A helm rollback can also fail because re-applying the old manifests triggers a new set of Kubernetes validation errors. This often happens when the cluster’s API has changed since that revision was deployed (for example, after a Kubernetes upgrade removed an API version the old manifests still use) or when a resource’s state is so corrupted that even reverting to a previous configuration doesn’t satisfy the API. In such cases, inspect the Kubernetes events for the failing pods or deployments associated with your release.
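The commands for that inspection can be gathered into one helper. This is a sketch, not a standard tool: rollback_diagnostics is a hypothetical name, the namespace defaults to default, and the label selector assumes the chart sets app.kubernetes.io/instance to the release name, a common convention that not every chart follows.

```shell
# Hypothetical helper: gather the state that usually explains a failed rollback.
# Arguments: release name, optional namespace (default: "default").
rollback_diagnostics() {
  release="$1"
  namespace="${2:-default}"

  # Release status and the most recent history entries.
  helm status "$release" --namespace "$namespace"
  helm history "$release" --namespace "$namespace" --max 5

  # Workloads belonging to the release (the label is an assumption, see above).
  kubectl get deployments,statefulsets,pods --namespace "$namespace" \
    --selector "app.kubernetes.io/instance=$release"

  # Warning events in the namespace, oldest first.
  kubectl get events --namespace "$namespace" \
    --field-selector type=Warning --sort-by=.lastTimestamp
}
```

For the example release, rollback_diagnostics my-app default prints the release status, its last five revisions, the matching workloads, and any warning events in one pass.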
The next error you’re likely to hit is helm uninstall failing: if the release was rolled back while some of its resources were still in a pending or error state, Helm cannot cleanly remove them.