The --atomic flag for Helm upgrades ensures that if any part of your deployment fails, Helm will automatically roll back the entire upgrade to the previous stable version.
Let’s see it in action. Imagine you have a deployment managed by Helm, and you’re trying to upgrade it to a new version.
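helm upgrade my-release ./my-chart --atomic
Here, my-release is the name of your Helm release, and ./my-chart is the path to your Helm chart. Because the chart is a local directory, the target version comes from the version field in its Chart.yaml (1.2.0 in this example). The --version flag only selects a version when the chart is fetched from a repository, so it is omitted here.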
Now, let’s simulate a failure. Suppose your my-chart has a deployment manifest that accidentally includes an invalid image name, like invalid-image:latest. When you run the --atomic upgrade, Helm will attempt to create or update the Kubernetes resources defined in your chart.
If the Kubernetes API server rejects a resource (for example, because a manifest fails validation), or if a workload like the Deployment doesn’t become ready within the timeout (for example, because the kubelet cannot pull invalid-image:latest and the pods sit in ImagePullBackOff), Helm treats the upgrade as failed. Instead of leaving your release in a partially upgraded, broken state, it will execute a rollback.
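One way to reproduce this failure on a test cluster is sketched below. It assumes the chart reads its image from image.repository and image.tag values, which is a common convention but not something the chart in this article is known to do; the function name is hypothetical.

```shell
# Hypothetical helper: run an atomic upgrade with a deliberately unpullable image.
# Assumes the chart templates use .Values.image.repository and .Values.image.tag.
simulate_failed_upgrade() {
  release="$1"
  chart="$2"
  helm upgrade "$release" "$chart" \
    --atomic \
    --set image.repository=invalid-image \
    --set image.tag=latest
}

# Usage (needs a cluster and an existing release):
#   simulate_failed_upgrade my-release ./my-chart
```

With the default timeout, expect the command to block for several minutes before Helm gives up and rolls back.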
This rollback isn’t just a simple "undo." Helm marks the current attempted upgrade as failed and then reverts all changes that were part of that specific upgrade operation. This means if you were updating a Deployment, a Service, and a ConfigMap, and only the Deployment failed, Helm would revert the Deployment to its previous state and undo any changes it made to the Service and ConfigMap during that same upgrade attempt. The end result is that your release is back to the state it was in before you ran the helm upgrade --atomic command.
The core problem --atomic solves is the "partially upgraded" state. Without it, a failed upgrade can leave your application in an inconsistent and unusable condition. You might have a new Deployment spec pointing to an invalid image, but an old Service still pointing to the old, working pods, or vice-versa. Diagnosing and manually correcting this mixed state can be a nightmare, often requiring deep dives into Kubernetes event logs and manual resource manipulation.
When you use --atomic, Helm leverages Kubernetes’ built-in resource management and its own release history. The flag automatically enables --wait, so if a resource fails to be created or updated, or if a StatefulSet or Deployment doesn’t reach its desired state within the configured timeout (default is 5 minutes), Helm treats the entire upgrade as a failure. It then performs the equivalent of a helm rollback to the previous successful release revision, and records that rollback as a new revision in the release history.
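If five minutes is too short or too long for your workloads, the wait can be adjusted with the --timeout flag. The wrapper below is a minimal sketch; the function name and the 10m fallback are illustrative choices, not recommendations.

```shell
# Hypothetical wrapper: atomic upgrade with a caller-supplied timeout.
# The 10m fallback is an arbitrary example value.
atomic_upgrade() {
  release="$1"
  chart="$2"
  wait_timeout="${3:-10m}"
  helm upgrade "$release" "$chart" --atomic --timeout "$wait_timeout"
}

# Usage:
#   atomic_upgrade my-release ./my-chart       # waits up to 10m
#   atomic_upgrade my-release ./my-chart 2m    # waits up to 2m
```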
Here’s what’s happening under the hood:
- Pre-upgrade checks: Helm renders the chart’s templates and validates the resulting manifests before applying anything.
- Resource application: Helm applies the new Kubernetes manifests to your cluster.
- Readiness checks: For certain resources like Deployments and StatefulSets, Helm waits for them to reach a stable, ready state. This is configured via the timeout parameter.
- Failure detection: If any resource fails to apply, or if a critical resource doesn’t become ready within the timeout, Helm flags the upgrade as failed.
- Automatic rollback: Helm then triggers an internal rollback to the last known good revision of your release. This effectively undoes all changes made during the failed upgrade attempt.
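The steps above can be approximated by hand with --wait and an explicit rollback. This is a rough sketch of the behavior, not Helm's internal implementation, and the function name is hypothetical.

```shell
# Rough manual approximation of --atomic (not Helm's actual internals).
upgrade_or_rollback() {
  release="$1"
  chart="$2"
  # --wait blocks until resources are ready or the timeout expires.
  if helm upgrade "$release" "$chart" --wait --timeout 5m; then
    echo "upgrade succeeded"
    return 0
  fi
  echo "upgrade failed, rolling back" >&2
  # With no revision argument, helm rollback targets the previous revision.
  helm rollback "$release" --wait
  return 1
}

# Usage:
#   upgrade_or_rollback my-release ./my-chart
```

The function returns non-zero even when the rollback succeeds, so a calling CI job still registers the upgrade as failed.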
Consider a scenario where you’re upgrading a chart that includes a Deployment and a HorizontalPodAutoscaler (HPA). If the HPA definition is malformed (e.g., targetCPUUtilizationPercentage is set to an invalid value like -10), the Kubernetes API server rejects it during validation and Helm sees the failed create or update. With --atomic, Helm will then roll back both the HPA and the Deployment (if it was also modified) to their previous states, ensuring your application remains stable.
The key benefit is that the rollback is transactional for the Helm release. It treats the entire set of changes in a single upgrade operation as an atomic unit: either all succeed, or all are reverted.
One subtle but crucial point is how Helm defines "failure" for the --atomic flag. It’s not just about the Kubernetes API returning an error during manifest application. It also includes the readiness probes and status of critical workloads. If a Deployment is created successfully, but its pods never become ready, or if they crash immediately, Helm will consider this a failure and initiate a rollback. This readiness check is vital because a syntactically correct manifest that results in a non-functional application is still a failed deployment.
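The next thing you’ll likely run into is diagnosing the specific failure that caused the rollback. Helm’s error output tells you that the upgrade failed and was rolled back, and helm history my-release lists the failed revision alongside the rollback, but you’ll still need to examine the Kubernetes events and the state of the resources in the cluster to understand why the upgrade didn’t succeed. Keep in mind that the rollback replaces the failing pods, so their logs may no longer be available afterward.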
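A starting point for that investigation might look like the sketch below. The function name and the default namespace are assumptions for illustration; adjust both to match your setup.

```shell
# Hypothetical helper: gather first clues after an atomic rollback.
inspect_failed_upgrade() {
  release="$1"
  namespace="${2:-default}"
  # Revision history, including the failed revision and the rollback.
  helm history "$release" --namespace "$namespace"
  # Cluster events in the namespace, oldest first.
  kubectl get events --namespace "$namespace" --sort-by=.lastTimestamp
}

# Usage:
#   inspect_failed_upgrade my-release my-namespace
```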