Zero-Downtime Deployments: Strategies Engineers Use

Kubernetes rolling updates are often misunderstood as the only way to achieve zero-downtime deployments, but they’re just one tool in a larger arsenal.

Let’s see what a typical rolling update looks like in practice. Imagine we have a simple Nginx deployment running 3 replicas.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25.3
        ports:
        - containerPort: 80

When we update the image tag from nginx:1.25.3 to nginx:1.26.0 and apply this change, Kubernetes doesn’t just swap everything out at once. The strategy section, specifically rollingUpdate, dictates the process. With maxUnavailable: 1 and maxSurge: 1, Kubernetes will:

1. Terminate one pod: It picks one of the existing nginx-deployment pods and begins its termination process, which maxUnavailable: 1 permits.
2. Create one new pod: In parallel, it starts a new pod using the updated nginx:1.26.0 image, which maxSurge: 1 permits.
3. Wait for readiness: The new pod must pass its readiness probe (if configured) before it counts as available and receives traffic.
4. Repeat: As new pods become available, Kubernetes terminates further old pods and creates further new ones, always staying within both limits.

This continues until all old pods are replaced with new ones, and the desired replica count is met with the new version. The key here is that at no point are all pods unavailable. There’s always at least replicas - maxUnavailable pods running and ready to serve traffic.
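
To make these bounds concrete for the manifest above, the strategy block can be annotated with the pod counts it implies. The numbers in the comments are simple arithmetic from replicas: 3, not output from a real cluster.

```yaml
# Annotated copy of the strategy block (it sits under spec in the Deployment above).
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1   # available pods never drop below 3 - 1 = 2
    maxSurge: 1         # total pods (old + new) never exceed 3 + 1 = 4
```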

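This gradual replacement is what prevents downtime. If a new pod never passes its readiness probe, the rollout stalls: the controller stops removing old pods once the maxUnavailable limit is reached, so the faulty version cannot take over. Kubernetes does not roll back on its own; after progressDeadlineSeconds (600 seconds by default) it only marks the Deployment as failing to progress.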
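
A rollout that stops making progress is easier to catch if the Deployment says how long to wait. The fragment below is a sketch: minReadySeconds and progressDeadlineSeconds are standard Deployment fields, but the values are illustrative assumptions, not recommendations.

```yaml
# Fragment of a Deployment spec; the values are assumptions for illustration.
spec:
  replicas: 3
  # A new pod must stay ready this long before it counts as available.
  minReadySeconds: 10
  # After 5 minutes without progress, the Progressing condition becomes False
  # with reason ProgressDeadlineExceeded. No automatic rollback happens.
  progressDeadlineSeconds: 300
```

Reverting is still a manual or tool-driven step, for example with kubectl rollout undo.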

The problem this solves is the traditional "big bang" deployment where an entire application is taken offline, updated, and then brought back online. This is inherently risky and leads to service interruption. Kubernetes rolling updates automate a much safer, phased approach.

Internally, the Deployment controller, which runs inside kube-controller-manager, watches the Deployment object. When it detects a change to the pod template (spec.template), such as a new value in spec.template.spec.containers[*].image, it starts a rollout. A change to replicas alone does not trigger one. The controller manages ReplicaSets behind the scenes: it creates a new ReplicaSet for the new version and gradually scales it up while scaling the old ReplicaSet down, respecting maxUnavailable and maxSurge.

The levers you control are primarily within the strategy.rollingUpdate section:

maxUnavailable: The maximum number of pods that can be unavailable during the update (default 25%). A value of 0 means no pods can be unavailable, which requires maxSurge to be at least 1 because both cannot be 0. A percentage is rounded down, so 25% of 10 replicas allows 2 pods to be down.
maxSurge: The maximum number of pods that can be created over the desired number of replicas (default 25%). A value of 0 means no new pod can be created until an old one is terminated. A percentage is rounded up, so 25% of 10 replicas allows 3 extra pods during the update.
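
As a worked example of the percentage form, here is a sketch that assumes replicas: 10 (a value chosen only for the arithmetic). Kubernetes rounds maxUnavailable down and maxSurge up when converting percentages to pod counts.

```yaml
# Fragment of a Deployment spec; replicas: 10 is an assumed value.
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # 2.5 rounded down to 2: at least 8 pods stay available
      maxSurge: 25%         # 2.5 rounded up to 3: at most 13 pods exist at once
```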

Choosing these values is crucial. For a critical service, you might set maxUnavailable: 0 and maxSurge: 1 (or a small percentage) so that the full desired replica count stays available throughout the update and only one extra pod is provisioned at a time. The trade-offs are a slower rollout and the need for spare cluster capacity for the surge pod. For less critical services, you might allow maxUnavailable: 1 and maxSurge: 1 to speed up the deployment.
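
For the critical-service case, a minimal sketch of the strategy might look like this. The replica count is carried over from the earlier example.

```yaml
# Fragment of a Deployment spec for a service that must keep full capacity.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never dip below the 3 desired available pods
      maxSurge: 1         # add one new pod, wait until it is available, then remove one old pod
```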

A common misconception is that maxUnavailable and maxSurge must be percentages. They can also be absolute numbers. For instance, if you have replicas: 10, maxUnavailable: 2 means up to 2 pods can be unavailable, and maxSurge: 3 means up to 3 additional pods can be created beyond the desired 10, for a temporary total of 13 pods during the transition.

While RollingUpdate is the default and most common, Kubernetes offers other deployment strategies. The Recreate strategy, for example, terminates all existing pods before creating new ones. This is simple but not zero-downtime.
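
For comparison, switching the same Deployment to Recreate is a small change. This is a sketch of the relevant fragment only.

```yaml
# Fragment of a Deployment spec using the Recreate strategy.
spec:
  replicas: 3
  strategy:
    type: Recreate   # all old pods are terminated before any new pod starts;
                     # maxUnavailable and maxSurge apply only to RollingUpdate
```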

Beyond RollingUpdate, you’ll encounter strategies like Blue/Green deployments and Canary releases. These often involve more than just the Deployment object, typically requiring additional components like Services and Ingress controllers to manage traffic routing. Blue/Green involves deploying a completely new version alongside the old and then switching traffic over. Canary involves gradually routing a small percentage of traffic to the new version to test it before a full rollout.
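
One common way to sketch the Blue/Green switch with plain Kubernetes objects is a Service whose selector picks the active version. The version label and the Service name here are hypothetical, and the sketch assumes two Deployments whose pod templates carry app: nginx plus either version: blue or version: green.

```yaml
# Service that decides which Deployment receives traffic.
# The "version" label and the Service name are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
    version: blue    # change to "green" to cut traffic over to the new Deployment
  ports:
  - port: 80
    targetPort: 80
```

Changing the selector back to blue reverts the switch, provided the old Deployment is still running.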

An often-overlooked detail of RollingUpdate is its dependency on readiness probes. If you don’t define a readinessProbe in your pod spec, Kubernetes has no way to know when a new pod can actually accept traffic, so it treats the pod as ready as soon as its containers have started. Traffic can then reach pods that haven’t finished initializing, causing errors and perceived downtime even though the deployment completes without violating maxUnavailable.
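
A minimal readiness probe for the Nginx container from the earlier manifest might look like the sketch below. The path and the timing values are assumptions; a real application would usually expose a dedicated health endpoint.

```yaml
# Fragment of the pod template's container list; probe values are assumptions.
containers:
- name: nginx
  image: nginx:1.26.0
  ports:
  - containerPort: 80
  readinessProbe:
    httpGet:
      path: /              # assumed endpoint; a dedicated health path is often better
      port: 80
    initialDelaySeconds: 5 # wait before the first check
    periodSeconds: 5       # check every 5 seconds
    failureThreshold: 3    # mark the pod unready after 3 consecutive failures
```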

The next hurdle after mastering rolling updates is understanding how to manage traffic shifting for more advanced strategies like Canary deployments using tools like Istio, Linkerd, or native Kubernetes features with Ingress controllers.
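
As a taste of what that looks like with an Ingress controller, here is a hedged sketch using the canary annotations of the ingress-nginx controller. It assumes that controller is installed, that a primary Ingress for the same host and path already points at the stable Service, and that the host and Service names (example.com, nginx-canary) are hypothetical.

```yaml
# Canary Ingress for the ingress-nginx controller (an assumption about the cluster).
# Host and Service names are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # roughly 10% of requests
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx-canary   # Service in front of the new version's pods
            port:
              number: 80
```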