Canary deployments allow you to roll out new machine learning models to a small subset of users before a full production release, minimizing the risk of widespread issues.

Imagine you have a model serving endpoint that handles 10,000 requests per minute. You’ve just trained model-v2 and want to deploy it. Instead of flipping a switch and sending all 10,000 requests to model-v2, a canary deployment would start by sending just 100 requests (1%) to model-v2, while the remaining 9,900 requests continue to be served by model-v1.
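
The arithmetic behind that split can be sketched in a few lines of Python. This is only an illustration: the function name `split_requests` and the use of whole-number percentage weights are assumptions made for the example, not part of any deployment tool.

```python
def split_requests(total_per_minute: int, weights: dict[str, int]) -> dict[str, int]:
    """Expected requests per version, given integer percentage weights."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    # Integer arithmetic keeps the result exact for whole-percent weights.
    return {
        version: total_per_minute * weight // 100
        for version, weight in weights.items()
    }


print(split_requests(10_000, {"model-v1": 99, "model-v2": 1}))
# {'model-v1': 9900, 'model-v2': 100}
```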

Here’s a simplified view of how this might look in a Kubernetes environment using a service mesh like Istio.

Scenario: Gradual Rollout of model-v2

Let’s assume you have two Kubernetes Deployments: model-v1 and model-v2, each with their respective Pods running your model inference servers. You also have a Kubernetes Service, model-service, that acts as the single entry point for your model.

# Deployment for the current stable model
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-v1
spec:
  replicas: 5
  selector:
    matchLabels:
      app: model
      version: v1
  template:
    metadata:
      labels:
        app: model
        version: v1
    spec:
      containers:
      - name: model-server
        image: your-docker-repo/model-runner:v1
        ports:
        - containerPort: 8080
---
# Deployment for the new canary model
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-v2
spec:
  replicas: 5
  selector:
    matchLabels:
      app: model
      version: v2
  template:
    metadata:
      labels:
        app: model
        version: v2
    spec:
      containers:
      - name: model-server
        image: your-docker-repo/model-runner:v2
        ports:
        - containerPort: 8080
---
# Kubernetes Service pointing to all model versions
apiVersion: v1
kind: Service
metadata:
  name: model-service
spec:
  selector:
    app: model # Matches pods from both the v1 and v2 Deployments
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080

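By default, a Kubernetes Service spreads traffic roughly evenly across all ready pods matching the selector; it has no notion of per-version weights. With five replicas of each version, model-v2 would therefore receive about half of the traffic, not 1%. Controlling the split independently of replica counts is where a service mesh or an Ingress controller with advanced routing capabilities comes in.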
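
To see why replica counts alone give poor control over the split, here is a rough Python sketch. It assumes every ready pod is equally likely to receive a request, which is a simplification, and the helper names `expected_share` and `stable_replicas_needed` are made up for this example.

```python
from fractions import Fraction


def expected_share(replicas: dict[str, int]) -> dict[str, Fraction]:
    """Share of traffic per version if every pod is equally likely to be picked."""
    total = sum(replicas.values())
    return {version: Fraction(count, total) for version, count in replicas.items()}


def stable_replicas_needed(canary_replicas: int, canary_percent: int) -> int:
    """v1 pods needed so that the canary pods make up canary_percent of all pods."""
    # canary / (canary + stable) = percent / 100
    # => stable = canary * (100 - percent) / percent (rounded down here)
    return canary_replicas * (100 - canary_percent) // canary_percent


print(expected_share({"v1": 5, "v2": 5})["v2"])  # 1/2
print(stable_replicas_needed(1, 1))              # 99
```

Under that assumption, a 1% canary with a plain Service would need 99 v1 pods for every v2 pod, which is why weight-based routing is handled at a different layer.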

Using Istio for Canary Deployments

Istio allows you to control traffic routing at a granular level using VirtualService and DestinationRule resources.

First, we need a DestinationRule to define subsets for our model versions.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: model-destination
spec:
  host: model-service # This should match the Kubernetes Service name
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2

This DestinationRule tells Istio that traffic destined for model-service can be routed to pods labeled version: v1 (as subset v1) or version: v2 (as subset v2).
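
As a mental model only, subset selection can be pictured as label filtering over the pods behind the Service. The sketch below is not how Istio is implemented; the pod records and the function `pods_in_subset` are hypothetical and exist purely to show the matching rule.

```python
def pods_in_subset(pods, service_selector, subset_labels):
    """Names of pods that match both the Service selector and the subset's labels."""
    required = {**service_selector, **subset_labels}
    return [
        pod["name"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in required.items())
    ]


# Hypothetical pods; only the labels matter for the illustration.
pods = [
    {"name": "model-v1-a", "labels": {"app": "model", "version": "v1"}},
    {"name": "model-v2-a", "labels": {"app": "model", "version": "v2"}},
    {"name": "other", "labels": {"app": "other", "version": "v2"}},
]

print(pods_in_subset(pods, {"app": "model"}, {"version": "v2"}))
# ['model-v2-a']
```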

Now, we create a VirtualService to define the traffic routing rules. Initially, we’ll send 100% of traffic to v1.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: model-virtual-service
spec:
  hosts:
  - model-service
  http:
  - route:
    - destination:
        host: model-service
        subset: v1
      weight: 100

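At this point, all traffic that sidecar-injected workloads in the mesh send to model-service goes to pods with the version: v1 label. Because this VirtualService has no gateways field, it applies only to mesh-internal traffic; to apply the same rules to requests entering through Istio’s ingress gateway, the VirtualService must also list that Gateway under gateways.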

Initiating the Canary

To start the canary, we modify the VirtualService to split traffic. We’ll send 99% to v1 and 1% to v2.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: model-virtual-service
spec:
  hosts:
  - model-service
  http:
  - route:
    - destination:
        host: model-service
        subset: v1
      weight: 99
    - destination:
        host: model-service
        subset: v2
      weight: 1

Now, 1% of requests are being served by model-v2. You would monitor key metrics for model-v2 (e.g., latency, error rates, prediction quality if you have an automated evaluation mechanism) and compare them against model-v1.
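
One way to make that comparison concrete is a small health check like the Python sketch below. Everything in it is an assumption chosen for illustration: the `VersionMetrics` fields, the `canary_is_healthy` name, and especially the thresholds (20% extra p95 latency, 0.5 percentage points of extra errors), which you would tune to your own service. Prediction quality is left out because it depends on your evaluation setup.

```python
from dataclasses import dataclass


@dataclass
class VersionMetrics:
    p95_latency_ms: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def canary_is_healthy(
    baseline: VersionMetrics,
    canary: VersionMetrics,
    max_latency_ratio: float = 1.2,          # assumed tolerance, not a standard
    max_error_rate_increase: float = 0.005,  # assumed tolerance, not a standard
) -> bool:
    """True if the canary stays within the tolerated margins of the baseline."""
    latency_ok = canary.p95_latency_ms <= baseline.p95_latency_ms * max_latency_ratio
    errors_ok = canary.error_rate <= baseline.error_rate + max_error_rate_increase
    return latency_ok and errors_ok


v1 = VersionMetrics(p95_latency_ms=120.0, error_rate=0.002)
v2 = VersionMetrics(p95_latency_ms=130.0, error_rate=0.003)
print(canary_is_healthy(v1, v2))  # True
```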

Gradually Increasing Traffic

If model-v2 performs as expected, you can gradually increase its traffic percentage.

Step 1: 10% to v2

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: model-virtual-service
spec:
  hosts:
  - model-service
  http:
  - route:
    - destination:
        host: model-service
        subset: v1
      weight: 90
    - destination:
        host: model-service
        subset: v2
      weight: 10

Step 2: 50% to v2

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: model-virtual-service
spec:
  hosts:
  - model-service
  http:
  - route:
    - destination:
        host: model-service
        subset: v1
      weight: 50
    - destination:
        host: model-service
        subset: v2
      weight: 50
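
Since each step changes only the two weights, the manifests above could be generated instead of edited by hand. The following Python sketch builds the same VirtualService structure for a given v2 weight and prints it as JSON, which kubectl accepts alongside YAML. The function name `virtual_service` and the 1, 10, 50, 100 schedule are assumptions for the example.

```python
import json


def virtual_service(v2_weight: int) -> dict:
    """Build the model-virtual-service manifest for a given v2 traffic percentage."""
    if not 0 <= v2_weight <= 100:
        raise ValueError("v2_weight must be between 0 and 100")
    routes = []
    for subset, weight in (("v1", 100 - v2_weight), ("v2", v2_weight)):
        if weight > 0:  # leave out a destination that receives no traffic
            routes.append(
                {
                    "destination": {"host": "model-service", "subset": subset},
                    "weight": weight,
                }
            )
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": "model-virtual-service"},
        "spec": {"hosts": ["model-service"], "http": [{"route": routes}]},
    }


# Print one manifest per rollout step.
for step in (1, 10, 50, 100):
    print(json.dumps(virtual_service(step)))
```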

Full Rollout

Once you’re confident, you can send 100% of traffic to model-v2 and then decommission model-v1.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: model-virtual-service
spec:
  hosts:
  - model-service
  http:
  - route:
    - destination:
        host: model-service
        subset: v2
      weight: 100

Header-Based Routing for Specific Users

A powerful aspect of canary deployments is routing specific users or traffic types to the new model. This is often done using HTTP headers. You can configure Istio’s VirtualService to route requests with a particular header (e.g., x-canary-version: v2) to the v2 subset, while all other traffic goes to v1.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: model-virtual-service
spec:
  hosts:
  - model-service
  http:
  - match:
    - headers:
        x-canary-version:
          exact: v2
    route:
    - destination:
        host: model-service
        subset: v2
      weight: 100
  - route: # Default route for all other traffic
    - destination:
        host: model-service
        subset: v1
      weight: 100

This allows you to enable the canary for internal testers or a specific beta group by simply adding the x-canary-version: v2 header to their requests.
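
The routing decision in that VirtualService can be modelled in a few lines of Python: rules are checked in order, and the first one that matches wins. This is a simplified model for intuition, not Istio code, and `pick_subset` is a hypothetical name.

```python
def pick_subset(headers: dict[str, str]) -> str:
    """Return the subset that the header-based rules above would select."""
    # HTTP header names are case-insensitive, so normalise them before matching.
    normalised = {name.lower(): value for name, value in headers.items()}
    # First rule: an exact (case-sensitive) match on the header value.
    if normalised.get("x-canary-version") == "v2":
        return "v2"
    # Default rule: everything else goes to the stable version.
    return "v1"


print(pick_subset({"X-Canary-Version": "v2"}))  # v2
print(pick_subset({}))                          # v1
```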

The true power of canary deployments isn’t just the gradual percentage increases; it’s the combination of monitoring and a fast rollback path. If model-v2 shows signs of degradation, you can reapply the VirtualService that sends 100% of traffic to v1 and have it take effect within seconds, limiting the blast radius of any issue. This iterative approach to model deployment is fundamental to MLOps, because it keeps production stable while new models are being validated on real traffic.
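
The promote-or-roll-back decision can be sketched as a small Python function. The step schedule mirrors the weights used in this walkthrough; the function name `next_v2_weight` and the rule "any unhealthy check sends v2 back to 0%" are assumptions for illustration, and a real setup might pause or retry before rolling back.

```python
ROLLOUT_STEPS = (1, 10, 50, 100)  # v2 traffic percentages used in this walkthrough


def next_v2_weight(current: int, canary_healthy: bool) -> int:
    """Next v2 weight: advance one step if healthy, otherwise roll back to 0."""
    if not canary_healthy:
        return 0  # all traffic returns to v1
    for step in ROLLOUT_STEPS:
        if step > current:
            return step
    return current  # already at full rollout


weight = 0
for healthy in (True, True, True, False):
    weight = next_v2_weight(weight, healthy)
    print(weight)
# prints 1, 10, 50, 0 on separate lines
```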

The next step after mastering canary deployments is often implementing automated rollback triggers based on real-time performance metrics.
