Linkerd and Flagger are a killer combo for rolling out new versions of your applications without causing downtime. Flagger is the brains, telling Linkerd what to do, and Linkerd is the muscle, managing the traffic.
Let’s see this in action. Imagine you have a simple web service deployed with Linkerd.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: app
          image: ghcr.io/fluxcd/flagger-example/frontend:v0.1.0
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
  namespace: demo
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
      name: http
  selector:
    app: frontend
---
# Linkerd does not read Istio VirtualService objects, so none is defined here;
# the TrafficSplit below carries the routing weights.
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: frontend
  namespace: demo
spec:
  service: frontend
  backends:
    - service: frontend-v1
      weight: 100
    - service: frontend-v2
      weight: 0
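Here, frontend is the apex Service that clients call. The TrafficSplit is an SMI resource that Linkerd supports (through the linkerd-smi extension in recent releases), and it plays roughly the role that Istio’s VirtualService plays for weighted routing. This one directs 100% of the traffic addressed to frontend to frontend-v1 and none to frontend-v2. Both backend names are illustrative Services that the manifests above do not define, and once Flagger is involved you don’t write the TrafficSplit by hand: Flagger generates and updates it for you.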
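The Deployment above only joins the mesh if its pods carry the Linkerd proxy. The article doesn’t show how the demo namespace is meshed, so the following is a minimal sketch under the assumption that you rely on Linkerd’s automatic proxy injection at the namespace level:

```yaml
# Assumption: proxy injection is enabled per namespace.
# Pods created in "demo" after this is applied get the Linkerd sidecar.
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  annotations:
    linkerd.io/inject: enabled
```

Pods that were already running before the annotation was added need a restart to pick up the proxy.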
Now, let’s introduce Flagger. You define a Canary resource:
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: frontend
  namespace: demo
spec:
  # Optional if Flagger was installed with meshProvider=linkerd.
  provider: linkerd
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  autoscalingRef:
    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    name: frontend
  service:
    port: 80
  analysis:
    # How often Flagger runs the analysis and advances the weight.
    interval: 1m
    # Failed checks tolerated before rollback (illustrative value).
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99.9 # percent
        interval: 30s
      # Built-in latency check (the text calls it request-latency).
      - name: request-duration
        thresholdRange:
          max: 500 # ms
        interval: 1m
    webhooks:
      # post-rollout hooks run after the canary is promoted or rolled back.
      # Both URLs are placeholder endpoints.
      - name: promotion
        type: post-rollout
        url: http://flagger-operator.fluxcd.latest/promote
      - name: denial
        type: post-rollout
        url: http://flagger-operator.fluxcd.latest/denial
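The Canary’s autoscalingRef points at a HorizontalPodAutoscaler named frontend, which none of the manifests so far define. A minimal sketch of such an HPA could look like the following; the replica bounds and the CPU target are hypothetical values chosen only for illustration:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 2          # hypothetical lower bound
  maxReplicas: 4          # hypothetical upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 90   # hypothetical CPU target, in percent
```

The autoscalingRef is optional, so you can also drop it from the Canary if you don’t autoscale this workload.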
Flagger’s job is to reconcile this Canary resource.
When the Canary is first applied, Flagger bootstraps the rollout: it creates a frontend-primary Deployment (a copy of your original frontend Deployment, still running v0.1.0) along with frontend-primary and frontend-canary Services. Crucially, it also generates a TrafficSplit named frontend, and it scales the original frontend Deployment, which from now on serves as the canary, down to zero.
Initially, this TrafficSplit sends 100% of the traffic to the frontend-primary Service (which points to your v0.1.0 pods).
When you update the frontend Deployment to use a new image (e.g., v0.2.0), Flagger kicks in and starts the canary process. It scales the frontend Deployment back up with the new image and gradually shifts traffic from frontend-primary to frontend-canary by rewriting the TrafficSplit weights. In our example, it moves traffic in 10% increments (stepWeight: 10) up to a maximum of 50% (maxWeight: 50).
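To make the weight shifting concrete, here is a sketch of what the Flagger-managed TrafficSplit would be expected to look like right after the first 10% step. Flagger names the generated backends frontend-primary and frontend-canary; the exact apiVersion depends on your Flagger and Linkerd versions, so treat it as an assumption:

```yaml
# Sketch of the generated TrafficSplit after the first step (stepWeight: 10).
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: frontend
  namespace: demo
spec:
  service: frontend              # apex Service that clients call
  backends:
    - service: frontend-primary  # stable version (v0.1.0)
      weight: 90
    - service: frontend-canary   # new version (v0.2.0)
      weight: 10
```

Later steps would move the weights to 80/20, 70/30, and so on until the canary reaches 50. You can watch this happen with kubectl -n demo get trafficsplit frontend -o yaml.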
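During each step, Flagger monitors the canary’s health using Linkerd’s proxy metrics, which it queries from Prometheus. It checks request-success-rate and request latency (Flagger’s built-in latency check is named request-duration). If these metrics stay within the defined thresholds (at least 99.9% success rate and less than 500ms latency), Flagger proceeds to the next step. If a check fails, Flagger holds the current weight and counts the failure; once the number of failed checks reaches the limit set by the analysis threshold field, it routes all traffic back to the primary, scales the canary to zero, and marks the rollout as failed. If alerting is configured, it also notifies you.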
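If the built-in checks don’t fit, Flagger also accepts custom metrics through a MetricTemplate. The sketch below approximates a success-rate query over Linkerd’s response_total metric. The template name is hypothetical, and the Prometheus address assumes the instance that ships with the linkerd-viz extension:

```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: linkerd-success-rate   # hypothetical name
  namespace: demo
spec:
  provider:
    type: prometheus
    address: http://prometheus.linkerd-viz:9090   # assumes the linkerd-viz Prometheus
  # Percentage of inbound responses to the canary that were not failures.
  # {{ namespace }}, {{ target }} and {{ interval }} are filled in by Flagger.
  query: |
    sum(
      rate(
        response_total{
          namespace="{{ namespace }}",
          deployment=~"{{ target }}",
          classification!="failure",
          direction="inbound"
        }[{{ interval }}]
      )
    )
    /
    sum(
      rate(
        response_total{
          namespace="{{ namespace }}",
          deployment=~"{{ target }}",
          direction="inbound"
        }[{{ interval }}]
      )
    )
    * 100
```

A Canary would reference this from analysis.metrics with a templateRef (name and namespace) plus a thresholdRange, in place of a built-in metric name.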
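The webhooks are for custom actions. Flagger sends an HTTP POST to each URL at the lifecycle stage named by the hook’s type (for example pre-rollout, rollout, confirm-promotion, post-rollout, or rollback). The promotion and denial URLs shown here are placeholders; point them at a service you run, which can then trigger other workflows.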
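A common use of webhooks is generating traffic during the analysis, so the metrics have something to measure. The fragment below is a sketch that assumes you have deployed Flagger’s load tester as flagger-loadtester in a namespace called test, as the Flagger tutorials do; adjust the URL to wherever yours runs:

```yaml
# Fragment of spec.analysis in the Canary.
webhooks:
  - name: load-test
    type: rollout            # runs on every analysis iteration
    url: http://flagger-loadtester.test/   # assumed load tester address
    timeout: 5s
    metadata:
      # Send 10 requests/second from 2 workers for one minute to the canary.
      cmd: "hey -z 1m -q 10 -c 2 http://frontend-canary.demo:80/"
```

Without some source of traffic, a quiet service can leave the success-rate check with no data to evaluate.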
This gradual rollout is the core of automated canary deployments. You get real-world user traffic hitting your new version while still having the old version ready to take over if anything goes wrong. Flagger uses Linkerd’s traffic management capabilities to achieve this without manual intervention.
The real power here is that Linkerd provides the fine-grained traffic control and observability that Flagger needs. Flagger doesn’t need to know how to split traffic; it just tells Linkerd what to split and when. Linkerd, via its TrafficSplit resources, does the actual routing.
One crucial detail often missed is how the TrafficSplit interacts with Kubernetes Services. The TrafficSplit names an apex Service (here, frontend) and a set of backend Services. Clients keep calling frontend; the Linkerd proxy on the calling side intercepts those requests and distributes them across the backends according to the TrafficSplit weights. Because the split happens in the client’s proxy, only traffic from meshed clients is split. A backend with weight 0 receives none of the apex traffic, although it remains reachable directly under its own Service name.
After the analysis passes at the maximum weight, Flagger promotes the new version: it copies the canary’s spec (including the new image) into the frontend-primary Deployment, waits for that rollout to finish, and sets the TrafficSplit back to 100% on frontend-primary. The frontend Deployment is then scaled down to zero until the next update, while frontend-primary and the generated Services stay in place.
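Once promotion finishes, the outcome is recorded on the Canary object itself. As a rough, abridged sketch (real output carries more fields, such as conditions and timestamps), its status would be expected to look something like this:

```yaml
# Abridged status of the Canary after a successful rollout.
status:
  phase: Succeeded   # would read Failed after a rollback
  canaryWeight: 0    # all traffic is back on frontend-primary
  failedChecks: 0
```

Running kubectl -n demo get canary frontend shows the same phase and weight in tabular form.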
The next concept you’ll likely encounter is implementing more sophisticated promotion and rollback strategies, possibly involving external systems triggered via Flagger’s webhooks.