A Linkerd traffic policy tells the service mesh how to handle traffic to a specific destination, primarily through routing rules and retry behavior.

Let’s see it in action. Imagine you have two versions of a user-service: v1 and v2. You want to steer traffic between v1 and v2 (here by request header, as a step toward moving everything to v2) and retry requests that fail.

First, you need one Service and a Deployment for each version.

apiVersion: v1
kind: Service
metadata:
  name: user-service
  labels:
    app: user-service
spec:
  selector:
    app: user-service # This selector should match pods for *both* v1 and v2
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-v1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: user-service
      version: v1
  template:
    metadata:
      labels:
        app: user-service
        version: v1
    spec:
      containers:
      - name: user-service
        image: your-docker-repo/user-service:v1
        ports:
        - containerPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-v2
spec:
  replicas: 2
  selector:
    matchLabels:
      app: user-service
      version: v2
  template:
    metadata:
      labels:
        app: user-service
        version: v2
    spec:
      containers:
      - name: user-service
        image: your-docker-repo/user-service:v2
        ports:
        - containerPort: 8080

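Now, let’s create a TrafficPolicy to manage traffic to the user-service. The kind and API group shown here are not among the CRDs a stock Linkerd install ships (ServiceProfile, Server, HTTPRoute, and so on), so confirm that the resource exists in your cluster with kubectl get crds before applying it.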

apiVersion: trafficpolicy.linkerd.io/v1alpha1
kind: TrafficPolicy
metadata:
  name: user-service-policy
  namespace: default
spec:
  target:
    selector:
      matchLabels:
        app: user-service
  routes:
  - condition:
      lti:
        header:
          name: "user-agent"
          value: ".*Chrome.*"
    addr:
    - ip: 10.1.2.3 # IP of user-service-v1 pods (example)
      port: 8080
  - condition:
      lti:
        header:
          name: "user-agent"
          value: ".*Firefox.*"
    addr:
    - ip: 10.1.2.4 # IP of user-service-v2 pods (example)
      port: 8080
  - condition: {} # Default route for all other traffic
    addr:
    - ip: 10.1.2.4 # Default to v2
      port: 8080
  backend:
    retry:
      policy:
        maxRetries: 3
        timeout: 500ms
        status:
          codes: [500, 502, 503, 504]

This TrafficPolicy does a few things:

  • Targeting: spec.target.selector specifies that this policy applies to all pods with the label app: user-service.
  • Routing:
    • If the user-agent header contains "Chrome", traffic goes to 10.1.2.3 (which you’d map to your user-service-v1 pods).
    • If the user-agent header contains "Firefox", traffic goes to 10.1.2.4 (your user-service-v2 pods).
    • For any other user-agent, it falls back to 10.1.2.4 (also user-service-v2).
    • Note: In a real scenario, you wouldn’t hardcode IPs. You’d use Kubernetes Service resources, and Linkerd would resolve them. The addr field here is illustrative of directing traffic.
  • Retries: The backend.retry section configures retry logic for requests sent to the selected backend (for the default route, the v2 pods). A request is retried up to 3 times, with the 500ms timeout value applied between attempts, when the upstream service returns a 500, 502, 503, or 504 status code.
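
As the note above says, a real configuration points at Kubernetes Services rather than pod IPs. A minimal sketch of one Service per version follows. The names user-service-v1 and user-service-v2 are assumptions made for illustration (they are not part of the manifests above), and the selectors rely on the version labels set in the two Deployments.

```yaml
# Hypothetical per-version Services; the names are illustrative only.
apiVersion: v1
kind: Service
metadata:
  name: user-service-v1
spec:
  selector:
    app: user-service
    version: v1   # matches only the pods of the v1 Deployment
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: user-service-v2
spec:
  selector:
    app: user-service
    version: v2   # matches only the pods of the v2 Deployment
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
```

With these in place, a route can name a Service as its destination and leave endpoint discovery to Kubernetes, so pod restarts and rescheduling do not invalidate the policy.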

Perhaps the most surprising thing about Linkerd traffic policies is that they don’t replace Kubernetes Services; they build on them. The Service still provides the stable name and endpoint discovery, while the Linkerd proxy applies the fine-grained routing and balances individual requests across the endpoints itself instead of relying on kube-proxy.

When a request for user-service reaches the Linkerd proxy sidecar on the client pod, the proxy consults the TrafficPolicy. It checks the request’s headers (such as user-agent in our example) against the defined routes and, once a route matches, sends the request to that route’s address and port. If the destination (the user-service-v2 pods in the default route) responds with one of the configured status codes, the retry policy kicks in. The client becomes more resilient without the application code knowing anything about retries.

You can also implement traffic splitting directly within a TrafficPolicy by using multiple addr entries with different weights, though TrafficSplit resources are often preferred for pure traffic splitting scenarios. The TrafficPolicy is more about how to route and how to retry once a destination is selected, rather than just the percentage split itself.
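
For reference, a pure percentage split with the TrafficSplit resource mentioned above might look like the sketch below. It uses the SMI split.smi-spec.io/v1alpha2 schema and assumes that per-version Services named user-service-v1 and user-service-v2 exist (hypothetical names, not defined in the manifests above). Depending on your Linkerd version, TrafficSplit may require the linkerd-smi extension.

```yaml
# Sketch only: sends roughly 90% of requests for user-service to v1 and 10% to v2.
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: user-service-split
  namespace: default
spec:
  service: user-service          # the apex Service that clients call
  backends:
    - service: user-service-v1   # assumed per-version Service
      weight: 90
    - service: user-service-v2   # assumed per-version Service
      weight: 10
```

Shifting traffic gradually then comes down to editing the two weights over time, for example 90/10, then 50/50, then 0/100.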

The condition field uses Linkerd’s "Linkerd Traffic Inspection" (LTI) DSL, which allows for matching based on request headers, query parameters, and HTTP methods. The addr field specifies the actual destination. For more complex scenarios, you can define multiple addr entries within a single route, and Linkerd will distribute traffic among them based on their implicit or explicit weights. The empty condition: {} acts as a catch-all, ensuring all traffic not matching previous routes is handled.
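
The same shape (a header match plus a catch-all) can also be expressed with a Gateway API HTTPRoute, which recent Linkerd releases accept for outbound routing when the parent is a Service. The sketch below is an illustration under assumptions: the apiVersion depends on the CRDs installed in your cluster, and the backend Services user-service-v1 and user-service-v2 are hypothetical names.

```yaml
# Sketch only: Chrome user agents go to v1, everything else falls through to v2.
apiVersion: gateway.networking.k8s.io/v1beta1   # may differ with your installed CRDs
kind: HTTPRoute
metadata:
  name: user-service-by-agent
  namespace: default
spec:
  parentRefs:
    - name: user-service
      kind: Service
      group: core
      port: 8080
  rules:
    - matches:
        - headers:
            - name: user-agent
              type: RegularExpression
              value: ".*Chrome.*"
      backendRefs:
        - name: user-service-v1   # assumed per-version Service
          port: 8080
    - backendRefs:                # no matches block: acts as the catch-all
        - name: user-service-v2   # assumed per-version Service
          port: 8080
```

A rule without a matches block matches every request, which plays the role of the empty condition described above.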

The retry configuration has three parts: maxRetries caps the number of additional attempts, timeout sets a jittered delay between them so that a struggling service is not overwhelmed, and status.codes lists the HTTP status codes that trigger a retry. Retries are safest for idempotent requests, because a retried write may be applied twice. Used with that caveat, they are a key tool for building robust distributed systems where transient failures are common.
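
For comparison, Linkerd’s ServiceProfile resource expresses retries per route and limits them with a retry budget rather than a fixed attempt count. The sketch below assumes a hypothetical GET /users route on user-service; the route name, path, and budget values are illustrative, not taken from the example above.

```yaml
# Sketch only: marks one route as retryable and bounds retries with a budget.
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  # ServiceProfiles are named after the fully qualified Service DNS name.
  name: user-service.default.svc.cluster.local
  namespace: default
spec:
  routes:
    - name: GET /users            # hypothetical route
      condition:
        method: GET
        pathRegex: /users
      isRetryable: true           # only mark idempotent routes as retryable
      timeout: 500ms              # overall request timeout for this route
  retryBudget:
    retryRatio: 0.2               # retries may add at most 20% extra load
    minRetriesPerSecond: 10
    ttl: 10s
```

A budget keeps retries proportional to real traffic, which avoids the retry storms that a fixed per-request count can cause when a backend is already failing.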

The next concept you’ll likely explore is how to use TrafficPolicy in conjunction with ServiceProfile for more advanced routing based on request paths and methods, and how to implement canary deployments.
