Linkerd traffic policies tell the service mesh how to handle traffic bound for a specific destination, primarily by defining routing rules and retry behavior.
Let’s see it in action. Imagine you have two versions of a user-service: v1 and v2. You want to gradually shift traffic from v1 to v2 and configure retries for requests to v2 if they fail.
First, you need a Service and Deployments for both versions.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: user-service
  labels:
    app: user-service
spec:
  selector:
    app: user-service # This selector should match pods for *both* v1 and v2
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-v1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: user-service
      version: v1
  template:
    metadata:
      labels:
        app: user-service
        version: v1
    spec:
      containers:
        - name: user-service
          image: your-docker-repo/user-service:v1
          ports:
            - containerPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-v2
spec:
  replicas: 2
  selector:
    matchLabels:
      app: user-service
      version: v2
  template:
    metadata:
      labels:
        app: user-service
        version: v2
    spec:
      containers:
        - name: user-service
          image: your-docker-repo/user-service:v2
          ports:
            - containerPort: 8080
```
Now, let’s create a TrafficPolicy to manage traffic to the user-service.
```yaml
apiVersion: trafficpolicy.linkerd.io/v1alpha1
kind: TrafficPolicy
metadata:
  name: user-service-policy
  namespace: default
spec:
  target:
    selector:
      matchLabels:
        app: user-service
  routes:
    - condition:
        lti:
          header:
            name: "user-agent"
            value: ".*Chrome.*"
      addr:
        - ip: 10.1.2.3 # IP of user-service-v1 pods (example)
          port: 8080
    - condition:
        lti:
          header:
            name: "user-agent"
            value: ".*Firefox.*"
      addr:
        - ip: 10.1.2.4 # IP of user-service-v2 pods (example)
          port: 8080
    - condition: {} # Default route for all other traffic
      addr:
        - ip: 10.1.2.4 # Default to v2
          port: 8080
      backend:
        retry:
          policy:
            maxRetries: 3
            timeout: 500ms
          status:
            codes: [500, 502, 503, 504]
```
This TrafficPolicy does a few things:
- Targeting: spec.target.selector specifies that this policy applies to all pods with the label app: user-service.
- Routing:
  - If the user-agent header contains "Chrome", traffic goes to 10.1.2.3 (which you’d map to your user-service-v1 pods).
  - If the user-agent header contains "Firefox", traffic goes to 10.1.2.4 (your user-service-v2 pods).
  - For any other user-agent, it falls back to 10.1.2.4 (also user-service-v2).
  - Note: In a real scenario, you wouldn’t hardcode IPs. You’d use Kubernetes Service resources, and Linkerd would resolve them. The addr field here only illustrates where traffic is directed.
- Retries: The backend.retry section configures retry logic for requests sent to that route’s destination (in this case, the default route pointing to v2). It retries up to 3 times, using the 500ms timeout value as the delay between retries, whenever the upstream service returns a 500, 502, 503, or 504 status code.
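To make the first-match routing concrete, here is a minimal Python model of how a proxy might evaluate these routes. It is not Linkerd code: the Route class and select_backend function are hypothetical, the backend names stand in for the example IPs, and the sketch assumes each pattern must match the whole header value (the .* on both sides of the example patterns suggests this) and that header names compare case-insensitively, as HTTP requires.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    """A hypothetical route: an optional header regex plus the backend to use on a match."""
    backend: str
    header: Optional[str] = None   # header to inspect; None means "match everything"
    pattern: Optional[str] = None  # regex that must match the entire header value

    def matches(self, headers: dict[str, str]) -> bool:
        if self.header is None:  # empty condition, like `condition: {}`
            return True
        value = headers.get(self.header.lower())
        return value is not None and re.fullmatch(self.pattern, value) is not None


def select_backend(routes: list[Route], headers: dict[str, str]) -> str:
    """Return the backend of the first matching route; order matters."""
    lowered = {name.lower(): value for name, value in headers.items()}  # names are case-insensitive
    for route in routes:
        if route.matches(lowered):
            return route.backend
    raise LookupError("no route matched and there is no catch-all route")


ROUTES = [
    Route("user-service-v1", header="user-agent", pattern=".*Chrome.*"),
    Route("user-service-v2", header="user-agent", pattern=".*Firefox.*"),
    Route("user-service-v2"),  # catch-all default route
]

print(select_backend(ROUTES, {"User-Agent": "Mozilla/5.0 Chrome/120.0"}))   # user-service-v1
print(select_backend(ROUTES, {"User-Agent": "Mozilla/5.0 Firefox/121.0"}))  # user-service-v2
print(select_backend(ROUTES, {"User-Agent": "curl/8.4.0"}))                 # user-service-v2
```

Putting the catch-all last is what makes it a fallback: moved to the top of the list, it would swallow every request before the header routes were ever checked.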
One point that often surprises people is that Linkerd traffic policies don’t replace Kubernetes Services; they build on them. The Service still defines which pods make up a destination, and the proxy layers fine-grained, per-request control on top of that.
When the client pod sends a request for user-service, the Linkerd proxy sidecar on that pod intercepts it and consults the TrafficPolicy. The proxy checks the request’s headers (like user-agent in our example) against the routes in order, and once a route matches, it sends the request to that route’s IP address and port. If the destination (the user-service-v2 pods in our default-route example) returns one of the configured error codes, the retry policy kicks in, making the client experience more resilient without the application code needing to know about retries.
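The retry half of that flow can be modeled in a few lines of Python. This is a simplified sketch, not the proxy's implementation: call_with_retries and fake_upstream are hypothetical, the delay between attempts is left out, and maxRetries: 3 is read as one initial attempt plus up to three retries.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

RETRYABLE_CODES = frozenset({500, 502, 503, 504})  # mirrors status.codes in the example


@dataclass
class Outcome:
    status: int    # status of the last attempt, as seen by the client
    attempts: int  # total attempts, including the first one


def call_with_retries(send: Callable[[], int], max_retries: int = 3,
                      retryable: frozenset[int] = RETRYABLE_CODES) -> Outcome:
    """Send once, then retry up to max_retries times while the status is retryable."""
    attempts = 0
    while True:
        status = send()
        attempts += 1
        if status not in retryable or attempts > max_retries:
            return Outcome(status, attempts)


def fake_upstream(statuses: Iterable[int]) -> Callable[[], int]:
    """Hypothetical upstream that returns the given status codes in order."""
    it = iter(statuses)
    return lambda: next(it)


print(call_with_retries(fake_upstream([503, 502, 200])))  # Outcome(status=200, attempts=3)
print(call_with_retries(fake_upstream([503] * 10)))       # Outcome(status=503, attempts=4)
print(call_with_retries(fake_upstream([404])))            # Outcome(status=404, attempts=1)
```

A 404 is returned immediately because it is not in the retryable set: retrying a request the server has already rejected as invalid would only add load.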
You can also implement traffic splitting directly within a TrafficPolicy by using multiple addr entries with different weights, though TrafficSplit resources are often preferred for pure traffic splitting scenarios. The TrafficPolicy is more about how to route and how to retry once a destination is selected, rather than just the percentage split itself.
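Whether it comes from weighted addr entries or a TrafficSplit, weighted splitting boils down to picking each request's backend with probability proportional to its weight. The make_picker function below is a hypothetical illustration of that idea, not Linkerd's load-balancing code; it uses cumulative weights and a seeded random.Random so the run is reproducible, and the stage weights are made up for the example.

```python
import bisect
import itertools
import random
from collections import Counter
from typing import Callable


def make_picker(weights: dict[str, int], rng: random.Random) -> Callable[[], str]:
    """Return a function that picks a backend with probability weight / sum(weights)."""
    names = [name for name, weight in weights.items() if weight > 0]  # weight 0 = no traffic
    if not names:
        raise ValueError("at least one backend needs a positive weight")
    cumulative = list(itertools.accumulate(weights[name] for name in names))
    total = cumulative[-1]

    def pick() -> str:
        r = rng.randrange(total)  # uniform integer in [0, total)
        # The first cumulative bound strictly greater than r identifies the backend.
        return names[bisect.bisect_right(cumulative, r)]

    return pick


# Gradual shift from v1 to v2; the stage weights are hypothetical.
rng = random.Random(42)
for stage in ({"v1": 90, "v2": 10}, {"v1": 50, "v2": 50}, {"v1": 0, "v2": 100}):
    pick = make_picker(stage, rng)
    counts = Counter(pick() for _ in range(10_000))
    print(stage, dict(counts))  # roughly proportional to the weights
```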
The condition field uses the Linkerd Traffic Inspection (LTI) DSL, which allows matching on request headers, query parameters, and HTTP methods. The addr field specifies the actual destination. For more complex scenarios, you can define multiple addr entries within a single route, and Linkerd will distribute traffic among them based on their implicit or explicit weights. The empty condition: {} acts as a catch-all, ensuring that all traffic not matching an earlier route is still handled.
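A condition that combines several criteria can be modeled as a single predicate over the request. The sketch below is a hypothetical evaluator, not the LTI DSL itself: the dictionary layout with method, query, and headers keys is invented for illustration, and it assumes that all listed criteria must match (logical AND), that regex patterns must match the whole value, and that an empty dict matches everything, mirroring condition: {}.

```python
import re
from urllib.parse import parse_qs, urlsplit


def condition_matches(condition: dict, method: str, url: str, headers: dict[str, str]) -> bool:
    """True if every criterion in `condition` matches the request (logical AND)."""
    # HTTP method names are case-sensitive, so compare them exactly.
    if "method" in condition and condition["method"] != method:
        return False
    # Query parameters: at least one value of each named parameter must fully match its regex.
    query = parse_qs(urlsplit(url).query)
    for name, pattern in condition.get("query", {}).items():
        if not any(re.fullmatch(pattern, value) for value in query.get(name, [])):
            return False
    # Headers: names are case-insensitive in HTTP; values must fully match the regex.
    lowered = {k.lower(): v for k, v in headers.items()}
    for name, pattern in condition.get("headers", {}).items():
        value = lowered.get(name.lower())
        if value is None or re.fullmatch(pattern, value) is None:
            return False
    return True  # an empty condition ({}) falls through to here and matches everything


beta_firefox = {
    "method": "GET",
    "query": {"beta": "true"},
    "headers": {"user-agent": ".*Firefox.*"},
}
ua = {"User-Agent": "Mozilla/5.0 Firefox/121.0"}
print(condition_matches(beta_firefox, "GET", "/users/42?beta=true", ua))   # True
print(condition_matches(beta_firefox, "POST", "/users/42?beta=true", ua))  # False
print(condition_matches({}, "DELETE", "/anything", {}))                     # True
```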
The retry configuration is where much of the resilience comes from. maxRetries caps how many retries follow the initial attempt, timeout sets a jittered delay between retries to avoid overwhelming a struggling service, and status.codes specifies which HTTP status codes should trigger a retry. This is crucial for building robust distributed systems, where transient failures are common.
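How the jitter is applied isn't spelled out here, so the following Python sketch assumes one simple scheme: each delay is drawn uniformly from a band around the configured value (500ms plus or minus 20%). The retry_delays function and its jitter parameter are hypothetical; many production systems use exponential backoff with jitter instead, which this sketch deliberately leaves out.

```python
import random


def retry_delays(max_retries: int, base_ms: float, jitter: float, rng: random.Random) -> list[float]:
    """One delay per retry, each drawn uniformly from [base_ms*(1-jitter), base_ms*(1+jitter)]."""
    if not 0.0 <= jitter <= 1.0:
        raise ValueError("jitter must be between 0 and 1")
    low, high = base_ms * (1 - jitter), base_ms * (1 + jitter)
    return [rng.uniform(low, high) for _ in range(max_retries)]


# Two clients that failed at the same moment, with the example's maxRetries: 3 and timeout: 500ms.
for client_seed in (1, 2):
    delays = retry_delays(max_retries=3, base_ms=500, jitter=0.2, rng=random.Random(client_seed))
    print(f"client {client_seed}:", [round(d) for d in delays])  # each value falls in 400..600
```

Because each client draws its own delays, clients that failed at the same moment spread their retries out instead of hitting the recovering service again in lockstep.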
The next concept you’ll likely explore is how to use TrafficPolicy in conjunction with ServiceProfile for more advanced routing based on request paths and methods, and how to implement canary deployments.