A production Linkerd install isn’t just about getting it running; it’s about making it resilient and secure. Surprisingly, much of that comes from understanding which features Linkerd deliberately leaves out.

Let’s see Linkerd in action, not as a static diagram, but as a dynamic service mesh. Imagine you have a couple of microservices, webapp and users-api, both running in Kubernetes.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: nginxdemos/hello:plain-1.0
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: users-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: users-api
  template:
    metadata:
      labels:
        app: users-api
    spec:
      containers:
      - name: users-api
        image: nginxdemos/hello:plain-1.0
        ports:
        - containerPort: 80

Now, if webapp wants to call users-api, it just makes a standard HTTP request to http://users-api.default.svc.cluster.local. Without Linkerd, this is a raw, unencrypted, unobserved network hop.
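That DNS name only exists if a Service fronts the users-api pods, and the manifests above do not define one. A minimal sketch, assuming a ClusterIP Service named users-api in the default namespace that selects the app: users-api label:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: users-api
  namespace: default   # assumed; matches users-api.default.svc.cluster.local
spec:
  selector:
    app: users-api     # same label as the Deployment's pod template
  ports:
  - name: http
    port: 80           # port clients connect to
    targetPort: 80     # containerPort of the users-api container
```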

After installing Linkerd (we’ll assume a default install for now; on Linkerd 2.12 and later that means linkerd install --crds | kubectl apply -f - followed by linkerd install | kubectl apply -f -), you’d typically inject it into your workloads.

linkerd inject webapp.yaml | kubectl apply -f -
linkerd inject users-api.yaml | kubectl apply -f -

What happens now? Linkerd adds a linkerd-proxy sidecar container to each pod, along with an init container (or the Linkerd CNI plugin, if you use it) that sets up iptables rules inside the pod. The webapp and users-api containers don’t change their code. They still think they’re talking to users-api.default.svc.cluster.local, and DNS still resolves that name exactly as before. The difference is that the iptables rules redirect webapp’s outbound connection to its own linkerd-proxy sidecar. This proxy then establishes a secure mTLS connection to the linkerd-proxy sidecar for users-api, which forwards the request to the actual users-api container. All this happens transparently.
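Under the hood, linkerd inject only adds an annotation that Linkerd’s proxy injector acts on when pods are created. If you would rather keep that in your manifests than pipe them through the CLI, a sketch of the equivalent for webapp looks like this (only the annotation is new; the rest is the Deployment shown earlier):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
      annotations:
        linkerd.io/inject: enabled   # proxy injector adds linkerd-proxy at pod creation
    spec:
      containers:
      - name: webapp
        image: nginxdemos/hello:plain-1.0
        ports:
        - containerPort: 80
```

The same annotation on a Namespace injects every pod created in that namespace.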

The core problem Linkerd solves is the inherent unreliability and lack of visibility in distributed systems. Network failures are common, service instances come and go, and understanding what’s happening between services is hard. Linkerd provides:

  • mTLS Encryption: By default, all traffic between meshed pods is automatically encrypted using mutual TLS. This isn’t something you configure per-service; it’s a property of the mesh.
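  • Retries and Timeouts: Linkerd’s proxies can apply retries and timeouts to requests, but only where you configure them; nothing is retried by default. If webapp calls users-api and the request fails, webapp’s linkerd-proxy can retry it; if the response is too slow, the proxy can time the request out.
  • Load Balancing: Linkerd provides latency-aware, per-request load balancing of HTTP and gRPC traffic across healthy instances of a service.
  • Observability: Golden metrics (success rate, request rate, latency distribution) are generated automatically for all meshed traffic. Distributed tracing is also supported, but it requires a trace collector and applications that propagate tracing headers.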

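To harden your production install, you’ll want to customize Linkerd’s configuration. The linkerd-config ConfigMap in the linkerd namespace records the configuration the control plane was installed with, so it is the first place to look when auditing what is in effect:

kubectl get configmap linkerd-config -n linkerd -o yaml > linkerd-config.yaml

In current Linkerd releases this ConfigMap holds a values key containing the rendered Helm values, including a proxy section with the defaults applied to every injected sidecar. Treat it as a record rather than a control surface: change mesh-wide settings with linkerd upgrade or your Helm values so they survive the next upgrade.

Request timeouts are not part of that global proxy configuration; they are set per route. For example, to give GET requests to users-api a 5 second timeout, you’d define a ServiceProfile for the service:

---
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  # The name must be the fully qualified DNS name of the Service
  name: users-api.default.svc.cluster.local
  namespace: default
spec:
  routes:
  - name: GET all paths
    condition:
      method: GET
      pathRegex: "/.*"
    timeout: 5s # Timeout for requests matching this route

Apply it with kubectl apply -f. Proxies learn about ServiceProfile changes from the control plane, so no restart is needed. Mesh-wide proxy settings are different: they are written into the pod spec at injection time, so existing workloads only pick them up after a restart (for example, kubectl rollout restart deployment/webapp).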
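Individual workloads can also override the mesh-wide proxy defaults with config.linkerd.io/* annotations on the pod template, which is often safer than changing the defaults for everyone. A sketch of a Deployment fragment, assuming the annotation names from Linkerd’s proxy configuration reference; the values are illustrative, not recommendations:

```yaml
# Fragment of a Deployment manifest; merge into the existing pod template
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/proxy-cpu-request: "100m"       # CPU reserved for the sidecar
        config.linkerd.io/proxy-memory-request: "64Mi"    # memory reserved for the sidecar
        config.linkerd.io/proxy-memory-limit: "256Mi"     # hard memory cap for the sidecar
        config.linkerd.io/proxy-outbound-connect-timeout: "2000ms"  # TCP connect timeout, not a request timeout
```

Because these annotations are read at injection time, they take effect when the pods are recreated.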

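Retries are also configured per route rather than in the global config, in a ServiceProfile for the destination service. ServiceProfiles do not take a fixed “retry up to N times” count. Instead, you mark individual routes as retryable and cap the extra load with a retry budget. For instance, to retry failed GET requests to users-api while leaving POST requests (which may not be idempotent) alone, you’d use:

---
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: users-api.default.svc.cluster.local
  namespace: default
spec:
  routes:
  - name: GET all paths
    condition:
      method: GET
      pathRegex: "/.*"
    isRetryable: true # Retry failed requests on this route
  # No POST route is marked retryable, so POST requests are never retried
  retryBudget:
    retryRatio: 0.2          # Retries may add at most 20% extra requests
    minRetriesPerSecond: 10  # Floor that is always allowed, even at low traffic
    ttl: 10s                 # Window over which the ratio is calculated

This profile is enforced by the linkerd-proxy sidecar of the caller (webapp in this example), for requests destined for users-api. A request is retried when the response is classified as a failure, which by default means a 5xx status. The retry budget bounds the total retry load, so a struggling users-api is not hit by a retry storm. If the service already has a ServiceProfile, add these fields to it rather than creating a second one with the same name.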
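If you specifically want a fixed retry count, newer Linkerd releases accept retry settings as annotations on the destination Service (or on an HTTPRoute attached to it). The following is a sketch under the assumption that you run Linkerd 2.16 or later and that the retry.linkerd.io/* annotation names match your version’s documentation; verify both before relying on it:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: users-api
  namespace: default
  annotations:
    retry.linkerd.io/http: 5xx     # retry responses with a 5xx status (assumed annotation name)
    retry.linkerd.io/limit: "3"    # at most 3 retries per request (assumed annotation name)
spec:
  selector:
    app: users-api
  ports:
  - port: 80
    targetPort: 80
```

Annotations on the Service apply to every request sent to it, POST included. To treat methods differently, you would attach them to separate HTTPRoute resources that match on method.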

A key hardening step often overlooked is restricting outbound traffic from meshed pods to arbitrary external services unless explicitly allowed. By default, Linkerd proxies forward outgoing connections to any Kubernetes Service and any IP address. In Linkerd 2.17 and later, you restrict this with an EgressNetwork resource, which describes traffic leaving the cluster and sets a default policy for it. A common hardening pattern is to deny all egress by default and then allow individual destinations.

---
apiVersion: policy.linkerd.io/v1alpha1
kind: EgressNetwork
metadata:
  name: all-egress-traffic
  namespace: linkerd-egress # Global egress namespace (default name); applies to all meshed clients
spec:
  trafficPolicy: Deny # This is the crucial hardening part

With trafficPolicy: Deny, meshed pods can still reach other Kubernetes Services, but connections to destinations outside the cluster are refused. If you need to reach external services, you’d explicitly allow them by attaching Gateway API routes (such as HTTPRoute or TLSRoute) to the EgressNetwork. On older Linkerd versions there is no proxy-level equivalent, and egress is typically restricted with Kubernetes NetworkPolicy instead.
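Independently of the mesh, a Kubernetes NetworkPolicy can enforce a similar in-cluster-only stance at the network layer, which is useful as defense in depth. A minimal sketch, assuming your CNI plugin enforces NetworkPolicy and that the workloads live in the default namespace; the policy name is hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: in-cluster-egress-only   # hypothetical name
  namespace: default
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}      # any pod in any namespace, including the Linkerd control plane and cluster DNS
```

Anything not matched by the single egress rule, which includes all destinations outside the cluster, is dropped. Clusters that use a node-local DNS cache or reach the API server by node IP would need additional rules.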

The most surprising thing about hardening Linkerd is how much is already in place by default: mTLS between meshed pods is on from the moment you inject them, so you don’t have to do anything to encrypt inter-service communication. You only need to focus on configuring policies, retries, and timeouts on top of that secure foundation.

Once you’ve got retries and timeouts dialed in, you’ll likely start thinking about authorization policies to control which services can talk to each other, which is managed via AuthorizationPolicy resources.
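As a preview, here is a sketch of a policy that lets only webapp call users-api. It assumes the webapp pods run under a ServiceAccount named webapp (the Deployments above use the default ServiceAccount, so that is a hypothetical addition), and the resource names are made up for illustration. The Server apiVersion varies between Linkerd releases, so check which one your cluster serves:

```yaml
apiVersion: policy.linkerd.io/v1beta3   # may be v1beta1 or v1beta2 on older releases
kind: Server
metadata:
  name: users-api-http
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: users-api
  port: 80
  proxyProtocol: HTTP/1
---
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: webapp-only
  namespace: default
spec:
  identityRefs:
  - kind: ServiceAccount
    name: webapp                # hypothetical ServiceAccount for the webapp pods
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: users-api-allow-webapp
  namespace: default
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: users-api-http        # the Server defined above
  requiredAuthenticationRefs:
  - group: policy.linkerd.io
    kind: MeshTLSAuthentication
    name: webapp-only           # only mTLS clients with this identity are allowed
```

Because the client identity comes from the mTLS handshake, this policy builds directly on the encryption the mesh already provides.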
