The most surprising truth about service meshes is that they often make distributed systems more complex, not less, and you only want one when the complexity they solve is already a painful reality.

Imagine you have a bunch of microservices, say user-service, product-service, and order-service. They talk to each other over the network.

graph LR
    UserService[User Service] --> ProductService[Product Service]
    UserService --> OrderService[Order Service]
    ProductService --> OrderService

When user-service needs to call product-service, it makes a network request. If product-service is slow or unavailable, user-service might hang, or return an error. If you have dozens of services, managing all this inter-service communication, retries, timeouts, and security becomes a nightmare. This is where service meshes like Istio and Linkerd come in. They inject a proxy (called a "sidecar") next to each of your services.

graph LR
    UserService["User Service\n(with Sidecar Proxy)"] --> ProductService["Product Service\n(with Sidecar Proxy)"]
    UserService --> OrderService["Order Service\n(with Sidecar Proxy)"]
    ProductService --> OrderService

Now, your user-service no longer talks directly to product-service. Its outbound traffic is transparently intercepted by its own sidecar proxy, which handles the network communication to the product-service sidecar; that sidecar finally delivers the request to the product-service application. This indirection is key. All the complex networking logic (retries, timeouts, circuit breaking, traffic shifting, mutual TLS encryption) is handled by these sidecars, not by your application code. This frees up your developers to focus on business logic.
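As a rough illustration of what moving this logic into the sidecar can look like, here is a minimal sketch of an Istio VirtualService that gives calls to product-service a timeout and a retry policy. The resource name, the default namespace in the host, and the specific numbers are assumptions chosen for illustration, not recommendations.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: product-service-resilience   # hypothetical name
spec:
  hosts:
  - product-service.default.svc.cluster.local   # assumes the "default" namespace
  http:
  - route:
    - destination:
        host: product-service.default.svc.cluster.local
    timeout: 2s              # overall deadline for the request, retries included
    retries:
      attempts: 3            # retry a failed call up to 3 times
      perTryTimeout: 500ms   # deadline for each individual attempt
      retryOn: 5xx,connect-failure
```

Nothing in user-service changes here; the policy is applied by the proxies that sit on the request path.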

When do you use one? You use a service mesh when you have a distributed system with a significant number of microservices, and you’re already experiencing or anticipating pain points in observability, security, and reliability of inter-service communication. If you have only 2-3 services and they are stable, a service mesh is likely overkill.

Istio is a mature, feature-rich service mesh. It excels in complex scenarios and offers a vast array of capabilities. It uses Envoy as its sidecar proxy, which is powerful but can be resource-intensive. Istio’s control plane is consolidated into a single component, istiod, which handles service discovery, configuration distribution, and certificate management.
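One common way the Envoy sidecar ends up next to a workload is automatic injection, switched on per namespace with a label. The sketch below assumes a hypothetical namespace called shop and a default (non-revisioned) Istio install; revision-based installs use an istio.io/rev label instead.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: shop                   # hypothetical namespace
  labels:
    istio-injection: enabled   # new pods in this namespace get an Envoy sidecar
```

Pods that were already running before the label was added need to be restarted to pick up the sidecar.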

Linkerd is known for its simplicity and performance. It’s designed to be lightweight and easy to operate, often with lower resource overhead than Istio. Linkerd uses its own custom-built proxy, written in Rust for performance and efficiency (Envoy, by contrast, is written in C++). It focuses on core service mesh features like traffic management, observability, and security, without the sheer breadth of Istio’s configuration options.
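Linkerd’s proxy is typically added through an annotation rather than a label. A minimal sketch, again assuming a hypothetical namespace called shop; the same annotation can also be set on an individual workload’s pod template.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: shop                      # hypothetical namespace
  annotations:
    linkerd.io/inject: enabled    # new pods in this namespace get the Linkerd proxy
```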

Let’s say you have a frontend service that needs to call an api service.

Istio Example (Traffic Shifting):

You want to gradually roll out a new version (v2) of your api service. With Istio, you’d define a VirtualService and a DestinationRule.

destination-rule.yaml:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: api-destination
spec:
  host: api-service.default.svc.cluster.local
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2

virtual-service.yaml:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: api-virtual
spec:
  hosts:
  - api-service.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: api-service.default.svc.cluster.local
        subset: v1
      weight: 90
    - destination:
        host: api-service.default.svc.cluster.local
        subset: v2
      weight: 10

Once these are applied, roughly 90% of requests to api-service go to pods labeled version: v1 and 10% to pods labeled version: v2. The split is enforced by the Istio sidecars that intercept the traffic.
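The subsets only work if the pods actually carry the matching labels. A sketch of what the v2 Deployment might look like follows; the Deployment name, the app: api label, the image, and the port are all placeholders assumed for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-v2                      # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
      version: v2
  template:
    metadata:
      labels:
        app: api                    # assumed to match the api-service Service selector
        version: v2                 # matched by the "v2" subset in the DestinationRule
    spec:
      containers:
      - name: api
        image: example.com/api:v2   # placeholder image
        ports:
        - containerPort: 8080       # assumed container port
```

A matching v1 Deployment would differ only in its name, image tag, and version label.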

Linkerd Example (Traffic Splitting):

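Linkerd achieves similar traffic splitting with TrafficSplit resources from the Service Mesh Interface (SMI) specification. Depending on your Linkerd version, this may require installing the linkerd-smi extension.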

linkerd-traffic-split.yaml:

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: api-traffic-split
spec:
  service: api-service # The Kubernetes Service that clients call
  backends:
  - service: api-service-v1 # A Kubernetes Service pointing to v1 pods
    weight: 90
  - service: api-service-v2 # A Kubernetes Service pointing to v2 pods
    weight: 10

Here, api-service is a Kubernetes Service that acts as a logical grouping. api-service-v1 and api-service-v2 are separate Kubernetes Services, each pointing to pods with their respective versions (e.g., using selectors like app: api, version: v1). Linkerd’s sidecars consult the TrafficSplit configuration to route requests accordingly.
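The three Services described above could be sketched like this. The selectors follow the app: api, version: v1 example from the text; the port numbers are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service        # the service clients call
spec:
  selector:
    app: api               # matches pods of every version
  ports:
  - port: 80
    targetPort: 8080       # assumed container port
---
apiVersion: v1
kind: Service
metadata:
  name: api-service-v1
spec:
  selector:
    app: api
    version: v1            # only v1 pods
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: api-service-v2
spec:
  selector:
    app: api
    version: v2            # only v2 pods
  ports:
  - port: 80
    targetPort: 8080
```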

The core advantage of both is offloading cross-cutting concerns. Instead of each service needing libraries for retries, TLS, and metrics, these are handled by the sidecar proxy transparently. This means your application code becomes simpler, and updates to security or reliability policies can be applied network-wide without touching application code.
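As one example of a policy applied network-wide without touching application code, Istio can require mutual TLS for every workload in the mesh with a single resource. This sketch assumes Istio’s root namespace is istio-system, which is the default but can be changed at install time.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # assumed root namespace; makes the policy mesh-wide
spec:
  mtls:
    mode: STRICT            # sidecars reject plaintext traffic from other workloads
```

Switching the mode to PERMISSIVE accepts both plaintext and mutual TLS, which is a common intermediate step while workloads are still being onboarded.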

One aspect often overlooked is the control plane’s own availability. If your Istio or Linkerd control plane goes down, already-running proxies generally keep routing with their last known configuration, but you can’t push new configurations, newly started pods may be unable to fetch configuration or certificates, and in some failure modes traffic might stop flowing. This is why ensuring the control plane itself is highly available and well-monitored is critical. For Istio, this means ensuring istiod is replicated. For Linkerd, it means ensuring its controller pods are redundant.
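Replication is usually configured through the mesh’s own installer (Linkerd, for instance, has an HA install mode), but a plain Kubernetes PodDisruptionBudget can additionally guard against voluntary disruptions such as node drains. The sketch below is an assumption-heavy illustration: the resource name is made up, and the app: istiod label should be checked against the labels on your own istiod pods. Some Istio installs already ship an equivalent budget.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: istiod-pdb          # hypothetical name
  namespace: istio-system   # assumes the default Istio namespace
spec:
  minAvailable: 1           # keep at least one istiod pod during voluntary disruptions
  selector:
    matchLabels:
      app: istiod           # assumption: verify against your istiod pod labels
```

A budget like this only helps when istiod runs with more than one replica; with a single replica it would simply block node drains.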

The next concept to grapple with is advanced traffic management, like canary deployments with automatic rollback based on metrics, or request-level routing based on HTTP headers.
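To give a flavor of header-based routing, here is a sketch of a variant of the earlier VirtualService that sends requests carrying a particular header to v2 and everything else to v1. The header name x-canary is a made-up example; the subsets are the ones defined in the DestinationRule above.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: api-virtual
spec:
  hosts:
  - api-service.default.svc.cluster.local
  http:
  - match:
    - headers:
        x-canary:            # hypothetical header name
          exact: "true"
    route:
    - destination:
        host: api-service.default.svc.cluster.local
        subset: v2           # requests with the header go to v2
  - route:
    - destination:
        host: api-service.default.svc.cluster.local
        subset: v1           # everything else stays on v1
```

Rules are evaluated in order, so the catch-all route without a match clause goes last.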
