Linkerd’s circuit breaking is designed to keep a single failing service from bringing down the entire application by temporarily cutting off traffic to the part that is failing.

Let’s see it in action. Imagine we have a frontend service that calls an api service. If the api service becomes overloaded or starts returning errors, the frontend will start failing too, potentially causing a cascade.

Here’s a simplified frontend deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
      annotations:
        linkerd.io/inject: enabled # Add the Linkerd sidecar proxy to these pods
    spec:
      containers:
      - name: frontend
        image: my-fancy-frontend:v1.0
        ports:
        - containerPort: 8080
        env:
        - name: API_ADDR
          value: "http://api.default.svc.cluster.local"

And the api service it depends on:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: my-robust-api:v2.1
        ports:
        - containerPort: 5000

Initially, everything works fine. The frontend service can reach the api service without issues.

Now, let’s simulate a failure in the api service. Scaling the api deployment down to zero replicas would remove its endpoints entirely, which is a different failure mode from a backend that answers with errors. For this demonstration, let’s assume instead that the api application has a fault and starts returning 5xx errors consistently.
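One way to get a steady stream of 5xx responses for a test is a stub that stands in for api and fails on demand. The sketch below is a hypothetical stand-in, not the real api service: the names FaultyAPIHandler, start_server, and fetch_status are made up for illustration, and it binds to a random local port instead of port 5000.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FaultyAPIHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the api service with a switchable fault."""

    failing = True  # set to False to simulate the service recovering

    def do_GET(self):
        failing = FaultyAPIHandler.failing
        body = b"injected failure\n" if failing else b"ok\n"
        self.send_response(500 if failing else 200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def start_server(port=0):
    """Start the stub on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), FaultyAPIHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Ignore any proxy settings in the environment for this local request.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url):
    """Return the HTTP status code of a GET, including error statuses."""
    try:
        with _opener.open(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


server = start_server()
url = f"http://127.0.0.1:{server.server_port}/"
print(fetch_status(url))           # 500 while the fault is switched on
FaultyAPIHandler.failing = False
print(fetch_status(url))           # 200 once the fault is switched off
server.shutdown()
server.server_close()
```

Flipping the failing flag back and forth makes it possible to watch both the failure and the recovery from the caller’s side.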

Without circuit breaking, the frontend would continue to hammer the failing api service. Each request from frontend would time out or receive an error, consuming resources on the frontend pods. This can lead to thread exhaustion, increased latency, and eventually, the frontend service itself becoming unresponsive. The problem then spreads.
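A back-of-the-envelope calculation shows why slow failures are so damaging. With a fixed pool of blocking workers, the throughput ceiling is the number of workers divided by the time each request holds a worker. The figures below (50 workers, 50 ms healthy latency, 5 s timeout) are hypothetical and chosen only to illustrate the effect.

```python
def max_throughput(workers: int, seconds_per_request: float) -> float:
    """Upper bound on requests/second for a fixed pool of blocking workers."""
    return workers / seconds_per_request


WORKERS = 50            # hypothetical worker threads per frontend pod
HEALTHY_LATENCY = 0.05  # hypothetical: api normally answers in 50 ms
TIMEOUT = 5.0           # hypothetical: a failing call blocks until a 5 s timeout

healthy = max_throughput(WORKERS, HEALTHY_LATENCY)
degraded = max_throughput(WORKERS, TIMEOUT)

print(f"healthy:  {healthy:.0f} req/s")   # healthy:  1000 req/s
print(f"degraded: {degraded:.0f} req/s")  # degraded: 10 req/s
```

Under these assumptions the same pod goes from about 1000 requests per second to about 10, even though nothing in the frontend itself has changed.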

Linkerd’s circuit breaking intervenes. When Linkerd, running as a sidecar proxy alongside your application pods, observes a consistent pattern of failures (like 5xx responses) from a destination service (api in this case), it can "open the circuit."

Here’s how you configure it. In Linkerd 2.13 and later, circuit breaking is turned on with annotations on the destination’s Kubernetes Service rather than with a dedicated policy resource. The annotations define when an endpoint is taken out of rotation and how long Linkerd waits before trying it again.

apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: default
  annotations:
    # Turn on circuit breaking; "consecutive" is the failure accrual mode
    balancer.linkerd.io/failure-accrual: "consecutive"
    # Mark an endpoint unavailable after 5 consecutive failures (5xx responses)
    balancer.linkerd.io/failure-accrual-consecutive-max-failures: "5"
    # Wait at least 10 seconds before sending the first probe request
    balancer.linkerd.io/failure-accrual-consecutive-min-penalty: 10s
    # Never wait longer than 60 seconds between probe requests
    balancer.linkerd.io/failure-accrual-consecutive-max-penalty: 60s
spec:
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 5000

With these annotations in place, the Linkerd proxy in each frontend pod tracks the responses it gets from every api endpoint (pod) individually. After 5 consecutive 5xx responses from an endpoint (failure-accrual-consecutive-max-failures: "5"), the proxy marks that endpoint unavailable and stops load balancing requests to it. This is the "open" state of the circuit.

While an endpoint’s circuit is open, the frontend’s proxy sends it no traffic. Requests go to the remaining healthy endpoints, and if every endpoint is unavailable they are failed by the frontend’s proxy without ever reaching the api pods. This is crucial: the frontend’s resources are no longer tied up waiting on requests that are almost certain to fail. The frontend pods remain healthy and responsive, serving traffic that doesn’t depend on the api.

After the penalty period (10 seconds at first, as set by min-penalty), Linkerd lets a single probe request through to the endpoint. If it succeeds, the circuit closes and the endpoint rejoins the load balancer. If it fails, the endpoint stays unavailable and the wait before the next probe grows exponentially, up to max-penalty (60 seconds here). This keeps the frontend from hammering a struggling api and gives the api room to recover at its own pace.

Notably, none of this requires code changes in your application. The failure detection is handled entirely by the sidecar proxy, which makes it a transparent way to improve resilience. Your application still has to handle the error responses it receives while the circuit is open, but it doesn’t need its own logic for detecting and avoiding unhealthy backends.

Although the annotations live on the api Service, circuit breaking is enforced on the client side: the proxy next to each caller (the frontend in this case) decides whether to send a request to a given api endpoint. This is different from rate limiting, which is typically enforced on the server and limits the number of requests it accepts.
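The closed, open, and probing states described above can be sketched as a small state machine. This is an illustration only, not Linkerd’s implementation (the real proxy is written in Rust): the class and method names are hypothetical, it tracks a single destination, and it uses a fixed wait before each probe for brevity, whereas a production proxy may lengthen the wait after each failed probe. The threshold of five failures and the 60-second wait mirror the values used in this walkthrough.

```python
import enum


class State(enum.Enum):
    CLOSED = "closed"        # requests flow normally
    OPEN = "open"            # requests are rejected locally
    HALF_OPEN = "half-open"  # exactly one probe request is in flight


class CircuitBreaker:
    """Consecutive-failure breaker with a fixed wait before each probe."""

    def __init__(self, max_failures=5, reset_after=60.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def allow(self, now):
        """Return True if a request may be sent at time `now` (seconds)."""
        if self.state is State.CLOSED:
            return True
        if self.state is State.OPEN and now - self.opened_at >= self.reset_after:
            self.state = State.HALF_OPEN  # let a single probe through
            return True
        return False  # still waiting, or a probe is already in flight

    def record(self, status, now):
        """Feed back the HTTP status of a request that was allowed."""
        if status < 500:
            self.state = State.CLOSED  # any non-5xx response resets the count
            self.failures = 0
            return
        self.failures += 1
        if self.state is State.HALF_OPEN or self.failures >= self.max_failures:
            self.state = State.OPEN
            self.opened_at = now


breaker = CircuitBreaker(max_failures=5, reset_after=60.0)
for second in range(5):            # five 5xx responses in a row
    if breaker.allow(second):
        breaker.record(503, second)
print(breaker.state.value)         # open
print(breaker.allow(30))           # False: still inside the wait
print(breaker.allow(64))           # True: the single probe request
breaker.record(200, 64)            # the probe succeeded
print(breaker.state.value)         # closed
```

Keeping the decision in allow and the bookkeeping in record mirrors how a proxy sits between the caller and the backend: it decides before forwarding and learns from the response afterwards.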

The next concept you’ll likely explore is how Linkerd’s retry budgets work alongside circuit breaking to bound the extra load that retries put on a struggling service.
