Linkerd’s control plane metrics are not just a way to see what’s happening; they’re the key to understanding the distributed system’s health and performance, revealing bottlenecks and failures before they impact your users.

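Let’s see Linkerd’s control plane components in action. Imagine a simple Kubernetes cluster with Linkerd installed. Each control plane component exposes its metrics in the Prometheus text format, which a Prometheus server typically scrapes; for a quick look, we can read a component’s metrics endpoint directly. Deployment names, container names, ports, and metric names vary between Linkerd releases, so treat the ones below as illustrative. If a container image does not ship curl, use kubectl port-forward and run curl from your workstation instead.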

# Get the cumulative count of gRPC streams opened on the control plane's API server
kubectl exec -n linkerd deploy/controller-api -c controller-api -- curl -s http://localhost:8080/metrics | grep "linkerd_controller_api_grpc_streams_total"

# Output might look like:
# linkerd_controller_api_grpc_streams_total{direction="inbound",grpc_method="ListNamespaces",grpc_service="io.l5d.controller.core.Controller"} 5
# linkerd_controller_api_grpc_streams_total{direction="outbound",grpc_method="Watch",grpc_service="io.l5d.controller.core.Controller"} 12

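This output shows, for example, that 5 inbound gRPC streams have been opened for the ListNamespaces method and 12 outbound streams for the Watch method on the Controller service. The _total suffix conventionally marks a Prometheus counter, so these are cumulative counts since the process started, not the number of streams open right now; current activity shows up as the rate at which the counts grow. These metrics are not abstract: they count real connections and requests between Linkerd’s internal components and the Kubernetes API server, or between different control plane pods.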
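Reading raw output by eye gets tedious once a metric has many label combinations. The sketch below is a small bash helper, metric_sum (a hypothetical name, not part of Linkerd), that sums every sample of one metric from Prometheus text-format input, optionally keeping only samples whose line contains a given label substring. It assumes label values contain no `}` character and ignores the optional trailing timestamp.

```shell
# metric_sum NAME [LABEL_FILTER]
# Hypothetical helper: reads Prometheus text-format metrics on stdin and prints
# the sum of all samples named exactly NAME. If LABEL_FILTER is given
# (e.g. 'direction="inbound"'), only lines containing that literal substring
# are counted. Assumes label values contain no '}' character.
metric_sum() {
  awk -v name="$1" -v filter="${2:-}" '
    /^#/ { next }                      # skip # HELP and # TYPE lines
    {
      line = $0
      sub(/[{][^}]*[}]/, "", line)     # drop the {label="..."} set
      split(line, f, " ")              # f[1] = name, f[2] = value, f[3] = optional timestamp
      if (f[1] != name) next
      if (filter != "" && index($0, filter) == 0) next
      sum += f[2]
    }
    END { printf "%.15g\n", sum + 0 }
  '
}
```

Piping the example output above into metric_sum linkerd_controller_api_grpc_streams_total prints 17; adding the filter 'direction="inbound"' prints 5.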

Linkerd’s control plane is composed of several core components, each with specific responsibilities:

  • Controller (controller-api): This is the brain. It watches Kubernetes resources (like Pods, Services, Namespaces) and uses this information to configure the data plane proxies (the linkerd-proxy running alongside your application pods). It also exposes the control plane’s API for linkerd CLI commands and dashboards.
  • Destination (destination): This component is responsible for service discovery. It translates service names into actual endpoint addresses (IPs and ports) that the data plane proxies can use to route traffic. It also manages the health of these endpoints.
  • Identity (identity): This handles the TLS certificates for the service mesh. It issues and rotates certificates for both the control plane components and the data plane proxies, enabling secure communication within the mesh.
  • Proxy Injector (proxy-injector): This mutating admission webhook automatically injects the linkerd-proxy container into your application pods when they are created, making them part of the mesh.
  • Web (web): This component serves the Linkerd dashboard UI, which presents the metrics that the linkerd-proxy instances and the control plane components expose on their own endpoints.

The metrics generated by these components provide a granular view of their operation. For instance, metrics from the Destination component can tell you how many endpoints it’s tracking for a given service, or the latency of its service discovery lookups. Metrics from Identity might reveal the rate of certificate issuance or any errors encountered during the process.

The key to understanding these metrics is to know what each component is trying to do and what its dependencies are. The Controller depends on the Kubernetes API server. The Destination component depends on the Controller for up-to-date endpoint information. The Identity component depends on Kubernetes for certificate-related resources.

Here’s a look at a typical metric you might see from the Destination component:

# Get the number of endpoints currently known by the Destination component
kubectl exec -n linkerd deploy/destination -c destination -- curl -s http://localhost:8080/metrics | grep "linkerd_destination_endpoints_total"

# Output might look like:
# linkerd_destination_endpoints_total{service="httpbin",namespace="default"} 3

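This tells us that for the httpbin service in the default namespace, the Destination component has discovered 3 healthy endpoints. Despite its _total suffix, this value is read here as a current count rather than a cumulative one; check the metric’s # TYPE line to see whether it is a gauge or a counter in your version. If this number suddenly dropped to 0, it would indicate a problem with service discovery or with the health checking of those endpoints, potentially leading to a complete loss of connectivity to that service.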
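Rather than watching that number by eye, you could flag every service whose endpoint count has reached zero. The sketch below is a hypothetical bash helper (not a Linkerd tool) that reads Destination metrics on stdin and prints namespace/service for each linkerd_destination_endpoints_total sample whose value is 0. It assumes the metric name and the service and namespace labels shown in the example output, which may differ in your Linkerd version.

```shell
# services_without_endpoints
# Hypothetical helper: reads Prometheus text-format metrics on stdin and prints
# "namespace/service" for each linkerd_destination_endpoints_total sample with
# value 0. Assumes the label names shown in the example output.
services_without_endpoints() {
  awk '
    # label(s, key): value of label `key` in sample line s, or "" if absent.
    # The [{,] prefix stops "service" from matching inside e.g. "grpc_service".
    function label(s, key,    re) {
      re = "[{,]" key "=\"[^\"]*\""
      if (match(s, re))
        return substr(s, RSTART + length(key) + 3, RLENGTH - length(key) - 4)
      return ""
    }
    /^#/ { next }
    {
      line = $0
      sub(/[{][^}]*[}]/, "", line)     # strip labels to isolate name and value
      split(line, f, " ")
      if (f[1] == "linkerd_destination_endpoints_total" && f[2] + 0 == 0)
        print label($0, "namespace") "/" label($0, "service")
    }
  '
}
```

It prints nothing when every service has at least one endpoint, so it can be piped from the curl command above in a cron job or CI check.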

The most surprising thing about Linkerd’s control plane metrics is how directly they reflect the state of the Kubernetes API server and the network fabric connecting the control plane pods. You’re not just seeing Linkerd’s internal state; you’re seeing its interpretation of the cluster’s state.

Consider the linkerd_controller_kubernetes_api_requests_total metric. This isn’t a metric of the Kubernetes API server itself, but rather a counter of requests that Linkerd’s Controller is making to the Kubernetes API server.

# Get the cumulative count of Kubernetes API requests made by the Controller
kubectl exec -n linkerd deploy/controller-api -c controller-api -- curl -s http://localhost:8080/metrics | grep "linkerd_controller_kubernetes_api_requests_total"

# Output might look like:
# linkerd_controller_kubernetes_api_requests_total{method="GET",path="/api/v1/namespaces",status_code="200"} 15000
# linkerd_controller_kubernetes_api_requests_total{method="GET",path="/api/v1/pods",status_code="200"} 30000

A sharp rise in the rate of successful (200) list requests, that is, GET requests to /api/v1/pods, may mean the Controller is repeatedly re-listing pods. This could be a sign of instability in the cluster itself, or of a misconfiguration in Linkerd that causes unnecessary churn. A sudden increase in 5xx status codes for these requests, by contrast, points to problems within the Kubernetes API server, which Linkerd is reporting from the client side.
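Spotting a surge requires a rate, not the cumulative counts printed above. A minimal sketch, assuming you save two scrapes of the metrics endpoint to files a known number of seconds apart: the hypothetical bash helper counter_rate sums a metric in each snapshot and divides the difference by the interval. Similar in spirit to how Prometheus’s rate() treats resets, a drop in value is taken to mean the counter restarted (for example, after a pod restart). A label filter such as 'status_code="5' isolates the 5xx responses.

```shell
# sum_samples NAME [LABEL_FILTER]
# Sums all samples named NAME in Prometheus text-format input on stdin,
# optionally keeping only lines that contain LABEL_FILTER as a substring.
sum_samples() {
  awk -v name="$1" -v filter="${2:-}" '
    /^#/ { next }
    {
      line = $0
      sub(/[{][^}]*[}]/, "", line)     # drop the label set
      split(line, f, " ")              # f[1] = name, f[2] = value
      if (f[1] == name && (filter == "" || index($0, filter) > 0)) sum += f[2]
    }
    END { printf "%.15g\n", sum + 0 }
  '
}

# counter_rate BEFORE_FILE AFTER_FILE SECONDS NAME [LABEL_FILTER]
# Hypothetical helper: per-second increase of a counter between two snapshots.
# The reset check is applied to the summed value, so it is only approximate
# when several series are combined.
counter_rate() {
  local before after
  before=$(sum_samples "$4" "${5:-}" < "$1")
  after=$(sum_samples "$4" "${5:-}" < "$2")
  awk -v a="$before" -v b="$after" -v s="$3" 'BEGIN {
    delta = (b >= a) ? b - a : b     # a decrease means the counter was reset
    printf "%.2f\n", delta / s
  }'
}
```

For example, after saving two scrapes taken 60 seconds apart to before.txt and after.txt, counter_rate before.txt after.txt 60 linkerd_controller_kubernetes_api_requests_total 'status_code="5' reports the 5xx requests per second over that minute.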

The Proxy Injector component also exposes useful metrics, such as the rate of successful and failed pod injections.

# Get the cumulative number of successful and failed pod injections
kubectl exec -n linkerd deploy/proxy-injector -c proxy-injector -- curl -s http://localhost:8080/metrics | grep "linkerd_proxy_injector_pods_total"

# Output might look like:
# linkerd_proxy_injector_pods_total{result="success"} 500
# linkerd_proxy_injector_pods_total{result="error"} 5

A high rate of errors here, especially if it correlates with new application pods failing to start, directly indicates an issue with the injection process. This could be due to RBAC permissions, network policies that block the Kubernetes API server from reaching the webhook, or malformed pod definitions that the injector cannot process.
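To turn the injector counters into a single signal, you can compute the share of failed injections. The hypothetical bash helper below reads the injector’s metrics on stdin and prints OK or ALERT depending on whether the error percentage exceeds a threshold you choose. The result="success" and result="error" labels are taken from the example output and may differ in your Linkerd version. Because the counters are cumulative, this is the error share since the injector last restarted, not over a recent window.

```shell
# check_injection_errors MAX_PERCENT
# Hypothetical helper: reads Prometheus text-format metrics on stdin and compares
# the share of failed injections (result="error") among all injections
# (result="success" plus result="error") with MAX_PERCENT.
check_injection_errors() {
  awk -v max="$1" '
    /^#/ { next }
    /^linkerd_proxy_injector_pods_total[{ ]/ {
      line = $0
      sub(/[{][^}]*[}]/, "", line)     # leave "name value"
      split(line, f, " ")
      if ($0 ~ /result="error"/) err += f[2]
      else if ($0 ~ /result="success"/) ok += f[2]
    }
    END {
      total = ok + err
      if (total == 0) { print "NO DATA"; exit }
      pct = 100 * err / total
      printf "%s %.2f%%\n", (pct > max ? "ALERT" : "OK"), pct
    }
  '
}
```

With the example output above (500 successes, 5 errors), check_injection_errors 1 prints OK 0.99%.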

Ultimately, Linkerd’s control plane metrics are a diagnostic toolkit that maps directly to the health and connectivity of your service mesh. By understanding the role of each component and the metrics it exposes, you can trace a symptom back to the component, and often the dependency, that caused it, which makes troubleshooting and tuning your distributed system far more direct.

The next logical step after monitoring the control plane metrics is to correlate them with the metrics exposed by the data plane proxies running alongside your application pods.
