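Istiod is the control plane for Istio, and running it in high availability (HA) mode means ensuring it can continue to serve configuration, service discovery, and certificates to the mesh even if one or more of its instances fail. Sidecar proxies keep routing with the last configuration they received while istiod is unreachable, but they cannot pick up new endpoints, policy changes, or renewed certificates, so control plane availability is crucial for the stability and resilience of your service mesh.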
Let’s see Istiod in action, managing a simple service mesh. Imagine two microservices, frontend and backend, communicating with each other.
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: frontend-vs
    spec:
      hosts:
      - frontend
      http:
      - route:
        - destination:
            host: backend
            subset: v1
    ---
    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: backend-dr
    spec:
      host: backend
      subsets:
      - name: v1
        labels:
          version: v1
      - name: v2
        labels:
          version: v2
When you apply this configuration, Istiod picks it up through its watch on the Kubernetes API, translates it into Envoy configuration, and pushes the result over xDS to the sidecar proxies running alongside your frontend and backend pods. These proxies, acting on Istiod’s instructions, intercept and route traffic according to the VirtualService and DestinationRule. With multiple instances of Istiod running in HA mode, every instance processes the change, and each proxy receives its update from whichever instance it is currently connected to.
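The v1 subset in the DestinationRule only matches pods that carry the label version: v1. As a minimal sketch, a backend Deployment that would fall into that subset might look like the following. The app label, image name, replica count, and port are illustrative assumptions, not part of the original example.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-v1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
      version: v1
  template:
    metadata:
      labels:
        app: backend    # assumed label that a Service named "backend" would select on
        version: v1     # the label matched by the v1 subset in backend-dr
    spec:
      containers:
      - name: backend
        image: example.com/backend:v1   # hypothetical image
        ports:
        - containerPort: 8080           # hypothetical port
```

A second Deployment labeled version: v2 would populate the v2 subset in the same way.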
The core problem Istio solves is managing a complex, distributed system. It provides a uniform way to manage network traffic, observe telemetry, and secure communication between services, regardless of the underlying infrastructure. Istiod, as the central brain, is responsible for:
- Service Discovery: Keeping track of all services and their endpoints within the mesh.
- Configuration Distribution: Pushing routing rules, security policies, and telemetry configurations to the data plane (Envoy proxies).
- Certificate Management: Issuing and rotating certificates for mTLS communication.
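To make the configuration and certificate responsibilities concrete, here is a minimal sketch of a security policy that Istiod would distribute to the proxies. It assumes istio-system is the mesh's root namespace, so that the policy applies mesh-wide; with it in place, proxies accept only mutual TLS connections, using the workload certificates that Istiod issues and rotates.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # assumed root namespace; placing the policy here makes it mesh-wide
spec:
  mtls:
    mode: STRICT            # reject plaintext traffic between sidecars
```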
To run Istiod in HA mode, you typically deploy multiple replicas of the istiod deployment. Kubernetes handles the scheduling and health checking of these replicas. For example, in a typical Istio installation using istioctl, you might configure HA during installation:
    istioctl install --set profile=default \
      --set components.pilot.k8s.resources.requests.cpu=200m \
      --set components.pilot.k8s.resources.limits.cpu=200m \
      --set components.pilot.k8s.resources.requests.memory=256Mi \
      --set components.pilot.k8s.resources.limits.memory=256Mi \
      --set components.pilot.k8s.replicaCount=3
Here, --set components.pilot.k8s.replicaCount=3 explicitly tells Istio to deploy three replicas of the istiod component, and the components.pilot.k8s.resources settings give each replica explicit CPU and memory requests and limits. Kubernetes will then ensure that these three pods are running and healthy. If one pod crashes or becomes unresponsive, Kubernetes will restart it or schedule a new one, and the other two istiod instances will continue to manage the mesh. If the horizontal pod autoscaler for istiod is enabled in your profile, also set its minimum replica count to at least the same value, because the autoscaler otherwise decides how many replicas actually run.
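The same kind of settings can also be kept in a file rather than passed as flags. The following is a sketch under the assumption that you install through the IstioOperator API, where per-component Kubernetes settings sit under components.pilot.k8s. The hpaSpec values are illustrative and only matter if the istiod autoscaler is enabled.

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default
  components:
    pilot:
      k8s:
        replicaCount: 3        # desired number of istiod pods
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 200m
            memory: 256Mi
        hpaSpec:
          minReplicas: 3       # keep the autoscaler from scaling below the HA floor
          maxReplicas: 5       # illustrative ceiling
```

Assuming the file is saved as istiod-ha.yaml (a hypothetical name), it could be applied with istioctl install -f istiod-ha.yaml.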
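The Service object for istiod is also critical. It exposes istiod to the data plane, allowing the Envoy proxies to discover and connect to any available istiod instance. In a standard single-cluster installation this is a ClusterIP Service that proxies resolve by its DNS name (istiod.istio-system.svc) and reach on port 15012 for xDS and certificate requests over TLS; only pods that pass their readiness probe are included in its endpoints.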
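For orientation, an abridged sketch of what this Service typically looks like is shown below. The port names and selector labels are assumptions based on a standard installation and can vary by version and revision, so compare against the output of kubectl get svc istiod -n istio-system -o yaml in your own cluster.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: istiod
  namespace: istio-system
spec:
  type: ClusterIP
  selector:
    app: istiod            # assumed selector; matches every ready istiod replica
  ports:
  - name: grpc-xds         # plaintext xDS
    port: 15010
  - name: https-dns        # xDS and certificate signing over TLS
    port: 15012
  - name: https-webhook    # injection and validation webhooks
    port: 443
    targetPort: 15017
  - name: http-monitoring  # control plane metrics
    port: 15014
```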
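A configuration detail that is easy to overlook is health probing for the istiod pods. Probes defined in the Kubernetes Deployment manifest tell Kubernetes when an istiod instance is ready to receive connections (readiness) and, if a liveness probe is configured, when it should be restarted. Istiod serves its readiness endpoint at /ready on port 8080. The probe layout and timings below are illustrative, so compare them against the Deployment your Istio version actually generates.
    # Illustrative liveness probe; the stock istiod Deployment may not define one.
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
These probes check an HTTP endpoint exposed by istiod to determine its health. With the settings above, three consecutive failures (failureThreshold: 3, roughly 30 seconds at periodSeconds: 10) trigger action: a failing liveness probe causes Kubernetes to restart the container, and a failing readiness probe removes the pod from the istiod Service endpoints until it recovers.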
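When Istiod is running in HA, the Envoy proxies connect to the istiod Service rather than to an individual pod, and the Service spreads new connections across the ready istiod pods. Because each proxy holds a long-lived gRPC stream, balancing happens when a connection is established, not per request; if the instance behind a stream goes away, the proxy reconnects through the Service and lands on another ready instance. The discovery address is part of the bootstrap configuration that Istio generates for each injected proxy, so no per-workload setup is needed.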
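The address that proxies dial is a mesh-wide proxy setting. As a sketch, assuming an IstioOperator-based installation and the standard istio-system namespace, it could be stated explicitly as follows. The value shown is what a default installation is expected to use already, so this block is only needed when the address must differ.

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      # Proxies connect to the Service name, not to a specific istiod pod,
      # so any ready replica can answer.
      discoveryAddress: istiod.istio-system.svc:15012
```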
Configuration updates in an HA setup need no coordination between replicas. When a new VirtualService or DestinationRule is applied, every healthy istiod replica sees the change through its own watch on the Kubernetes API. Each replica then independently computes the necessary Envoy configurations and pushes them to the proxies connected to it. Because all istiod instances work from the same source of truth (effectively the Kubernetes API), they converge on the same configuration state, although different proxies may receive a given update at slightly different moments.
Istiod also uses a leader election mechanism internally. It is not what makes the control plane highly available, but it matters for certain operations that should happen only once. For example, tasks that write back to the cluster, such as updating resource status or patching webhook configuration, are typically performed by only one istiod instance at a time to avoid redundant or conflicting work; if that instance fails, another replica takes over the role. For the core function of serving configuration to Envoy proxies, however, no leader is needed and any healthy istiod instance can participate.
The next concept you’ll likely encounter is how to effectively monitor the health and performance of your Istiod instances in HA mode, especially when dealing with large meshes or complex configurations.