Flux CD, by default, will happily apply your Kubernetes manifests even if the resources they create are unhealthy. This means your deployments could be stuck in ImagePullBackOff or services could be unready, but Flux won’t tell you until you manually check.
Here’s how to make Flux wait for your resources to be healthy before declaring success:
Making Flux Wait for Healthy Resources
Flux’s reconciliation loop is designed to be idempotent: it applies your desired state and moves on. To introduce a health check, we need to leverage Flux’s ability to monitor the status of Kubernetes resources after they’ve been applied.
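The core mechanism lives in the Kustomization API itself (kustomize.toolkit.fluxcd.io); Flux has no separate HealthCheck custom resource. Two fields on a Kustomization’s spec control it: wait, which makes Flux wait for every resource the Kustomization applies to become ready, and healthChecks, which lists specific resources (like a Deployment, StatefulSet, or DaemonSet) to monitor. Flux folds the result into the Kustomization’s Ready condition, so an unhealthy resource fails the reconciliation instead of passing silently.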
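As a minimal sketch of the simplest built-in option, assuming a Flux version whose Kustomization API supports spec.wait (the GA kustomize.toolkit.fluxcd.io/v1 API does), the fragment below asks Flux to wait for everything the Kustomization applies. The timeout value is illustrative, not a recommendation.

```yaml
# Fragment of a Kustomization spec, not a complete manifest.
spec:
  # Wait for every applied resource to become ready before reporting success.
  wait: true
  # Mark the reconciliation as failed if resources are still unhealthy
  # after this long (illustrative value).
  timeout: 5m
```

This trades precision for convenience: every applied resource gates the result, so one slow or permanently unready object will hold the whole Kustomization back.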
1. Checking the kustomize-controller
Health checking is performed by the kustomize-controller, which is part of the standard Flux installation; there is no separate health controller to enable. You can verify that it is running by checking for the kustomize-controller deployment in the flux-system namespace:
kubectl get deployment -n flux-system kustomize-controller
If it’s not there, you’ll need to install or upgrade your Flux components.
2. Declaring Health Checks
For each critical resource you want Flux to monitor for health, add an entry to the spec.healthChecks list of the Kustomization that applies it.
Example: Monitoring a Deployment
Let’s say you have a Deployment named my-app-deployment in the default namespace. The relevant part of the Kustomization’s spec looks like this:
spec:
  # How often Flux reconciles the Kustomization and re-evaluates health
  interval: 10m
  # How long to wait for the resources to become healthy before failing (e.g., 5 minutes)
  timeout: 5m
  # The resources to monitor
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: my-app-deployment
      namespace: default # Namespace of the resource to monitor
Explanation of Fields:
- healthChecks: Each entry points to a Kubernetes resource you want to monitor. It requires apiVersion, kind, and name; set namespace for namespaced resources.
- interval: How frequently Flux reconciles the Kustomization. Health is evaluated as part of each reconciliation.
- timeout: The maximum time Flux will wait for the resources to become healthy. If it is exceeded, the reconciliation is marked as failed. When omitted, it defaults to the value of interval. 5m is a reasonable starting point for most applications.
Common Resource Types and Their Health Checks:
Flux evaluates health with the kstatus library, which applies built-in rules per resource kind. Roughly:
- Deployments: The rollout must be complete, meaning the desired number of replicas (spec.replicas) are updated, ready, and available (status.readyReplicas, status.availableReplicas).
- StatefulSets: Similar to Deployments, it checks that the desired replicas are updated and ready.
- DaemonSets: Checks that status.desiredNumberScheduled matches status.numberReady.
- Services: A Service is generally considered healthy once it exists, and the check does not verify that it has endpoints. Monitor the workloads behind the Service instead.
- Custom Resources: Health is read from the resource’s status, typically a Ready entry in status.conditions together with status.observedGeneration, so your CRD’s controller needs to maintain those fields.
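To make the custom resource case concrete, here is a sketch of the kind of status that condition-based health assessment (such as kstatus) conventionally reads. The Widget kind, the example.com group, and every value shown are hypothetical; the assumption is that your controller follows the common Kubernetes convention of a Ready condition plus observedGeneration.

```yaml
# Hypothetical custom resource; kind, group, and names are illustrative only.
apiVersion: example.com/v1
kind: Widget
metadata:
  name: my-widget
  namespace: default
  generation: 3
status:
  # Equal to metadata.generation, signalling that the controller
  # has processed the latest spec.
  observedGeneration: 3
  conditions:
    - type: Ready
      status: "True"
      reason: ReconciliationSucceeded
      message: Widget is ready
      lastTransitionTime: "2024-01-01T00:00:00Z"
```

A resource whose observedGeneration lags behind metadata.generation, or whose Ready condition is "False", would be treated as still in progress rather than healthy.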
3. Putting It Together in a Kustomization
Health checks are declared on the same Kustomization that applies the resources. The spec.dependsOn field serves a different purpose: it references other Kustomizations, not health checks, and controls the order in which Kustomizations are applied.
Example Kustomization:
Assume your Kustomization is defined in clusters/my-cluster/flux-system/kustomization.yaml and it applies the my-app-deployment.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system # Namespace of the Kustomization
spec:
  interval: 10m
  path: ./apps/my-app/overlays/production
  prune: true
  sourceRef:
    kind: GitRepository
    name: my-git-repo
  # This is the key part: tell Flux to wait for the Deployment to become healthy
  timeout: 5m
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: my-app-deployment
      namespace: default # Namespace of the Deployment
Explanation of the health fields:
- timeout: How long Flux waits for the listed resources to become healthy.
- healthChecks: The resources whose health gates this Kustomization.
- wait: Setting wait: true instead makes Flux health-check everything the Kustomization applies; in that case the healthChecks list is ignored.
When Flux reconciles this Kustomization, it applies the manifests and then watches my-app-deployment. If the Deployment is not healthy within the timeout, the Kustomization is marked as not Ready and the reconciliation is reported as failed; Flux tries again on a later reconciliation. Only when the Deployment becomes healthy does Flux mark the Kustomization as Ready. Any other Kustomization that lists my-app under dependsOn is held back until then.
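To illustrate ordering between Kustomizations, the sketch below adds a second Kustomization that is only applied once my-app reports Ready. In Flux, dependsOn entries name other Kustomizations. The my-app-frontend name and its path are hypothetical and exist only for this example.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app-frontend # hypothetical dependent Kustomization
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps/my-app-frontend # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: my-git-repo
  # Not applied until the my-app Kustomization (same namespace) is Ready.
  dependsOn:
    - name: my-app
```

For this ordering to reflect real health, the dependency needs health checking of its own (wait or healthChecks); otherwise it becomes Ready as soon as its manifests are applied.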
4. Troubleshooting Health Checks
If your Kustomization isn’t becoming Ready, you can inspect its status:
kubectl get kustomization -n flux-system my-app -o yaml
Look at the .status.conditions field. The Ready condition will have status False, typically with the reason HealthCheckFailed, and a message naming the resource that did not become healthy. Common reasons include:
- Deployment not scaling up: Check the Deployment’s Pods for errors (ImagePullBackOff, CrashLoopBackOff).
- Pods not becoming ready: Ensure containers are starting, passing readiness probes, and not crashing.
- Timeout exceeded: The timeout in the Kustomization might be too short for your application’s startup time. Increase it.
- Incorrect healthChecks entry: Double-check the apiVersion, kind, name, and namespace to ensure they exactly match your resource.
5. The Next Problem: Service Availability
Once your Deployments are healthy, the next logical step is to ensure your Services can actually route traffic to those healthy Pods. This often involves checking Service endpoints. You could add a Service to healthChecks, but more commonly you’ll rely on the health of the Pods behind the Service. If your application uses Ingress, you’ll also want to ensure the Ingress controller is healthy and the Ingress resource itself is correctly configured. The next challenge is often ensuring that external access to your services is functioning as expected.