Flux is a GitOps tool that behaves much like a smart, opinionated Git client that automates deployments.

Let’s see it in action. Imagine you have a Kubernetes cluster and a Git repository containing your application manifests. Flux watches this repository. When you git push a change, Flux detects it, pulls the new manifests, and applies them to your cluster. It’s not just about applying changes; Flux continuously reconciles the desired state in Git with the actual state in your cluster, ensuring they always match.

Here’s a simplified look at the core components:

  • Source Controller: This is the Git watcher. It pulls changes from sources like Git repositories, Helm repositories, or S3 buckets.
  • Kustomize Controller: Applies Kubernetes manifests defined using Kustomize.
  • Helm Controller: Manages Helm chart releases.
  • Notification Controller: Handles notifications about reconciliation events, sending them to Slack, Teams, or webhooks.
  • Image Reflector & Automation Controllers: These are for more advanced scenarios where Flux can automatically update container image tags in your Git repository based on new images pushed to a registry.

To build a Flux monitoring dashboard in Grafana, you’ll need Prometheus to scrape metrics from your Flux components and Grafana to visualize them.

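First, ensure you have Prometheus and Grafana deployed in your cluster. If not, you can install them using Helm. The ServiceMonitor and PodMonitor resources are provided by the Prometheus Operator, which the plain prometheus chart below does not include; if you want to use them, install prometheus-community/kube-prometheus-stack instead (it bundles the operator and its own Grafana):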

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus --namespace monitoring --create-namespace

helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana --namespace monitoring --create-namespace

Next, make sure Prometheus can reach Flux’s metrics. Each Flux controller serves Prometheus metrics over HTTP at /metrics, by default on port 8080 (the container port is named http-prom in the stock manifests). Prometheus needs to be configured to scrape these endpoints.
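
Before wiring up Prometheus, it can help to look at the raw metrics yourself. The following is a minimal sketch, not an official Flux procedure: it assumes kubectl access, Flux running in the flux-system namespace, and metrics served on port 8080. The function names flux_series and scrape_controller are made up for this example.

```shell
#!/bin/sh
# Keep only the reconciliation-related series from a Prometheus text-format scrape.
# gotk_* series come from Flux itself; controller_runtime_* series come from the
# controller-runtime library the controllers are built on.
flux_series() {
  grep -E '^(gotk_|controller_runtime_reconcile_)'
}

# Scrape one controller through a temporary port-forward.
# Assumption: the deployment is named after the controller and listens on 8080.
scrape_controller() {
  controller="$1"   # e.g. source-controller
  kubectl -n flux-system port-forward "deploy/${controller}" 8080:8080 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2           # crude wait for the tunnel to come up
  curl -s http://localhost:8080/metrics | flux_series
  kill "$pf_pid"
}
```

Running scrape_controller source-controller should print the series that Prometheus will later collect; if nothing comes back, check the port and namespace before debugging the Prometheus side.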

If you installed Flux using the community Helm chart, monitoring may already be partially set up through chart values. You can check by reviewing your values.yaml or by listing the ServiceMonitor and PodMonitor resources in your cluster. A plain flux bootstrap install does not create either, so you will need to add one yourself.

A ServiceMonitor or PodMonitor is a custom resource, provided by the Prometheus Operator, that tells Prometheus how to discover and scrape endpoints. The stock Flux manifests do not put a Service in front of every controller’s metrics port, so a PodMonitor that selects the controller pods directly is usually the better fit. Here’s an example of what a PodMonitor for the Flux controllers might look like, assuming the default app labels and the http-prom port name:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: flux-system
  namespace: flux-system # Or wherever your Flux components are running
  labels:
    release: prometheus # Must match your Prometheus Operator's podMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - flux-system # Namespace where Flux is installed
  selector:
    matchExpressions:
      - key: app # Stock manifests label each pod with its controller name
        operator: In
        values:
          - source-controller
          - kustomize-controller
          - helm-controller
          - notification-controller
          - image-reflector-controller
          - image-automation-controller
  podMetricsEndpoints:
    - port: http-prom # Named container port for metrics (8080 by default)
      interval: 30s
      path: /metrics

Apply this PodMonitor to your cluster if no existing monitor already covers your Flux components. Ensure the app label values, the port name, and the namespace match your Flux installation.

Once Prometheus is scraping Flux metrics, you can import a pre-built Grafana dashboard or create your own. A good starting point is a Flux dashboard from Grafana.com (search for "Flux" on the Grafana Dashboards page); the Flux project also publishes dashboard JSON in its fluxcd/flux2-monitoring-example repository.

Importing a dashboard:

  1. Log in to your Grafana instance.
  2. Go to Dashboards -> Import.
  3. Enter the dashboard ID listed on its Grafana.com page (e.g., 14008 for the Flux dashboard) or upload the JSON file.
  4. Select your Prometheus data source.
  5. Click Import.

This dashboard will provide insights into:

  • Reconciliation Status: Are your Git sources, Kustomizations, and HelmReleases reconciling successfully?
  • Error Rates: How often are components failing to reconcile?
  • Resource Utilization: CPU and memory usage of Flux components.
  • Git Commit Information: Which commit is currently applied to your cluster?
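
If you would rather provision the dashboard than click through the import dialog, the Grafana Helm chart can download a Grafana.com dashboard at startup. This is a sketch under assumptions: 14008 is the example ID from the import steps above, revision 1 and the data source name Prometheus are guesses you should adjust, and the file name grafana-flux-values.yaml is arbitrary.

```shell
# Write a values file that asks the Grafana Helm chart to fetch a dashboard
# from Grafana.com. The ID comes from the import steps above; the revision
# number and the data source name "Prometheus" are assumptions.
cat > grafana-flux-values.yaml <<'EOF'
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: default
        orgId: 1
        folder: ""
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default
dashboards:
  default:
    flux:
      gnetId: 14008
      revision: 1
      datasource: Prometheus
EOF

# Then roll it out (needs cluster access, so it is left commented here):
# helm upgrade grafana grafana/grafana --namespace monitoring -f grafana-flux-values.yaml
```

Keeping the dashboard in a values file means it survives a Grafana pod restart, which a dashboard imported through the UI only does if Grafana has persistent storage.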

One pleasant surprise about Flux metrics is how much detail they carry about reconciliation. Beyond a binary success or failure, you get per-resource reconciliation timings and per-controller counts of reconciliations and errors, which is crucial for understanding why something might be stuck or slow.

By examining metrics like gotk_reconcile_duration_seconds (a histogram labeled by kind, name, and namespace), controller_runtime_reconcile_errors_total, and controller_runtime_reconcile_total, you can quickly pinpoint issues. For instance, consistently high gotk_reconcile_duration_seconds values for kind="GitRepository" might indicate network latency to your Git provider or a very large repository. A sudden spike in controller_runtime_reconcile_errors_total for the kustomization controller points to a problematic manifest change. Metric names have changed between Flux versions, so confirm them against a controller’s /metrics output.
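
The latency and error checks described above can be tried from a terminal through the Prometheus HTTP API. This is a sketch, not a definitive setup: the metric names are the ones Flux and controller-runtime are believed to export (gotk_reconcile_duration_seconds, controller_runtime_reconcile_errors_total), so check them against your own /metrics output. The prom_query helper and the localhost URL are assumptions for illustration.

```shell
# Base URL of the Prometheus HTTP API. Assumes a local port-forward such as:
#   kubectl -n monitoring port-forward svc/prometheus-server 9090:80
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run one instant PromQL query and print the raw JSON response.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# 99th-percentile reconciliation time per Flux resource kind over 5 minutes.
Q_SLOW='histogram_quantile(0.99, sum by (le, kind) (rate(gotk_reconcile_duration_seconds_bucket[5m])))'

# Per-second reconcile error rate for each controller over 5 minutes.
Q_ERRORS='sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m]))'
```

Call prom_query "$Q_SLOW" or prom_query "$Q_ERRORS" once the port-forward is running. The same expressions can be pasted into a Grafana panel unchanged.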

When setting up your own dashboard panels, keep gauges and counters apart. A readiness gauge (gotk_reconcile_condition in older Flux releases, or gotk_resource_info exported through kube-state-metrics in Flux v2.1 and later) reports the current state of each resource and suits status panels and "not ready" alerts; do not wrap it in rate(), which is only meaningful for counters. For throughput, use the controller-runtime counters. For example, sum(rate(controller_runtime_reconcile_total{controller="kustomization",result="success"}[5m])) will give you the number of successful kustomization reconciliations per second over the last 5 minutes.
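
A per-controller error rate also translates directly into an alert. The sketch below writes a PrometheusRule manifest, which requires the Prometheus Operator. The rule name, the 10-minute hold time, the severity label, the release: prometheus selector label, and the namespace="flux-system" matcher are all assumptions to adapt to your setup.

```shell
# Write an alerting rule that fires when any Flux controller keeps reporting
# reconcile errors. The quoted 'EOF' stops the shell from expanding $labels.
cat > flux-alerts.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux-reconciliation
  namespace: monitoring
  labels:
    release: prometheus # Assumed to match your operator's ruleSelector
spec:
  groups:
    - name: flux
      rules:
        - alert: FluxReconcileErrors
          expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total{namespace="flux-system"}[5m])) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Flux controller {{ $labels.controller }} is reporting reconcile errors"
EOF

# Apply it once you have reviewed it (needs cluster access):
# kubectl apply -f flux-alerts.yaml
```

The for: 10m clause keeps a single transient failure, such as a brief Git provider outage, from paging anyone; shorten it if you want faster feedback.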

The next step in mastering Flux observability is setting up alerting based on these metrics.
