Fluentd on Kubernetes usually fails for one of two reasons: the service account used by the fluentd DaemonSet lacks the RBAC permissions it needs to read Kubernetes API resources, or a NetworkPolicy restricts its traffic too tightly.

Common Causes and Fixes

1. Missing get and list Permissions for Pods and Nodes

  • Diagnosis: The fluentd pods crash or fail to start, and their logs show 403 Forbidden errors for requests to /api/v1/pods or /api/v1/nodes, for example: pods is forbidden: User "system:serviceaccount:<namespace>:fluentd-ds" cannot list resource "pods" in API group "" at the cluster scope. Check the fluentd pod logs: kubectl logs <fluentd-pod-name> -n <namespace>.

  • Cause: The fluentd service account, often named fluentd-ds, doesn’t have get and list permissions for the pods and nodes resources. Fluentd queries the API for this information to find out which pods are running on which nodes, so that it can associate the logs it tails with the right pod.

  • Fix: Create or update a ClusterRole to grant these permissions and bind it to the fluentd service account.

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: fluentd-read-pods-nodes
    rules:
    - apiGroups: [""] # Core API group
      resources: ["pods", "nodes"]
      verbs: ["get", "list"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: fluentd-read-pods-nodes-binding
    subjects:
    - kind: ServiceAccount
      name: fluentd-ds # Replace with your fluentd service account name
      namespace: <namespace> # Replace with your fluentd namespace
    roleRef:
      kind: ClusterRole
      name: fluentd-read-pods-nodes
      apiGroup: rbac.authorization.k8s.io
    
  • Why it works: This grants the fluentd service account read-only access to pod and node information, allowing it to discover and monitor logs without being able to modify cluster state.
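
One way to confirm that the binding took effect is to ask the API server directly with kubectl auth can-i while impersonating the service account. The sketch below wraps that check in a hypothetical helper, fluentd_can_i, which is not part of kubectl. It assumes your own user is allowed to impersonate service accounts, which is normally true for cluster admins.

```shell
# Hypothetical helper: fluentd_can_i <namespace> <service-account>
# Prints one line per verb/resource pair, ending in kubectl's "yes" or "no".
fluentd_can_i() {
  for resource in pods nodes; do
    for verb in get list; do
      printf '%s %s: ' "$verb" "$resource"
      # --as impersonates the service account for this one check
      kubectl auth can-i "$verb" "$resource" --all-namespaces \
        --as="system:serviceaccount:$1:$2"
    done
  done
}
```

For example, fluentd_can_i logging fluentd-ds (logging is an assumed namespace name) should answer yes on all four lines once the ClusterRoleBinding is in place.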

2. Missing Permissions for events Resources

  • Diagnosis: Fluentd misses Kubernetes events or fails to react to changes in the cluster, and its logs show Forbidden errors when it tries to list or watch events.

  • Cause: Some Fluentd setups watch Kubernetes events (e.g., pod restarts, scaling events) to enrich log data or trigger actions. The service account then needs get, list, and watch permissions for events.

  • Fix: Create a ClusterRole that extends the one from step 1 with the events resource and the watch verb.

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: fluentd-read-pods-nodes-events
    rules:
    - apiGroups: [""] # Core API group
      resources: ["pods", "nodes", "events"] # Added events
      verbs: ["get", "list", "watch"] # watch is required to stream events

    A ClusterRoleBinding’s roleRef cannot be changed after creation, so delete the existing binding and recreate it with roleRef pointing at this new ClusterRole.

  • Why it works: This allows Fluentd to list and watch Kubernetes events, providing context to the logs it collects.

3. Incorrectly Configured serviceAccountName in DaemonSet

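  • Diagnosis: Fluentd pods are not created at all, or they start but cannot authenticate. If the named service account does not exist, kubectl describe daemonset <daemonset-name> -n <namespace> shows events containing error looking up service account. If the pods run but no token is mounted, their logs show error: no such file or directory or similar when trying to access /var/run/secrets/kubernetes.io/serviceaccount/token.

  • Cause: The fluentd DaemonSet manifest omits serviceAccountName (so the pods run as the namespace’s default service account, which has none of the permissions above), sets it to the wrong name, or points to a service account that does not exist.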

  • Fix: Ensure your fluentd DaemonSet manifest has the serviceAccountName field correctly set to the name of the service account you’ve created or are using, and that this service account exists in the same namespace as the DaemonSet.

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: fluentd-daemonset
      namespace: <namespace> # Replace with your fluentd namespace
    spec:
      template:
        spec:
          serviceAccountName: fluentd-ds # Must match your ServiceAccount object name
          # ... rest of your pod spec
    
  • Why it works: This explicitly tells Kubernetes which identity (service account) the fluentd pods should run as, ensuring they use the associated RBAC permissions.
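
To see which identity the pods actually run as, you can compare the ServiceAccount object with the serviceAccountName recorded on each pod. The helper below is a hypothetical sketch; the function name and the app=fluentd label selector are assumptions, so adjust the selector to match your pods.

```shell
# Hypothetical helper: fluentd_sa_check <namespace> <expected-service-account>
# Assumes the fluentd pods carry the label app=fluentd.
fluentd_sa_check() {
  # Stops early with kubectl's NotFound error if the ServiceAccount is missing
  kubectl get serviceaccount "$2" -n "$1" || return 1
  # One line per pod: pod name, then the service account it actually runs as
  kubectl get pods -n "$1" -l app=fluentd \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.serviceAccountName}{"\n"}{end}'
}
```

If the second column shows default instead of the expected name, the DaemonSet template is missing or misspelling serviceAccountName.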

4. NetworkPolicy Blocking Fluentd’s Egress to Aggregator

  • Diagnosis: Fluentd pods are running but not sending logs anywhere. Check the Fluentd logs for errors like connection refused or timeout when trying to connect to your log aggregation endpoint (e.g., Elasticsearch, Splunk, Kafka).

  • Cause: A NetworkPolicy in the fluentd namespace selects the fluentd pods and does not allow egress traffic to the IP address and port of your log aggregator. NetworkPolicies are namespaced, so only policies in the fluentd pods’ own namespace can restrict their egress.

  • Fix: Create a NetworkPolicy that allows egress traffic from fluentd pods to your aggregator.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-fluentd-egress
      namespace: <namespace> # Replace with your fluentd namespace
    spec:
      podSelector:
        matchLabels:
          app: fluentd # Or whatever label identifies your fluentd pods
      policyTypes:
      - Egress
      egress:
      - to:
        - ipBlock:
            cidr: <aggregator-ip>/32 # Replace with your aggregator's IP or range
        ports:
        - protocol: TCP
          port: 9200 # Replace with your aggregator's port (e.g., 9200 for Elasticsearch)
    
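  • Why it works: This NetworkPolicy explicitly permits TCP connections from pods labeled app: fluentd to the specified IP and port, allowing logs to be forwarded. Once any egress policy selects the pods, all other egress is denied, so add further rules for DNS (port 53) and the Kubernetes API server if Fluentd depends on them.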
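
To test whether the aggregator is reachable from inside a fluentd pod, you can open a TCP connection with the Ruby interpreter that ships alongside Fluentd. This is a minimal sketch: fluentd_probe is a hypothetical helper, and it assumes the container image has ruby on its PATH, which you should verify for your image.

```shell
# Hypothetical helper: fluentd_probe <namespace> <pod> <host> <port>
# Opens a TCP connection from inside the pod with a 5-second timeout.
# Assumes the container image has ruby on its PATH.
fluentd_probe() {
  kubectl exec -n "$1" "$2" -- ruby -rsocket -e \
    'Socket.tcp(ARGV[0], ARGV[1].to_i, connect_timeout: 5) { puts "connected" }' \
    "$3" "$4"
}
```

A timeout here, combined with a successful connection from a pod that no policy selects, points at a NetworkPolicy rather than at the aggregator. kubectl get networkpolicy -n <namespace> lists the policies in the namespace.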

5. NetworkPolicy Blocking Fluentd’s Egress to the API Server

  • Diagnosis: Fluentd pods cannot fetch information from the Kubernetes API server. Their logs show connection timeouts on API requests (rather than Forbidden responses), leading to discovery or metadata errors.

  • Cause: Fluentd initiates the connection to the API server, so this is egress traffic from the fluentd pods, not ingress. A strict egress NetworkPolicy that selects the fluentd pods blocks the connection unless a rule allows the API server’s address and port. No ingress rule is needed, because reply traffic for an allowed connection is permitted implicitly.

  • Fix: Allow egress from the fluentd pods to the API server. The API server usually runs on the host network rather than as an ordinary pod, so select it by IP address instead of by namespace or pod labels.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-fluentd-apiserver-egress
      namespace: <namespace> # Replace with your fluentd namespace
    spec:
      podSelector:
        matchLabels:
          app: fluentd # Or whatever label identifies your fluentd pods
      policyTypes:
      - Egress
      egress:
      - to:
        - ipBlock:
            cidr: <api-server-ip>/32 # Replace with your API server's IP or range
        ports:
        - protocol: TCP
          port: 443 # Some clusters serve the API on 6443 instead; check your endpoints

  • Why it works: This allows the fluentd pods to establish the outbound connections to the Kubernetes API server that they need for discovering cluster resources.
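
When writing NetworkPolicy rules that involve the API server, it helps to know the addresses and port it actually serves on. One way to look them up, sketched below as a hypothetical helper, is to read the built-in kubernetes Endpoints object in the default namespace. Newer clusters may print a deprecation warning for the Endpoints API.

```shell
# Hypothetical helper: print the API server's endpoint IPs on the first line
# and its port(s) on the second, read from the built-in "kubernetes"
# Endpoints object in the default namespace.
apiserver_endpoints() {
  kubectl get endpoints kubernetes -n default \
    -o jsonpath='{.subsets[*].addresses[*].ip}{"\n"}{.subsets[*].ports[*].port}{"\n"}'
}
```

The port printed here is often 6443, while the in-cluster kubernetes Service listens on 443. Which of the two a policy must allow depends on how your network plugin evaluates Service traffic.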

6. Missing Permissions for secrets (if using secrets for credentials)

  • Diagnosis: Fluentd pods fail to start or connect to external services (like cloud logging endpoints) with errors like unauthorized or credentials not found.

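  • Cause: If Fluentd reads Kubernetes Secrets through the API for API keys, passwords, or certificates, the fluentd service account needs get permission for secrets. Secrets mounted into the pod as volumes or environment variables are delivered by the kubelet and need no RBAC permission; in that case, check the volume or env reference in the pod spec instead.

  • Fix: Add a separate rule for secrets to your ClusterRole, limited to the get verb.

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: fluentd-read-pods-nodes-events-secrets
    rules:
    - apiGroups: [""] # Core API group
      resources: ["pods", "nodes", "events"]
      verbs: ["get", "list", "watch"]
    - apiGroups: [""]
      resources: ["secrets"] # Added secrets
      verbs: ["get"] # get only; list would expose every Secret in the cluster

    A ClusterRoleBinding’s roleRef cannot be changed after creation, so delete the existing binding and recreate it with roleRef pointing at this new ClusterRole.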

  • Why it works: This allows Fluentd to retrieve sensitive information stored in Kubernetes Secrets, which it can then use to authenticate with external services.

The next error you’ll likely encounter after fixing RBAC and network issues is a misconfiguration in Fluentd’s output plugin, leading to Ruby exception occurred: ... errors in the pod logs when attempting to send data.
