Fluentd on Kubernetes commonly fails for one of two reasons. Either the Fluentd DaemonSet's service account lacks the RBAC permissions it needs to read Kubernetes API resources, or a NetworkPolicy restricts the pods' ingress or egress traffic too tightly.
Common Causes and Fixes
1. Missing get, list, and watch Permissions for Pods and Namespaces

- Diagnosis: The `fluentd` pods may fail to start, restart repeatedly, or log HTTP 403 (Forbidden) errors when they query `/api/v1/pods` or `/api/v1/namespaces`. Check the pod logs with `kubectl logs <fluentd-pod-name> -n <namespace>`.
- Cause: The `fluentd` service account (often named `fluentd-ds`) lacks `get`, `list`, and `watch` permissions on `pods` and `namespaces`. Fluentd reads container log files directly from each node's filesystem. It calls the API, typically through the `kubernetes_metadata` filter, to enrich each log record with pod and namespace metadata such as labels and namespace names.
- Fix: Create or update a `ClusterRole` that grants these permissions, and bind it to the `fluentd` service account.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd-read-pods-namespaces
rules:
  - apiGroups: [""] # Core API group
    resources: ["pods", "namespaces"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd-read-pods-namespaces-binding
subjects:
  - kind: ServiceAccount
    name: fluentd-ds # Replace with your fluentd service account name
    namespace: <namespace> # Replace with your fluentd namespace
roleRef:
  kind: ClusterRole
  name: fluentd-read-pods-namespaces
  apiGroup: rbac.authorization.k8s.io
```

- Why it works: This grants the `fluentd` service account read-only access to pod and namespace information. Fluentd can then attach metadata to the logs it collects without being able to modify cluster state.
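The binding above assumes a ServiceAccount named `fluentd-ds` already exists in `<namespace>`. If it does not, a minimal manifest using the same placeholder names could look like this:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-ds # Must match subjects[].name in the ClusterRoleBinding
  namespace: <namespace> # Must match subjects[].namespace and the DaemonSet's namespace
```

Create it before the DaemonSet so the Fluentd pods can be admitted with this identity.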
2. Missing Permissions for events

- Diagnosis: If your pipeline collects Kubernetes events, Fluentd may miss events or fail to react to changes in the cluster. Its logs will show errors about listing or watching `events`.
- Cause: Some Fluentd setups watch Kubernetes events (e.g., pod restarts, scaling events) to enrich log data or trigger actions, usually through a dedicated input plugin. Watching requires the `watch` verb in addition to `get` and `list` on `events`.
- Fix: Add `events` (with `get`, `list`, and `watch`) to the `ClusterRole` from step 1. Alternatively, grant these permissions in a separate `ClusterRole` bound to the same service account. RBAC permissions are additive, so either approach works.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd-read-events
rules:
  - apiGroups: [""] # Core API group
    resources: ["events"]
    verbs: ["get", "list", "watch"] # watch is needed to stream events
```

If you use a separate `ClusterRole` like this one, bind it with a `ClusterRoleBinding` shaped like the one in step 1, with `roleRef.name` set to the new role's name.
- Why it works: This allows Fluentd to list and watch Kubernetes events, which add context to the logs it collects.
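Events are served by both the core API group and the `events.k8s.io` group, and RBAC rules match per group. If your events plugin reads through `events.k8s.io`, a rule for the core group alone will not cover it. Here is a sketch that covers both groups (the role name is hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd-read-events-all-groups # Hypothetical name
rules:
  - apiGroups: ["", "events.k8s.io"] # Core group and the events.k8s.io group
    resources: ["events"]
    verbs: ["get", "list", "watch"]
```

Which group a plugin uses depends on the plugin and its version. If you are unsure, granting both is a read-only superset.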
3. Incorrectly Configured serviceAccountName in DaemonSet
- Diagnosis: The DaemonSet creates no `fluentd` pods, and `kubectl describe daemonset <daemonset-name> -n <namespace>` shows events reporting that the service account was not found. Alternatively, the pods start but receive HTTP 403 errors, because they run as the namespace's `default` service account, which has none of the permissions above. If the logs instead report that `/var/run/secrets/kubernetes.io/serviceaccount/token` does not exist, token automounting has been disabled for the pod or its service account.
- Cause: The `fluentd` DaemonSet manifest does not set `serviceAccountName`, or sets it to a service account that does not exist in the DaemonSet's namespace.
- Fix: Set `serviceAccountName` in the DaemonSet's pod template to the service account you bound in the steps above. Make sure that service account exists in the same namespace as the DaemonSet.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-daemonset
  namespace: <namespace> # Replace with your fluentd namespace
spec:
  selector:
    matchLabels:
      app: fluentd # Required by apps/v1; must match the template labels
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd-ds # Must match your ServiceAccount object name
      # ... rest of your pod spec
```

- Why it works: This explicitly tells Kubernetes which identity (service account) the `fluentd` pods run as. The pods then receive that account's token and the RBAC permissions bound to it.
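The token file may be missing because `automountServiceAccountToken: false` is set on the ServiceAccount. In that case you can re-enable mounting for the Fluentd pods alone, because the pod-level setting takes precedence over the ServiceAccount's. The fragment below assumes the same names as the DaemonSet manifest above:

```yaml
# Fragment: merge into the DaemonSet manifest above
spec:
  template:
    spec:
      serviceAccountName: fluentd-ds
      automountServiceAccountToken: true # Pod-level value overrides the ServiceAccount's setting
```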
4. NetworkPolicy Blocking Fluentd’s Egress to Aggregator
- Diagnosis: Fluentd pods are running but not delivering logs. Check the Fluentd logs for timeouts or `connection refused` errors when connecting to your log aggregation endpoint (e.g., Elasticsearch, Splunk, Kafka). A policy that drops packets usually produces timeouts. `connection refused` more often points to a wrong host or port, or to nothing listening there.
- Cause: A `NetworkPolicy` in the `fluentd` namespace selects the `fluentd` pods and includes `Egress` in its `policyTypes`. Such a policy denies all egress that no policy explicitly allows, including traffic to your log aggregator's IP address and port. NetworkPolicies are namespaced, so only policies in the Fluentd pods' own namespace can restrict their egress.
- Fix: Create a `NetworkPolicy` that allows egress traffic from the `fluentd` pods to your aggregator.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-fluentd-egress
  namespace: <namespace> # Replace with your fluentd namespace
spec:
  podSelector:
    matchLabels:
      app: fluentd # Or whatever label identifies your fluentd pods
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: <aggregator-ip>/32 # Replace with your aggregator's IP or range
      ports:
        - protocol: TCP
          port: 9200 # Replace with your aggregator's port (e.g., 9200 for Elasticsearch)
```

- Why it works: NetworkPolicies are additive, so this policy permits TCP connections from pods labeled `app: fluentd` to the specified IP and port even if another policy denies egress by default. If the aggregator runs in-cluster behind its own ingress policy, that policy must also allow traffic from the `fluentd` pods.
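Once an egress policy selects the `fluentd` pods, DNS lookups are blocked too unless some policy allows them. An aggregator addressed by hostname then becomes unreachable even with the aggregator rule in place. The following DNS rule is a sketch. It assumes your cluster DNS pods run in `kube-system` and carry the common `k8s-app: kube-dns` label, which you can check with `kubectl get pods -n kube-system --show-labels`.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-fluentd-dns-egress
  namespace: <namespace> # Replace with your fluentd namespace
spec:
  podSelector:
    matchLabels:
      app: fluentd # Same label as the fluentd pods
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system # Label set automatically on recent Kubernetes versions
          podSelector: # Same list item as namespaceSelector, so both must match
            matchLabels:
              k8s-app: kube-dns # Assumed label on your DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53 # DNS falls back to TCP for large responses
```

The `namespaceSelector` and `podSelector` sit in the same `to` entry, so the rule allows only the DNS pods in `kube-system`. Splitting them into two entries would instead allow every pod in `kube-system`, plus any pod labeled `k8s-app: kube-dns` in the Fluentd namespace.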
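5. NetworkPolicy Blocking Fluentd's Egress to the API Server

- Diagnosis: Fluentd pods cannot fetch information from the Kubernetes API server. This typically shows up as timeouts, rather than 403 errors, during metadata lookups or other API calls.
- Cause: Fluentd opens the connection to the API server, so what matters is egress from the `fluentd` pods, not ingress to them. Replies to an allowed outbound connection are permitted automatically. If an egress `NetworkPolicy` selects the `fluentd` pods but does not allow the API server's address and port, these calls are blocked. An ingress rule allowing traffic from `kube-system` would not help. On most clusters the API server is reached through the `kubernetes` Service in the `default` namespace, and on managed clusters it often runs outside the pod network altogether.
- Fix: Allow egress from the `fluentd` pods to the API server. Depending on your CNI plugin, policies may be evaluated against the API server's endpoint address and port rather than the `kubernetes` Service's ClusterIP. Use the values shown by `kubectl get endpoints kubernetes -n default`.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-fluentd-apiserver-egress
  namespace: <namespace> # Replace with your fluentd namespace
spec:
  podSelector:
    matchLabels:
      app: fluentd # Or whatever label identifies your fluentd pods
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: <apiserver-endpoint-ip>/32 # From: kubectl get endpoints kubernetes -n default
      ports:
        - protocol: TCP
          port: 443 # Use the endpoint port from the same command (often 443 or 6443)
```

- Why it works: This lets the `fluentd` pods open connections to the Kubernetes API server, which they need in order to look up metadata about cluster resources.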
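6. Missing Access to secrets (if using Secrets for credentials)

- Diagnosis: Fluentd pods fail to start, or fail to connect to external services (such as cloud logging endpoints) with errors like `unauthorized` or `credentials not found`.
- Cause: Your Fluentd configuration relies on Kubernetes `Secrets` for API keys, passwords, or certificates, but the credentials never reach the Fluentd process. When the Secret is mounted into the pod as an environment variable or volume, the kubelet reads it, and the service account needs no RBAC permission. In that case, check that the Secret exists in the DaemonSet's namespace and that the key names match. The service account needs `get` on `secrets` only if a plugin reads the Secret through the Kubernetes API.
- Fix: Prefer mounting the Secret into the pod. If a plugin must read it through the API, grant `get` on that one Secret with a namespaced `Role` rather than adding `secrets` to the cluster-wide `ClusterRole`. Granting `get` and `list` on `secrets` cluster-wide would let Fluentd read every Secret in every namespace.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: fluentd-read-credentials
  namespace: <namespace> # Namespace that holds the Secret
rules:
  - apiGroups: [""] # Core API group
    resources: ["secrets"]
    resourceNames: ["<secret-name>"] # Replace with the Secret that holds your credentials
    verbs: ["get"]
```

Bind it to the `fluentd` service account with a `RoleBinding` in the same namespace.
- Why it works: This lets Fluentd retrieve only the Secret it needs to authenticate with external services, without exposing the other Secrets in the cluster.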
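Mounting the Secret needs no RBAC at all, because the kubelet fetches it on the pod's behalf. Below is a sketch of the relevant part of the DaemonSet. The variable name `OUTPUT_PASSWORD`, the key `password`, and `<secret-name>` are hypothetical placeholders for your own values.

```yaml
# Fragment: merge into your fluentd DaemonSet manifest
spec:
  template:
    spec:
      containers:
        - name: fluentd # Name of your existing Fluentd container
          env:
            - name: OUTPUT_PASSWORD # Hypothetical variable name
              valueFrom:
                secretKeyRef:
                  name: <secret-name> # Secret in the DaemonSet's namespace
                  key: password # Hypothetical key inside the Secret
```

Fluentd evaluates `#{...}` inside double-quoted configuration values. An output parameter such as `password` can therefore read the value with `password "#{ENV['OUTPUT_PASSWORD']}"`.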
After you fix the RBAC and network issues, the next error you are likely to encounter is a misconfigured Fluentd output plugin. It shows up as `Ruby exception occurred: ...` errors in the pod logs when Fluentd attempts to send data.