Fluentd on Kubernetes is surprisingly resilient, but its real strength is that it can process and filter logs before they reach your primary storage, rather than acting as a dumb pipe.

Let’s see it in action. Imagine you have a Kubernetes cluster and you want to collect logs from all your application pods, parse them, and send them to Elasticsearch.

First, you’ll need to add the Fluentd Helm repository.

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

Now, let’s install Fluentd using the official chart and point it at an Elasticsearch cluster that is already running in the same Kubernetes cluster.

# fluentd-values.yaml
elasticsearch:
  enabled: true
  host: elasticsearch-master # Assuming your Elasticsearch service is named this
  port: 9200
  index: "kubernetes-logs-%Y.%m.%d"

serviceAccount:
  create: true
  name: fluentd

rbac:
  create: true

image:
  repository: fluent/fluentd-kubernetes-daemonset
  tag: v1.14.2-debian-elasticsearch7-1.1

resources:
  requests:
    cpu: 200m
    memory: 200Mi
  limits:
    cpu: 500m
    memory: 500Mi

Apply these values with your Helm installation:

helm install fluentd fluent/fluentd -f fluentd-values.yaml

This command deploys Fluentd as a DaemonSet, so one Fluentd pod runs on every node and collects logs from the containers on that node. The fluentd-values.yaml file points Fluentd’s output at Elasticsearch. Note that the chart does not install Elasticsearch for you: you need a cluster that is already reachable, for example one installed with Elastic’s own Helm chart, whose default service name is elasticsearch-master. Chart versions also differ in which values keys they read, so compare the elasticsearch block above with the chart’s own defaults (helm show values prints them) before relying on it.

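The Fluentd DaemonSet pods mount the host’s /var/log directory (or wherever your container runtime stores logs) through a hostPath volume and use the tail input plugin to read new log entries as they are appended. Each line is then parsed according to the runtime’s log format. Docker’s json-file driver wraps every line in a JSON object, which Fluentd’s json parser handles out of the box; containerd and CRI-O write a plain-text CRI format instead. If your application itself logs JSON, that payload arrives as a string inside the log field and needs a second parsing step (for example a parser filter) before its fields become searchable.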
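To make the parsing step concrete, here is a minimal Python sketch of the two on-disk formats you are likely to meet. It assumes Docker’s json-file driver writes one JSON object per line with log, stream, and time keys, and that containerd and CRI-O write the CRI text format "<time> <stream> <P|F> <message>". The function names are hypothetical and this is not Fluentd’s parser code; it only illustrates why the application’s own JSON stays a string until a second pass decodes it.

```python
import json
import re

# CRI format used by containerd and CRI-O:
# "<RFC 3339 time> <stdout|stderr> <P|F> <message>" (P = partial line, F = full line)
CRI_LINE = re.compile(r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<flag>[PF]) (?P<log>.*)$")


def parse_docker_json_line(line: str) -> dict:
    """Docker's json-file driver writes one JSON object per line."""
    return json.loads(line)


def parse_cri_line(line: str) -> dict:
    """Split a CRI-format line into its four fields."""
    m = CRI_LINE.match(line)
    if m is None:
        raise ValueError(f"not a CRI log line: {line!r}")
    return m.groupdict()


def parse_app_payload(record: dict) -> dict:
    """Second pass: merge the application's own JSON fields if the log field holds an object."""
    try:
        payload = json.loads(record["log"])
    except json.JSONDecodeError:
        return record  # plain-text message, keep as-is
    if not isinstance(payload, dict):
        return record  # e.g. a bare number or string, nothing to merge
    return {**record, **payload}


line = json.dumps({"log": '{"level":"info","msg":"started"}\n',
                   "stream": "stdout", "time": "2024-01-15T10:00:00.123456789Z"})
record = parse_docker_json_line(line)
print(type(record["log"]).__name__)        # str
print(parse_app_payload(record)["level"])  # info
print(parse_cri_line("2024-01-15T10:00:00.123456789Z stdout F hello")["log"])  # hello
```

The first pass only unwraps the runtime’s envelope; merging the decoded payload into the record is what a second parsing step on the log field is for.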

Here’s a peek at the internal workings. The chart ships a default Fluentd configuration built from three kinds of directives: source (input), filter, and match (output). A simplified version of that configuration looks like this:

# Simplified sketch of the kind of configuration the chart ships
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/td-agent/pos/containers.pos
  tag kubernetes.*
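  # Expects Docker's json-file format; containerd and CRI-O write
  # plain-text CRI lines that need a different parser.
  <parse>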
    @type json
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>

<filter kubernetes.**>
  @type kubernetes_metadata
</filter>

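<match kubernetes.var.log.containers.**>
  @type elasticsearch
  host elasticsearch-master
  port 9200
  # Date placeholders in index_name only expand with a time-keyed buffer;
  # logstash_format builds the daily index name (kubernetes-logs-YYYY.MM.DD) instead.
  logstash_format true
  logstash_prefix kubernetes-logs
  logstash_dateformat %Y.%m.%d
  <buffer>
    flush_interval 5s
  </buffer>
</match>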

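The tail source reads log files from /var/log/containers/ and records how far it has read in a position file, which lets Fluentd resume where it left off after a restart instead of re-sending everything. In the tag, the * is replaced by the file’s path with slashes turned into dots, so each event gets a tag like kubernetes.var.log.containers.<pod>_<namespace>_<container>-<id>.log; that is why the match pattern starts with kubernetes.var.log.containers. The parse directive tells Fluentd to expect JSON lines and how to read the timestamp. The kubernetes_metadata filter is crucial: it enriches each record with Kubernetes metadata such as pod name, namespace, labels, and container name. Finally, the match directive sends every event tagged kubernetes.var.log.containers.** to Elasticsearch, writing to one index per day.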
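The tagging and routing rules described above can be sketched in a few lines of Python. This is a simplified model, not Fluentd’s implementation: expand_tail_tag assumes in_tail replaces * with the file path (slashes turned into dots), tag_matches supports only * (exactly one tag part) and ** (zero or more parts) without {a,b} alternation, and route assumes every filter appears before the match sections, as in the snippet. All three function names and the sample file name are hypothetical.

```python
import re
from typing import Callable, List, Tuple

Record = dict
FilterFn = Callable[[str, Record], Record]


def expand_tail_tag(tag_pattern: str, path: str) -> str:
    """Replace '*' in an in_tail tag with the file path, slashes turned into dots."""
    prefix, star, suffix = tag_pattern.partition("*")
    if not star:
        return tag_pattern  # no wildcard: every file gets the same tag
    dotted = re.sub(r"\.+", ".", path.replace("/", ".")).lstrip(".")
    return prefix + dotted + suffix


def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified tag matching: '*' is one tag part, '**' is zero or more parts."""
    def match(pat: List[str], parts: List[str]) -> bool:
        if not pat:
            return not parts
        head, rest = pat[0], pat[1:]
        if head == "**":
            return any(match(rest, parts[i:]) for i in range(len(parts) + 1))
        if parts and head in ("*", parts[0]):
            return match(rest, parts[1:])
        return False
    return match(pattern.split("."), tag.split("."))


def route(tag: str, record: Record,
          filters: List[Tuple[str, FilterFn]],
          matches: List[Tuple[str, list]]) -> bool:
    """Apply every matching filter in order, then hand the event to the first matching output."""
    for pattern, fn in filters:
        if tag_matches(pattern, tag):
            record = fn(tag, record)
    for pattern, sink in matches:
        if tag_matches(pattern, tag):
            sink.append((tag, record))
            return True
    return False  # no match section accepted the event


tag = expand_tail_tag("kubernetes.*", "/var/log/containers/web-7d4b9_default_app-0123abcd.log")
es_sink: list = []
route(tag, {"log": "hello"},
      filters=[("kubernetes.**", lambda t, r: {**r, "enriched": True})],
      matches=[("kubernetes.var.log.containers.**", es_sink)])
print(tag)            # kubernetes.var.log.containers.web-7d4b9_default_app-0123abcd.log
print(es_sink[0][1])  # {'log': 'hello', 'enriched': True}
```

Fluentd itself checks match sections in the order they appear and stops at the first one that accepts the event, which is why catch-all patterns such as ** belong at the end of a configuration.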

The most surprising thing about Fluentd’s Kubernetes integration is how much context it attaches without any application-level instrumentation. The log lines themselves carry only the message, the stream, and a timestamp; the Kubernetes context comes from two other places. First, the kubelet names each file in /var/log/containers/ as <pod>_<namespace>_<container>-<container-id>.log, so the tag already encodes the pod, namespace, container, and container ID. Second, the kubernetes_metadata filter parses those values out of the tag and then queries the Kubernetes API server for the rest, such as labels and the pod ID. That API lookup is why the chart creates a service account with RBAC permissions to read pods and namespaces. The result is logs you can search and analyze by Kubernetes context.
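A minimal Python sketch of that two-step enrichment, assuming the file-naming convention described above. The regex is a simplified stand-in for the filter’s own (configurable) tag regex, lookup_pod is a hypothetical callable standing in for the API server query, and the docker/kubernetes field layout only approximates what the plugin emits.

```python
import re
from typing import Callable, Optional

# Simplified version of the kubelet's file naming convention:
# <pod_name>_<namespace>_<container_name>-<64-hex container ID>.log
LOG_FILE = re.compile(
    r"var\.log\.containers\."
    r"(?P<pod_name>[a-z0-9][-a-z0-9.]*)_"
    r"(?P<namespace_name>[^_]+)_"
    r"(?P<container_name>.+)-"
    r"(?P<container_id>[a-f0-9]{64})\.log$"
)


def metadata_from_tag(tag: str) -> Optional[dict]:
    """Pull pod, namespace, container, and container ID out of an in_tail tag."""
    m = LOG_FILE.search(tag)
    return m.groupdict() if m else None


def enrich(record: dict, tag: str, lookup_pod: Callable[[str, str], dict]) -> dict:
    """Attach tag-derived fields plus whatever the (hypothetical) API lookup returns."""
    meta = metadata_from_tag(tag)
    if meta is None:
        return record  # not a container log file: leave the record untouched
    container_id = meta.pop("container_id")
    from_api = lookup_pod(meta["namespace_name"], meta["pod_name"])
    return {**record,
            "docker": {"container_id": container_id},
            "kubernetes": {**meta, **from_api}}


def fake_api(namespace: str, pod: str) -> dict:
    """Hypothetical stand-in for a Kubernetes API server pod lookup."""
    return {"labels": {"app": "web"}}


cid = "0f" * 32  # 64 hex characters
tag = f"kubernetes.var.log.containers.web-7d4b9_default_app-{cid}.log"
print(enrich({"log": "hello"}, tag, fake_api)["kubernetes"])
# {'pod_name': 'web-7d4b9', 'namespace_name': 'default', 'container_name': 'app', 'labels': {'app': 'web'}}
```

Everything taken from the tag costs nothing at runtime; only the labels and similar fields need the API call, which is the part the real filter caches.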

Once deployed, you can verify logs are flowing by checking your Elasticsearch index. You’ll see documents appearing with fields like kubernetes.pod_name, kubernetes.namespace_name, and kubernetes.container_name, alongside your actual log messages.
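When you go looking for those documents, it helps to know which daily index an event should be in. A minimal sketch, assuming the kubernetes-logs-%Y.%m.%d naming used in this setup and that the Elasticsearch output computes the date in UTC (check the utc_index behavior of your fluent-plugin-elasticsearch version). daily_index is a hypothetical helper, not part of any library.

```python
from datetime import datetime, timedelta, timezone


def daily_index(prefix: str, event_time: datetime, date_format: str = "%Y.%m.%d") -> str:
    """Name of the daily index an event lands in, assuming UTC-based index dates."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    utc_time = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc_time.strftime(date_format)}"


# 23:30 at UTC-5 is already 04:30 the next day in UTC.
evening = datetime(2024, 1, 15, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
print(daily_index("kubernetes-logs", evening))  # kubernetes-logs-2024.01.16
```

If you search by your local date near midnight, you may be looking in the wrong index.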

The next logical step is to secure this log stream with TLS.
