Running Fluentd as a Kubernetes DaemonSet is the go-to approach for collecting logs from every node in your cluster.

Here’s Fluentd in action: it picks up the stdout of a test pod on a Kubernetes node, and we watch the records arrive by tailing Fluentd’s own log.

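# Fluentd runs as a DaemonSet, so each node has its own Fluentd pod.
# First, find the Fluentd pod and the node it is scheduled on
# (--all-namespaces because Fluentd often lives outside the default namespace):
kubectl get pods --all-namespaces -o wide | grep fluentd

# Let's assume the pod name is fluentd-abcdef-12345 in the kube-system namespace.
# Now, tail its logs (adjust -n to the namespace shown by the previous command):
kubectl logs -f -n kube-system fluentd-abcdef-12345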

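# While tailing, let's generate some logs from another pod.
# Create a simple pod that prints to stdout. The heredoc delimiter is quoted ('EOF')
# so your local shell does not expand $i and $(date) before kubectl sees the manifest:
cat <<'EOF' | kubectl apply -f -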
apiVersion: v1
kind: Pod
metadata:
  name: log-generator
spec:
  containers:
  - name: generator
    image: alpine
    command: ["/bin/sh", "-c", "i=0; while true; do echo \"Log entry $i at $(date)\"; i=$((i+1)); sleep 5; done"]
EOF

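# If the pod lands on the same node and your Fluentd config prints records to its
# own stdout (for example, an output of @type stdout), you'll start seeing
# "Log entry X at ..." appear in the Fluentd logs you're tailing.
# This is Fluentd capturing the stdout of the log-generator pod and processing it.
# The output in the Fluentd logs will look something like this, depending on your config: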
# 2023-10-27 10:30:00 +0000 <stdout>: {"log":"Log entry 10 at Fri Oct 27 10:30:00 UTC 2023","stream":"stdout","time":"2023-10-27T10:30:00.123456789Z","kubernetes":{"container_name":"generator","pod_name":"log-generator","namespace_name":"default","labels":{"pod-template-hash":"abcdef"},"host":"your-node-name"}}

The problem Fluentd solves in Kubernetes is log aggregation at scale. Without it, you’d have to SSH into each node, find the container logs (which are typically stored in /var/log/pods or a similar path managed by the container runtime), and manually piece together the history. Fluentd, running as a DaemonSet, ensures a copy of every log from every pod on every node is sent to a centralized location.

Internally, Fluentd works as a pipeline. It has input plugins that read logs from various sources, filter plugins that transform or enrich the logs, and output plugins that send them to destinations. When deployed as a Kubernetes DaemonSet, the primary input is usually the container runtime’s log files (e.g., Docker’s JSON logs), which Fluentd tails. The kubernetes_metadata filter plugin is crucial here; it automatically enriches each log record with Kubernetes-specific information like pod name, namespace, labels, and the node it originated from. This metadata is essential for filtering and searching logs effectively in your central logging system.
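To make the input, filter, output flow concrete, here is a minimal Python sketch of the same idea. It is not Fluentd's code: the function names (tail_input, add_metadata, collect_output) and the sample metadata are hypothetical, and the "output" simply collects records in a list instead of shipping them anywhere.

```python
import json
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def tail_input(lines: Iterable[str]) -> Iterator[Record]:
    """Input stage: parse each Docker-style JSON log line into a record."""
    for line in lines:
        yield json.loads(line)


def add_metadata(records: Iterable[Record], metadata: Record) -> Iterator[Record]:
    """Filter stage: enrich each record with (hypothetical) Kubernetes context."""
    for record in records:
        yield {**record, "kubernetes": metadata}


def collect_output(records: Iterable[Record]) -> List[Record]:
    """Output stage: stand-in for sending records to a backend."""
    return list(records)


# One raw line in the shape Docker's json-file driver writes.
raw_lines = [
    '{"log":"Log entry 0\\n","stream":"stdout","time":"2023-10-27T10:30:00.123456789Z"}',
]
pod_info = {"pod_name": "log-generator", "namespace_name": "default"}

shipped = collect_output(add_metadata(tail_input(raw_lines), pod_info))
print(shipped[0]["kubernetes"]["pod_name"])
```

Each stage only consumes and yields records, which is why stages can be swapped or stacked freely; Fluentd's plugin model gives you the same property through configuration.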

The fluentd.conf (or equivalent configuration) dictates this flow. A typical DaemonSet configuration might look like this:

# fluentd.conf example snippet for Kubernetes DaemonSet

# This input tails container logs using the tail plugin.
# /var/log/containers/*.log holds symlinks that the kubelet creates for each container.
# The json parser below assumes Docker's json-file log driver; adjust it for other runtimes.
# The <parse> directive tells Fluentd how to interpret the log lines.
# In the tag, * expands to the log file's path with / replaced by dots.
<source>
  @type tail
  path /var/log/containers/*.log # Path where container logs are typically found
  pos_file /var/log/fluentd-containers.log.pos # State file to resume tailing
  tag kubernetes.* # Tag for routing logs
  <parse>
    @type json # Matches Docker's json-file format
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>

# The kubernetes_metadata filter enriches logs with Kubernetes context
<filter kubernetes.**>
  @type kubernetes_metadata
  # This cache is important for performance. Adjust size based on your cluster size.
  cache_size 1000
  # TTL is given in seconds (3600 = 1 hour).
  cache_ttl 3600
</filter>

# Example output to Elasticsearch
<match kubernetes.**>
  @type elasticsearch
  host elasticsearch.logging.svc.cluster.local # Your Elasticsearch host
  port 9200
  logstash_format true
  logstash_prefix kubernetes-logs # Index prefix in Elasticsearch
  include_tag_key true
  tag_key @log_name
  # Buffer settings belong inside the output's <match> block.
  # An in-memory buffer is useful for quick testing, but queued records are
  # lost if the pod restarts. For production, prefer a file buffer
  # (@type file with a path on persistent storage).
  <buffer>
    @type memory
    flush_interval 5s
    retry_max_times 5
    retry_wait 1s
  </buffer>
</match>

The path /var/log/containers/*.log in the tail input plugin is a common pattern, but the exact location can vary based on your Kubernetes cluster’s container runtime configuration and the node’s operating system. For instance, on some systems or with different runtimes, you might see logs in /var/lib/docker/containers/<container-id>/<container-id>-json.log or managed by journald. The pos_file is crucial for ensuring that Fluentd doesn’t re-read logs it has already processed after a restart or node reboot; it stores the last read position for each log file.
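The resume behaviour of a position file can be sketched in a few lines of Python. This is an illustration of the idea only, not Fluentd's implementation: the helper read_new_lines and the single-offset file format are hypothetical, and Fluentd's real pos_file also records each file's inode so that log rotation can be detected.

```python
import os
import tempfile
from typing import List


def read_new_lines(log_path: str, pos_path: str) -> List[str]:
    """Return lines appended since the last call, tracking progress in pos_path."""
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as pos:
            offset = int(pos.read().strip() or 0)

    with open(log_path, "rb") as log:
        log.seek(offset)          # skip everything already processed
        data = log.read()
        new_offset = log.tell()

    with open(pos_path, "w") as pos:
        pos.write(str(new_offset))  # persist progress for the next run
    return data.decode().splitlines()


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "app.log")
pos_path = os.path.join(workdir, "app.log.pos")

with open(log_path, "w") as f:
    f.write("line 1\nline 2\n")
first = read_new_lines(log_path, pos_path)

# Simulate a restart: only the stored offset carries over between calls.
with open(log_path, "a") as f:
    f.write("line 3\n")
second = read_new_lines(log_path, pos_path)

print(first)
print(second)
```

Because the offset lives on disk rather than in memory, the second call returns only the newly appended line. This is also why the pos_file should sit on a path that survives pod restarts, such as a hostPath volume.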

The kubernetes_metadata filter is where the magic happens for Kubernetes observability. It queries the Kubernetes API server to fetch details about the pod producing the log (like labels, annotations, namespace, etc.) and attaches them to each log record. This allows you to, for example, filter logs by deployment name, service, or any label you’ve applied to your pods. The cache_size and cache_ttl parameters control how many pods’ metadata Fluentd keeps in memory and for how long. This cache is vital for performance; without it, Fluentd would query the API server again for the same pod’s metadata on every record, adding load to the API server and slowing down log processing.
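To see why the cache matters, consider this small sketch of a size-bounded cache with a TTL. It is a simplified stand-in, not the plugin's code: MetadataCache and fake_api_lookup are hypothetical names, and the lookup function merely counts how often the "API server" would have been called.

```python
import time
from collections import OrderedDict


class MetadataCache:
    """Least-recently-used cache whose entries expire after ttl_seconds."""

    def __init__(self, fetch, max_size=1000, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self._entries.move_to_end(key)  # mark as recently used
            return entry[1]
        value = self._fetch(key)            # cache miss or expired entry
        self._entries[key] = (now + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used
        return value


api_calls = []


def fake_api_lookup(key):
    """Hypothetical stand-in for a Kubernetes API request."""
    api_calls.append(key)
    namespace, pod_name = key
    return {"namespace_name": namespace, "pod_name": pod_name}


cache = MetadataCache(fake_api_lookup, max_size=2, ttl_seconds=3600)
for _ in range(100):
    cache.get(("default", "log-generator"))
print(len(api_calls))
```

One hundred records from the same pod cost a single lookup here. A cache_size that is too small for the number of pods on a node causes constant eviction and refetching, which is the situation the sizing comment in the config warns about.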

When you configure Fluentd to output to a system like Elasticsearch, the logstash_format true and logstash_prefix kubernetes-logs directives ensure your logs are indexed in a structured, searchable way, commonly with daily indices like kubernetes-logs-2023.10.27. The include_tag_key true and tag_key @log_name directives add the Fluentd tag as a field named @log_name in the output document, further aiding log routing and analysis. The tag comes from kubernetes.* in the source, where * expands to the log file’s path.
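The naming scheme is easy to reproduce. The sketch below assumes the index date is computed in UTC with a %Y.%m.%d suffix, which matches the kubernetes-logs-2023.10.27 example above; the function names and the sample tag are hypothetical and this is not the plugin's own code.

```python
from datetime import datetime, timedelta, timezone


def logstash_index(prefix: str, event_time: datetime) -> str:
    """Build a daily index name such as kubernetes-logs-2023.10.27."""
    # Assumption: dates are taken in UTC, so local evenings can roll over.
    utc_time = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc_time:%Y.%m.%d}"


def to_document(record: dict, tag: str, tag_key: str = "@log_name") -> dict:
    """Mimic include_tag_key: copy the Fluentd tag into the document."""
    return {**record, tag_key: tag}


morning_utc = datetime(2023, 10, 27, 10, 30, tzinfo=timezone.utc)
late_evening_local = datetime(
    2023, 10, 27, 23, 30, tzinfo=timezone(timedelta(hours=-5))
)

print(logstash_index("kubernetes-logs", morning_utc))         # kubernetes-logs-2023.10.27
print(logstash_index("kubernetes-logs", late_evening_local))  # kubernetes-logs-2023.10.28

doc = to_document({"log": "Log entry 0"}, "kubernetes.var.log.containers.example.log")
print(doc["@log_name"])
```

The second call shows a practical consequence of UTC-based names: an event logged late in the evening local time can land in the next day's index.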

One detail that often trips people up is the tag directive in the <source> block. This tag is not the same as a Kubernetes label or tag; it’s an internal routing mechanism within Fluentd. Records emitted with a source’s tag (e.g., one generated from kubernetes.*) are processed by the filters and outputs whose patterns match that tag (e.g., <filter kubernetes.**> and <match kubernetes.**>). In these patterns, * matches exactly one dot-separated tag part, while ** matches zero or more parts, which is why the filter and match blocks use kubernetes.** to catch the long path-based tags. This allows you to have different processing pipelines for different types of logs within the same Fluentd instance.
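The difference between * and ** in tag patterns is easiest to see by trying a few tags. The following Python sketch re-implements just those two wildcards for illustration; tag_matches is a hypothetical helper, not part of Fluentd, and it ignores other pattern forms such as {a,b} alternatives.

```python
from typing import List


def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if a Fluentd-style pattern matches a dot-separated tag."""
    return _match(pattern.split("."), tag.split("."))


def _match(pattern_parts: List[str], tag_parts: List[str]) -> bool:
    if not pattern_parts:
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # "**" may absorb zero or more tag parts.
        return any(
            _match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1)
        )
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        # "*" consumes exactly one tag part; a literal must match exactly.
        return _match(rest, tag_parts[1:])
    return False


path_tag = "kubernetes.var.log.containers.example.log"

print(tag_matches("kubernetes.**", path_tag))          # True
print(tag_matches("kubernetes.*", path_tag))           # False
print(tag_matches("kubernetes.*", "kubernetes.app"))   # True
print(tag_matches("kubernetes.**", "system.syslog"))   # False
```

A <match kubernetes.*> block would therefore silently skip the path-based tags produced by the tail input, which is a common reason for records never reaching an output.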

The next conceptual hurdle is handling log rotation and ensuring Fluentd can keep up with high log volumes without dropping messages.
