Kubernetes Pods can generate a lot of logs, and without context, they’re just a wall of text. This article shows you how to automatically add namespace and label metadata to your pod logs, so you can filter them by application, environment, or namespace when searching and debugging.

Let’s see this in action. Imagine you have a simple Nginx deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        environment: staging
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

By default, a line written by Nginx (here, a configuration error reported at startup) looks like this:

2023/10/27 10:30:00 [emerg] 1#1: unknown directive "daemon" in /etc/nginx/nginx.conf:3

After enrichment, the same log line might look like this (depending on your logging agent configuration):

{
  "log": "2023/10/27 10:30:00 [emerg] 1#1: unknown directive \"daemon\" in /etc/nginx/nginx.conf:3\n",
  "stream": "stderr",
  "time": "2023-10-27T10:30:00.123456789Z",
  "kubernetes": {
    "namespace_name": "default",
    "pod_name": "nginx-app-abcdef-12345",
    "container_name": "nginx",
    "labels": {
      "app": "nginx",
      "environment": "staging"
    }
  }
}

Notice how namespace_name, pod_name, container_name, and labels are now part of the log record. This is typically handled by a cluster-level logging agent, like Fluentd, Fluent Bit, or the Vector agent, often deployed as a DaemonSet on each node.

The core idea is that the logging agent on each node reads the container log files and looks up the Kubernetes metadata of the pods that produced them. It then attaches that metadata to each log record before sending the record to your central logging backend (like Elasticsearch, Loki, or Splunk).

Here’s how you’d typically configure this with Fluent Bit, a popular choice for Kubernetes logging. You’d have a DaemonSet with a fluent-bit.conf that includes an input plugin (e.g., tail for log files) and an output plugin (e.g., es for Elasticsearch). The magic happens in a filter plugin.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        5
        Daemon       Off
        Log_Level    info
        Parsers_File parsers.conf

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-elasticsearch.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buffer_Limit  10MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              On
        Annotations         Off

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           kube.*
        Host            elasticsearch.logging.svc.cluster.local
        Port            9200
        Logstash_Format On
        Replace_Dots    On
        Retry_Limit     False

In this filter-kubernetes.conf:

  • Name kubernetes: This tells Fluent Bit to use its built-in Kubernetes filter.
  • Match kube.*: Applies this filter to logs tagged with kube.*, which our tail input does.
  • Kube_URL, Kube_CA_File, Kube_Token_File: These point to the Kubernetes API server, allowing the agent to fetch metadata for the pods whose logs it’s reading. The agent runs with a Service Account that has permissions to query the API.
  • Labels On: Includes the pod’s Kubernetes labels in the enriched record. This is the filter’s default, so the line only makes the choice explicit.
  • Annotations Off: We’re not enriching with annotations in this example, but you could turn this On too.
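  • Merge_Log On: When the log field contains a JSON string, the filter parses it and adds its fields to the record as structured data. Because Merge_Log_Key is set to log_processed, the parsed fields are placed under a log_processed key instead of at the top level of the record. Plain-text lines, such as the Nginx error above, are left as they are.
  • K8S-Logging.Parser On and K8S-Logging.Exclude Off: With Parser enabled, a pod can suggest a parser for its own logs through the fluentbit.io/parser annotation. With Exclude off, the fluentbit.io/exclude annotation is ignored, so pods cannot opt out of log collection.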

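The Service Account mentioned in the list above needs read access to pod metadata. A minimal sketch of that RBAC setup could look like the following; the names fluent-bit and fluent-bit-read are assumptions chosen for illustration and do not come from the configuration shown earlier.

```yaml
# Hypothetical names: "fluent-bit" and "fluent-bit-read" are illustrative only.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: kube-system
---
# Read-only access to the resources the kubernetes filter looks up.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
- apiGroups: [""]
  resources: ["namespaces", "pods"]
  verbs: ["get", "list", "watch"]
---
# Grants the ClusterRole to the ServiceAccount across all namespaces.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: kube-system
```

A ClusterRole is used rather than a namespaced Role because the agent collects logs from pods in every namespace. The agent’s pod spec would then select this account with serviceAccountName, which also causes Kubernetes to mount the token and CA certificate at the paths used by Kube_Token_File and Kube_CA_File.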
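One detail that is easy to miss is that the logging agent, running as a DaemonSet on each Kubernetes node, does more than read log files: it is also a client of the Kubernetes API. The tail input tags each record with the name of its log file, and file names under /var/log/containers encode the pod name, namespace, and container name. The kubernetes filter extracts those values from the tag, then asks the API server for the pod’s labels and annotations and caches the response, so it does not send a request for every log line. One consequence is that a label changed on a running pod may not appear in new log records until the cached entry is evicted or expires (see the filter’s Kube_Meta_Cache_TTL option).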
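To make the DaemonSet side concrete, here is a minimal sketch of how the agent could be deployed so that it can read the node’s log files and load the fluent-bit-config ConfigMap shown earlier. The image tag, the app: fluent-bit label, and the fluent-bit ServiceAccount name are assumptions for illustration; the ServiceAccount is expected to have read access to pods and namespaces.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    app: fluent-bit            # hypothetical label
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit      # assumed ServiceAccount with pod read access
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:2.2.2    # assumed tag; pin to a version you have tested
        volumeMounts:
        - name: varlog                    # writable: the tail input keeps its DB file here
          mountPath: /var/log
        - name: varlibdockercontainers    # only needed when the node runtime is Docker
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config                    # directory the image reads its config from
          mountPath: /fluent-bit/etc/
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config
        configMap:
          name: fluent-bit-config
```

Mounting the ConfigMap over /fluent-bit/etc/ hides the files the image ships in that directory. With this layout, the ConfigMap would also need a parsers.conf key that defines the docker parser referenced by Parsers_File and by the tail input.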

Once you have this set up, you can filter on the new fields using your backend’s query language. For example, in Elasticsearch using Kibana, you could search for all logs from pods with app: nginx in the staging environment:

kubernetes.labels.app: "nginx" AND kubernetes.labels.environment: "staging"

This turns a chaotic stream of logs into a structured, searchable dataset, allowing you to quickly pinpoint issues within specific applications and environments.

The next step you’ll likely encounter is handling multiline logs, like stack traces, and ensuring consistent log formatting across all your applications.
