Archive Fluent Bit Logs to S3 with Batching and Compression (2026)

Fluent Bit doesn’t just send logs; it shapes them.

Let’s say you’re sending application logs from a Kubernetes cluster to S3. You’ve got Fluent Bit running as a DaemonSet, collecting stdout and stderr from your pods.

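# Fluent Bit DaemonSet manifest snippet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      # Assumed service account: needs RBAC read access to pods/namespaces for the
      # kubernetes filter, plus AWS credentials (e.g. an IAM role) for S3 uploads
      serviceAccountName: fluent-bit
      containers:
      - name: fluent-bit
        # Pin a specific release in production rather than :latest
        image: fluent/fluent-bit:latest
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: pods
          mountPath: /var/lib/docker/containers
          readOnly: true
        # fluent-bit.conf from an (assumed) ConfigMap; subPath keeps the image's other files
        - name: config
          mountPath: /fluent-bit/etc/fluent-bit.conf
          subPath: fluent-bit.conf
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      # Only needed on nodes that still use the Docker runtime
      - name: pods
        hostPath:
          path: /var/lib/docker/containers
      - name: config
        configMap:
          name: fluent-bit-config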

This setup gives Fluent Bit access to the container logs, but uploading each log line to S3 individually would be slow and expensive: S3 bills every PUT request, and each one pays for its own HTTP round trip. Fluent Bit's strength here is its ability to batch records into larger objects and compress them before sending. That saves money, and it also means far fewer requests and bytes on the wire.
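The arithmetic behind that claim is easy to check. The Python sketch below compares uploading hypothetical log lines one request at a time with uploading them once as a single gzip-compressed batch. The log lines are synthetic, so treat the printed compression ratio as illustrative rather than a prediction for your own logs.

```python
import gzip
import json

# Synthetic, hypothetical log lines; real application logs compress differently.
lines = [
    json.dumps({"level": "info", "msg": f"GET /api/items/{i} 200", "latency_ms": i % 50})
    for i in range(10_000)
]

# Option A: one upload per line.
per_line_requests = len(lines)
per_line_bytes = sum(len(line.encode()) + 1 for line in lines)  # +1 for the newline

# Option B: one newline-delimited batch, gzip-compressed, uploaded once.
batch = ("\n".join(lines) + "\n").encode()
compressed = gzip.compress(batch)

print(f"per line: {per_line_requests} requests, {per_line_bytes} bytes")
print(f"batched:  1 request, {len(compressed)} bytes "
      f"({len(batch) / len(compressed):.1f}x smaller than the raw batch)")
```

Both options carry the same raw bytes; the batch turns 10,000 requests into one and then shrinks the payload on top of that.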

Here’s how you configure Fluent Bit to do just that. We’ll use the [SERVICE], [INPUT], [FILTER], and [OUTPUT] sections of its fluent-bit.conf.

[SERVICE]
    # Hand buffered records to the outputs every 5 seconds
    Flush        5
    # Keep Fluent Bit in the foreground; daemonizing would make the container exit
    Daemon       Off
    Log_Level    info
    Parsers_File parsers.conf

[INPUT]
    Name              tail
    # Tag becomes kube.<file path with / replaced by .>, which the kubernetes filter expects
    Tag               kube.*
    Path              /var/log/containers/*.log
    # Built-in parsers for both Docker JSON and containerd/CRI-O log formats
    multiline.parser  docker, cri
    # Tracks read offsets so Fluent Bit resumes after a restart
    DB                /var/log/flb_kube.db
    # Pause this input when its in-memory buffer reaches 5MB
    Mem_Buf_Limit     5MB

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
    Merge_Log           On
    Keep_Log            Off
    K8S-Logging.Parser  On
    K8S-Logging.Exclude Off

[OUTPUT]
    Name              s3
    Match             kube.*
    Region            us-east-1
    Bucket            your-s3-bucket-name
    # Object key layout, e.g. /kube.var.log.containers.<file>.log/2026/01/15/10_30_00-<random>.gz
    # $TAG is the record tag, %Y/%m/%d/%H/%M/%S are strftime fields (UTC), $UUID is a random string
    # See https://docs.fluentbit.io/manual/pipeline/outputs/s3
    s3_key_format     /$TAG/%Y/%m/%d/%H_%M_%S-$UUID.gz
    # Compress each object with gzip before uploading
    compression       gzip
    # Batching: an object is completed when it reaches total_file_size
    # or when upload_timeout has elapsed, whichever comes first
    total_file_size   100M
    upload_timeout    1m
    # Size of each multipart upload part (S3's minimum part size is 5MB)
    upload_chunk_size 5M
    # Retry a failed upload up to 3 times
    Retry_Limit       3

Let’s break down what’s happening, focusing on the archiving aspect.

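The [SERVICE] section sets the overall behavior. Flush 5 tells the engine to hand buffered records to the outputs every 5 seconds. For the S3 output, a flush is not an upload: the plugin appends the records to its local buffer and writes to S3 only when the size or time limits in the [OUTPUT] section are reached. Daemon must be Off inside a container; a daemonized Fluent Bit forks into the background, the container's main process exits, and Kubernetes treats the container as stopped.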

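The [INPUT] section uses the tail plugin to read log files from /var/log/containers/*.log. It needs Tag kube.*: tail expands the * into the file path (with slashes turned into dots), giving tags like kube.var.log.containers.<pod>_<namespace>_<container>-<container-id>.log, which is the form the kubernetes filter expects. Without an explicit Tag, tail falls back to a default tag and nothing matching kube.* ever sees the records. The docker parser only understands the Docker JSON log format; since Kubernetes 1.24 removed dockershim, most clusters run containerd or CRI-O, which write the CRI format instead, so the built-in multiline parsers (multiline.parser docker, cri) are the safer choice because they handle both. DB /var/log/flb_kube.db is crucial for reliability: it records how far each file has been read, so Fluent Bit can resume where it left off after a restart. It stores only these offsets, never log data. Mem_Buf_Limit 5MB caps how much memory this input may use for records that haven't been flushed yet. When the cap is hit, Fluent Bit pauses the input until the buffer drains, so a pause delays data rather than dropping it, unless a file is rotated away before reading resumes.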
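To make the format difference concrete, here is a small Python sketch that parses one line of each format into the same fields. It illustrates the two on-disk formats only and is not how Fluent Bit parses them; in particular it ignores partial (P) CRI lines, which Fluent Bit's built-in cri multiline parser reassembles. The function name parse_container_line is hypothetical.

```python
import json
import re

# CRI (containerd / CRI-O) line: "<RFC3339 time> <stream> <F|P> <message>"
CRI_LINE = re.compile(r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<tag>[FP]) (?P<log>.*)$")


def parse_container_line(line: str) -> dict:
    """Hypothetical parser for one container log line in Docker JSON or CRI format.

    Illustration only: partial (P) lines are returned as-is, not reassembled.
    """
    if line.startswith("{"):
        # Docker json-file driver: {"log": "...\n", "stream": "...", "time": "..."}
        record = json.loads(line)
        return {"time": record["time"], "stream": record["stream"], "log": record["log"].rstrip("\n")}
    match = CRI_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {"time": match["time"], "stream": match["stream"], "log": match["log"]}


docker_line = '{"log":"hello from docker\\n","stream":"stdout","time":"2026-01-15T10:30:00.123456789Z"}'
cri_line = "2026-01-15T10:30:00.123456789Z stdout F hello from containerd"

print(parse_container_line(docker_line))
print(parse_container_line(cri_line))
```

A parser that only expects JSON would reject the second line outright, which is exactly what happens when a Docker-only configuration meets a containerd node.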

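The [FILTER] section, specifically the kubernetes filter, enriches your logs with Kubernetes metadata such as pod name, namespace, and labels. This is invaluable for context when analyzing logs later. The filter queries the API server for this metadata, so the DaemonSet's service account needs read access (get, list, watch) to pods and namespaces. Merge_Log On checks whether the log field holds a JSON string and, if it does, parses it and lifts its keys into the record itself; Keep_Log Off then drops the original log string after a successful merge. K8S-Logging.Parser On lets a pod suggest its own parser through the fluentbit.io/parser annotation, while K8S-Logging.Exclude Off ignores the fluentbit.io/exclude annotation, so pods cannot opt out of collection.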
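The Merge_Log and Keep_Log combination is easiest to see on a single record. The Python function below is a simplified approximation written for illustration (merge_log is a hypothetical name): it lifts the keys of a JSON log string into the record and drops log when keep_log is false. It does not model options such as Merge_Log_Key, or what happens when keys collide.

```python
import json


def merge_log(record: dict, keep_log: bool = False) -> dict:
    """Approximate Merge_Log On: lift keys from a JSON 'log' string into the record.

    keep_log=False mimics Keep_Log Off (drop 'log' after a successful merge).
    Simplified sketch; Merge_Log_Key and key collisions are not modeled.
    """
    raw = record.get("log")
    if not isinstance(raw, str):
        return dict(record)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return dict(record)  # not JSON: leave the record untouched
    if not isinstance(parsed, dict):
        return dict(record)  # JSON, but not an object: nothing to merge
    merged = {k: v for k, v in record.items() if keep_log or k != "log"}
    merged.update(parsed)
    return merged


record = {"log": '{"level": "error", "msg": "db timeout", "retry": 3}', "stream": "stderr"}
print(merge_log(record))
print(merge_log({"log": "plain text line", "stream": "stdout"}))
```

The structured fields (level, msg, retry) become queryable columns later, while plain-text lines pass through unchanged.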

Now, the [OUTPUT] section is where the S3 magic happens.

Name s3: Specifies the S3 output plugin.
Match kube.*: Only records tagged kube.* are sent to S3. Those tags come from the Tag setting on the tail input; the kubernetes filter adds metadata but does not retag records.
Region us-east-1 and Bucket your-s3-bucket-name: Standard S3 configuration.
s3_key_format /$TAG/%Y/%m/%d/%H_%M_%S-$UUID.gz: This is how Fluent Bit names your S3 objects. $TAG expands to the record's full tag, which for this input looks like kube.var.log.containers.<pod>_<namespace>_<container>-<container-id>.log; %Y, %m, %d, %H, %M, and %S are strftime fields, always evaluated in UTC; and $UUID adds a random string so objects created in the same second don't overwrite each other. The result is a logical, date-partitioned layout such as /kube.var.log.containers.checkout-7d9f_shop_app-0123abcd.log/2026/01/15/10_30_00-<random>.gz.
compression gzip: Fluent Bit gzips each object before uploading it (and sets Content-Encoding: gzip on it). Log text is highly repetitive and compresses well, so this substantially reduces both the bytes transferred and the bytes stored.
total_file_size 100M: This is the main batching setting. Records accumulate in a local buffer directory (store_dir, /tmp/fluent-bit/s3 by default), and once an object reaches this size Fluent Bit completes it in S3 and starts a new one. Batching is by size and time only; there is no record-count threshold.
upload_timeout 1m: If total_file_size isn't reached within a minute, Fluent Bit completes the object anyway, so logs from quiet workloads aren't held indefinitely. The default is 10m; shorter timeouts give fresher archives at the cost of more, smaller objects.
upload_chunk_size 5M: By default the plugin uses S3 multipart uploads and sends each object in parts of this size as the buffer fills. 5M is the smallest part size S3 accepts, and the plugin allows at most 50M.
Retry_Limit 3: If an upload fails, Fluent Bit retries it up to 3 times before giving up on that data.
No bucket auto-creation: The S3 output does not create buckets, so create the bucket beforehand and make sure the credentials Fluent Bit runs with (for example, an IAM role attached to its Kubernetes service account) allow s3:PutObject on it.
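To sanity-check a key format before deploying it, you can approximate the expansion locally. The Python sketch below is a hypothetical helper (render_s3_key) that handles only strftime fields, $TAG, $TAG[n] with the default . and - tag delimiters, and $UUID. The 8-character random string it substitutes for $UUID is an arbitrary stand-in, not Fluent Bit's exact format.

```python
import re
import uuid
from datetime import datetime, timezone
from typing import Optional

# Default value of s3_key_format_tag_delimiters: split the tag on '.' and '-'
TAG_DELIMITERS = ".-"


def render_s3_key(fmt: str, tag: str, when: datetime, object_id: Optional[str] = None) -> str:
    """Hypothetical helper approximating how s3_key_format placeholders expand.

    Handles strftime fields, $TAG[n], $TAG and $UUID only; an illustration,
    not Fluent Bit's code.
    """
    # Time fields first, always in UTC; strftime leaves the $ placeholders alone
    key = when.astimezone(timezone.utc).strftime(fmt)
    parts = re.split(f"[{re.escape(TAG_DELIMITERS)}]", tag)
    # $TAG[n] selects the n-th delimiter-separated part of the tag
    key = re.sub(r"\$TAG\[(\d+)\]", lambda m: parts[int(m.group(1))], key)
    key = key.replace("$TAG", tag)
    # $UUID: a random string (8 hex characters here, an arbitrary choice)
    return key.replace("$UUID", object_id or uuid.uuid4().hex[:8])


when = datetime(2026, 1, 15, 10, 30, 0, tzinfo=timezone.utc)
tag = "kube.var.log.containers.checkout-7d9f_shop_app-0123abcd.log"
print(render_s3_key("/$TAG/%Y/%m/%d/%H_%M_%S-$UUID.gz", tag, when, object_id="a1b2c3d4"))
# /kube.var.log.containers.checkout-7d9f_shop_app-0123abcd.log/2026/01/15/10_30_00-a1b2c3d4.gz
print(render_s3_key("/$TAG[4]/%Y/%m/%d/$UUID.gz", tag, when, object_id="a1b2c3d4"))
# /checkout/2026/01/15/a1b2c3d4.gz
```

The second call shows why $TAG[n] needs care with this input: pod names usually contain hyphens, so with the default delimiters the selected part may be only a fragment of the pod name (checkout rather than checkout-7d9f).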

Flush 5 in the [SERVICE] section controls how often records move from the inputs into the S3 plugin's local buffer, while upload_timeout, total_file_size, upload_chunk_size, and compression in the [OUTPUT] section decide when each archive object is completed, how large it can grow, how it is uploaded, and whether it is compressed.
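How the size and time limits interact is easier to see in a toy model. The Python class below is a simplified sketch under stated assumptions, not Fluent Bit's scheduler: each 5-second flush hands some bytes to the output, and an object is completed when the buffer reaches total_file_size or when upload_timeout has elapsed since the object received its first bytes. S3BatchModel and the traffic numbers are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

MB = 1024 * 1024


@dataclass
class S3BatchModel:
    """Simplified model of size/time batching, not Fluent Bit's actual scheduler.

    An object is completed when the buffer reaches total_file_size, or when
    upload_timeout seconds have passed since the object received its first bytes.
    """

    total_file_size: int = 100 * MB
    upload_timeout: float = 60.0
    upload_chunk_size: int = 5 * MB
    buffered: int = 0
    started_at: Optional[float] = None
    completed: List[int] = field(default_factory=list)  # sizes of finished objects

    def on_flush(self, now: float, nbytes: int) -> None:
        """Called once per engine flush with the bytes handed to the output."""
        if nbytes and self.started_at is None:
            self.started_at = now
        self.buffered += nbytes
        size_reached = self.buffered >= self.total_file_size
        timed_out = self.started_at is not None and now - self.started_at >= self.upload_timeout
        if size_reached or timed_out:
            self._complete()

    def _complete(self) -> None:
        self.completed.append(self.buffered)
        self.buffered = 0
        self.started_at = None

    def parts(self, size: int) -> int:
        """Number of upload_chunk_size parts for an object of `size` bytes."""
        return max(1, math.ceil(size / self.upload_chunk_size))


# Quiet node: 10 KB per 5-second flush for 10 minutes (hypothetical traffic)
quiet = S3BatchModel()
for t in range(0, 600, 5):
    quiet.on_flush(t, 10_000)

# Busy node: 10 MB per flush (hypothetical traffic)
busy = S3BatchModel()
for t in range(0, 600, 5):
    busy.on_flush(t, 10 * MB)

print(len(quiet.completed), quiet.completed[0], quiet.buffered)  # → 9 130000 30000
print(len(busy.completed), busy.parts(busy.completed[0]))  # → 12 20
```

In this model the quiet node produces small objects every minute because upload_timeout fires long before 100 MB accumulates, while the busy node hits total_file_size first and each object goes up as 20 parts of 5 MB. If your bucket fills with many tiny objects, a longer upload_timeout is the first setting to revisit.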

The next thing you’ll likely want to tackle is how to read these compressed archives efficiently, perhaps using Athena or Glue.