Configure Fluentd Buffer Chunk Flush and Retry Policies (2026)

Fluentd’s buffer and retry mechanisms are the unsung heroes of reliable log aggregation, preventing data loss even when downstream systems hiccup.

Here’s Fluentd’s buffer in action, processing a stream of logs and periodically flushing them to a destination. Imagine we’re collecting Nginx access logs and sending them to Elasticsearch.

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.log.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>

<match nginx.access>
  @type elasticsearch
  host localhost
  port 9200
  logstash_format true
  logstash_prefix nginx-access
  include_tag_key true
  tag_key @log_name
  <buffer>
    @type file
    path /var/log/td-agent/buffer/nginx
    flush_mode interval
    flush_interval 5s
    retry_max_times 10
    retry_wait 1s
    chunk_limit_size 2m
    chunk_limit_records 1000
  </buffer>
</match>

The core problem the buffer solves is keeping log data safe through network instability or destination outages. It does this by decoupling log ingestion from log forwarding. When the elasticsearch output plugin can’t connect to Elasticsearch, Fluentd doesn’t drop the logs right away; it holds them in the buffer and retries.

The <buffer> section in the configuration is where this magic happens.

@type file: This specifies that the buffer will be stored on disk. Other built-in options include memory (faster, but buffered data is lost if the process crashes before it is flushed) and file_single. Using file is the most common choice for durability.
path /var/log/td-agent/buffer/nginx: This is the directory where Fluentd will write its buffer files (chunks). Ensure this directory exists and is writable by the td-agent user.
flush_mode interval: This dictates when Fluentd attempts to send buffered data. interval means it flushes every flush_interval (5s in this example). In Fluentd v1, flush_interval is a buffer parameter and belongs inside the <buffer> section. Other modes include lazy (flushes once per timekey window, so it is used with time-keyed buffers) and immediate (flushes as soon as events are appended to a chunk, which is less common for high-throughput scenarios).
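retry_max_times 10: If a flush operation fails, Fluentd will retry sending that chunk up to 10 times.
retry_wait 1s: After a failed flush, Fluentd will wait 1 second before the first retry. With the default retry_type of exponential_backoff, the wait then roughly doubles on each subsequent retry (about 1s, 2s, 4s, and so on, with some randomization). This prevents overwhelming a struggling destination.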
chunk_limit_size 2m: Each buffer chunk written to disk will not exceed 2 megabytes. Once a chunk reaches this size, Fluentd will start a new one.
chunk_limit_records 1000: A chunk is also considered "full" once it contains 1000 records, regardless of its size in bytes. A chunk is queued for flushing when it reaches either chunk_limit_size or chunk_limit_records, or when flush_interval elapses, whichever comes first.
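
The three flush modes mentioned above might look like this in isolation. This is a minimal sketch rather than a drop-in config: the timekey values are assumptions chosen for illustration, and a <match> block would use only one of these <buffer> sections.

```
# interval: flush staged chunks every flush_interval
<buffer>
  flush_mode interval
  flush_interval 5s
</buffer>

# lazy: flush once per timekey window
# (assumed values: 1h window, 10m grace period for late events)
<buffer time>
  flush_mode lazy
  timekey 1h
  timekey_wait 10m
</buffer>

# immediate: flush as soon as events are appended to a chunk
<buffer>
  flush_mode immediate
</buffer>
```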

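When Fluentd tries to flush a chunk and the destination (Elasticsearch, in this case) is unavailable, the chunk is not deleted. Instead, it is kept for retry: Fluentd waits retry_wait before the first retry, lengthens the wait on each later attempt under the default exponential backoff, and stops after retry_max_times. Fluentd has no dead-letter queue as such; the equivalent is a <secondary> output, which Fluentd switches to once most of the retry budget is used up (governed by retry_secondary_threshold). If no <secondary> is configured, or it fails too, the chunk is discarded once the retries are exhausted.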
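
A fallback for chunks that cannot be delivered could be sketched as follows, assuming the built-in secondary_file output. The directory and basename are hypothetical and would need to be writable by the Fluentd process.

```
<match nginx.access>
  @type elasticsearch
  host localhost
  port 9200
  <buffer>
    @type file
    path /var/log/td-agent/buffer/nginx
    retry_max_times 10
    retry_wait 1s
  </buffer>
  # Fallback used when the primary output keeps failing
  <secondary>
    @type secondary_file
    # hypothetical directory for undelivered chunks
    directory /var/log/td-agent/failed
    # hypothetical file name prefix
    basename nginx-access-failed
  </secondary>
</match>
```

Chunks written this way stay on local disk, so they can be inspected or replayed later instead of being lost.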

The interplay between flush_interval, chunk_limit_size, and chunk_limit_records is crucial for tuning. A short flush_interval with small chunk sizes means more frequent, smaller flushes, leading to lower latency but potentially higher overhead on the destination. Larger chunks and longer intervals reduce overhead but increase latency and the amount of data buffered during an outage.
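
The two ends of that trade-off could be sketched like this. The specific values are illustrative assumptions, not recommendations; suitable numbers depend on log volume and on how much bulk load the destination tolerates. total_limit_size does not appear in the example config above and is included only to show one way of bounding disk usage.

```
# Low latency: small chunks, frequent flushes, more requests to the destination
<buffer>
  @type file
  path /var/log/td-agent/buffer/nginx
  flush_mode interval
  flush_interval 1s
  chunk_limit_size 1m
</buffer>

# Low overhead: larger chunks, fewer flushes, more data held during an outage
<buffer>
  @type file
  path /var/log/td-agent/buffer/nginx
  flush_mode interval
  flush_interval 60s
  chunk_limit_size 16m
  # assumed cap on the total on-disk buffer size
  total_limit_size 2g
</buffer>
```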

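One common pitfall is not setting retry_wait and retry_max_times appropriately. If retry_wait is too short (for example, well under a second) and retry_max_times is high, Fluentd can hammer a failing service and make the problem worse. A common starting point is retry_wait 5s and retry_max_times 5. Because the default retry strategy is exponential backoff, retry_wait is only the first delay and later delays grow from there; retry_max_interval can cap them.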
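
That starting point might look like the sketch below. It spells out retry_type and retry_exponential_backoff_base, which are already the defaults in Fluentd v1, and adds retry_max_interval as an assumed cap. The wait times in the comments are approximate and ignore the randomization Fluentd applies by default.

```
<buffer>
  @type file
  path /var/log/td-agent/buffer/nginx
  # default retry strategy in Fluentd v1
  retry_type exponential_backoff
  # wait before the first retry
  retry_wait 5s
  # each subsequent wait is multiplied by this base (default 2)
  retry_exponential_backoff_base 2
  # assumed cap so that no single wait exceeds one minute
  retry_max_interval 60s
  # give up after 5 retries: waits of roughly 5s, 10s, 20s, 40s, 60s
  retry_max_times 5
</buffer>
```

If a constant delay is preferred, retry_type periodic keeps every wait equal to retry_wait.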

The next challenge you’ll likely face is handling situations where the destination is permanently unavailable or data corruption occurs, leading to persistent flush failures.