Fluentd’s buffer mechanism is the unsung hero of reliable log collection, but choosing between its memory and file-backed options can feel like a coin flip.

Let’s see what happens when Fluentd is drowning in logs. Imagine a busy web server handling thousands of requests per second. Fluentd’s in_tail plugin is tailing its access log, and the out_forward plugin is trying to push those events to a remote Fluentd aggregator (which might, in turn, write them to something like Elasticsearch).

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.log.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>

<match nginx.access>
  @type forward
  <server>
    host logging.example.com
    port 24224
  </server>
  buffer_type memory
  buffer_queue_limit 5000
  buffer_chunk_limit 8m
  flush_interval 5s
</match>

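In this setup, we’re using buffer_type memory. When logging.example.com is slow to respond or temporarily unavailable, Fluentd’s memory buffer starts to fill up. It’s like a temporary holding pen for log events. Note that buffer_queue_limit counts chunks, not events: 5000 chunks of up to 8 MB each is roughly 40 GB in the worst case, so on most hosts RAM runs out long before the limit does. If the backlog grows faster than Fluentd can send it and the limit is hit, the buffer stops accepting new events, and any event the input cannot retry later is lost. You’ll see a warning along these lines (the exact wording depends on your Fluentd version):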

2023-10-27 10:30:00 +0000 [warn]: buffer is full, dropping event. {"message": "192.168.1.10 - - [27/Oct/2023:10:30:00 +0000] \"GET /api/users HTTP/1.1\" 200 1234 \"-\" \"curl/7.68.0\""}

The problem isn’t that Fluentd can’t process the logs; it’s that the outbound connection is the bottleneck. The memory buffer is fast, but it has a finite capacity. When that capacity is exhausted and the downstream is still lagging, there is nowhere left to put incoming data. This is a critical failure mode: data loss.
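As a rough sketch of what can be tuned here: Fluentd v1 and later let you choose what happens when the buffer is full, via overflow_action in the <buffer> section. The fragment below assumes Fluentd v1 syntax and reuses the forward output from above; the 512m cap is an illustrative value, not a recommendation.

```
<match nginx.access>
  @type forward
  <server>
    host logging.example.com
    port 24224
  </server>
  <buffer>
    @type memory
    chunk_limit_size 8m
    # Cap on total buffered bytes (illustrative value)
    total_limit_size 512m
    flush_interval 5s
    # What to do when the buffer is full:
    #   throw_exception    (default) raise an error back to the input plugin
    #   block              pause the input until space frees up
    #   drop_oldest_chunk  discard the oldest chunk to make room
    overflow_action block
  </buffer>
</match>
```

block tends to suit inputs such as in_tail that can simply resume reading the file later; it is a poorer fit for inputs that receive data over the network, because they stall along with the buffer.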

The core problem Fluentd’s buffering solves is bridging the speed mismatch between log producers and log consumers. It acts as a shock absorber. When logs arrive faster than they can be sent to their destination, the buffer holds them.

There are two primary buffer types: memory and file.

Memory Buffer:

  • How it works: Logs are stored directly in RAM. This is the fastest option because memory access is significantly quicker than disk I/O.
  • Pros: High throughput, low latency for writes.
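  • Cons: Buffered data is lost if the Fluentd process is killed or the system crashes. On a graceful shutdown Fluentd tries to flush the buffer, but anything it cannot deliver is gone. Limited by available RAM.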
  • Configuration:
    buffer_type memory
    buffer_queue_limit 10000  # Max number of chunks in the queue
    buffer_chunk_limit 8m     # Max size of a single chunk
    flush_interval 5s         # How often to try flushing
    
  • When to use: For non-critical logs where occasional loss is acceptable, or when the downstream consumer is guaranteed to be highly available and responsive. Think ephemeral application logs for debugging that don’t need to be permanently archived.

File Buffer:

  • How it works: Logs are written to disk (in a specified directory) before being sent to the destination. This provides durability.
  • Pros: Data is durable across Fluentd restarts and system crashes. Can handle larger volumes of data than memory buffers if disk space is available.
  • Cons: Slower write performance due to disk I/O. Can consume significant disk space.
  • Configuration:
    buffer_type file
    buffer_path /var/log/td-agent/buffer/nginx_access.buffer # Directory and prefix for buffer files
    buffer_queue_limit 10000
    buffer_chunk_limit 8m
    flush_interval 5s
    
  • When to use: For critical logs that must not be lost, such as security events, financial transactions, or production metrics. Most output plugins default to the memory buffer, so you have to opt in, but file is generally the recommended choice for production environments.
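For reference, here is roughly how the same file-buffer settings look in the <buffer> section syntax used by Fluentd v1 and later, where buffer_type becomes @type, buffer_path becomes path, buffer_chunk_limit becomes chunk_limit_size, and total_limit_size is the preferred way to cap the buffer instead of a queue length. The 2g cap and the retry settings are illustrative assumptions, not values taken from the setup above.

```
<match nginx.access>
  @type forward
  <server>
    host logging.example.com
    port 24224
  </server>
  <buffer>
    @type file
    # Chunk files are created under this path
    path /var/log/td-agent/buffer/nginx_access
    chunk_limit_size 8m
    # Upper bound on disk space used by this buffer (assumed value)
    total_limit_size 2g
    flush_interval 5s
    # Keep retrying failed flushes, backing off up to 60s between attempts
    retry_max_interval 60s
    retry_forever true
  </buffer>
</match>
```

With retry_forever enabled, chunks stay on disk until the destination accepts them, so the total_limit_size cap is what ultimately bounds disk usage during a long outage.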

Choosing the Right One:

The decision hinges on your durability requirements.

  • Need buffered logs to survive a restart or crash? Use file buffer.
  • Can tolerate some loss for speed? Use memory buffer.

Consider the reliability of your downstream destination. If your Elasticsearch cluster or other log aggregator is prone to downtime, a file buffer is your safety net. If it’s rock-solid and you’re pushing logs to a very fast, always-available endpoint, memory buffer might offer a slight performance edge.

The buffer_path for the file buffer is crucial. Ensure the directory exists and Fluentd has write permissions, and give each output its own path. For example, if you specify /var/log/td-agent/buffer/nginx_access.buffer, Fluentd creates chunk files under /var/log/td-agent/buffer/ whose names start with that prefix and embed a state marker (b for a chunk still being filled, q for one queued for flushing) plus a unique chunk ID. The exact naming pattern differs between Fluentd versions.

The flush_mode setting is also important, for memory and file buffers alike. It belongs to the <buffer> section syntax introduced in Fluentd v1. Its default value is default, which behaves as interval (flush every flush_interval) unless the buffer is chunked by time, in which case it behaves as lazy (flush once per timekey). immediate flushes a chunk as soon as events are appended to it.

The most surprising thing about Fluentd’s buffering is how much flush_mode changes its behavior. With immediate, Fluentd attempts to send a chunk as soon as events land in it, rather than waiting for the chunk to fill or the flush_interval to pass. This can reduce latency for critical, low-volume logs, making them appear at the destination faster. However, every flush then carries only a handful of events, so at high log rates the number of disk writes and network requests climbs sharply, potentially overwhelming the system or the destination. It’s a subtle but powerful lever for tuning, often overlooked in favor of just picking file or memory.
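To make the trade-off concrete, here is a minimal sketch that uses a different flush_mode per stream. It assumes Fluentd v1 <buffer> syntax; the audit.** tag and the /var/log/td-agent/buffer/audit path are hypothetical names introduced only for illustration.

```
# Low-volume, latency-sensitive stream (hypothetical tag):
# flush each chunk as soon as events are appended to it.
<match audit.**>
  @type forward
  <server>
    host logging.example.com
    port 24224
  </server>
  <buffer>
    @type file
    path /var/log/td-agent/buffer/audit
    flush_mode immediate
  </buffer>
</match>

# High-volume stream: batch events and flush every 5 seconds.
<match nginx.access>
  @type forward
  <server>
    host logging.example.com
    port 24224
  </server>
  <buffer>
    @type file
    path /var/log/td-agent/buffer/nginx_access
    flush_mode interval
    flush_interval 5s
  </buffer>
</match>
```

Splitting streams this way keeps the extra flush traffic confined to the logs that actually benefit from lower latency.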

Once you’ve mastered buffering, you’ll likely encounter issues with the output plugin itself, such as connection refused errors from your Elasticsearch cluster.
