The primary difference isn’t about where data is stored, but when and how it’s acknowledged.
Let’s see Fluent Bit in action. Imagine you’re collecting logs from a busy web server and sending them to a remote Elasticsearch cluster.

    [SERVICE]
        Flush         5
        Daemon        On
        Log_Level     info
        Parsers_File  parsers.conf

    [INPUT]
        Name           tail
        Path           /var/log/nginx/access.log
        Tag            nginx.access
        Mem_Buf_Limit  10MB

    [OUTPUT]
        Name             es
        Match            nginx.access
        Host             elasticsearch.example.com
        Port             9200
        Logstash_Format  On
        Replace_Dots     On
        Retry_Limit      5

In this setup, Fluent Bit tails `/var/log/nginx/access.log` and groups the records into chunks. At every `Flush` interval (5 seconds here), it hands the pending chunks to the Elasticsearch output. The crucial point is where a chunk lives while it waits for delivery, and what happens to it after it is sent.
Memory Buffer (Mem_Buf_Limit)
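Memory buffering is Fluent Bit's default: incoming log records are held in RAM. `Mem_Buf_Limit` caps how much memory an input may use for records that have not yet been delivered; when the cap is reached, the input is paused until chunks are flushed. Once a chunk of data is successfully sent to the output destination (e.g., Elasticsearch), Fluent Bit discards that data from its memory buffer.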
- Why it matters: This is the fastest path. Acknowledgement from the output is immediate, and the input can proceed. If the output is reliable and consistently available, this is highly efficient.
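- Diagnosis: There's no direct command to "see" the memory buffer. You infer its use from the configuration and from the system's behavior: high throughput and low latency while the output keeps up. When an input hits its limit, Fluent Bit typically logs a warning that the input was paused (for example, a message containing "paused (mem buf overlimit)").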
- Fix/Tune:
    - Increase `Mem_Buf_Limit`: If you're seeing `writev error: broken pipe` or similar network-related output errors that suggest Fluent Bit is struggling to keep up, increasing this value can help smooth out bursts. For example, `Mem_Buf_Limit 50MB`. This gives Fluent Bit more headroom to absorb temporary output backpressure before it has to pause the input.
    - Decrease `Mem_Buf_Limit`: If your system has very limited RAM, a large `Mem_Buf_Limit` can lead to OOM (Out Of Memory) killer situations. Reducing it, e.g., `Mem_Buf_Limit 1MB`, makes the input pause sooner under backpressure but consumes less memory.
    - Tune `Flush`: A smaller `Flush` interval (e.g., `Flush 1`) means data is sent more frequently, reducing the amount of data that needs to be buffered in memory at any given time.
- The Trade-off: The primary risk is data loss. If Fluent Bit crashes or the system shuts down before data is successfully sent and acknowledged by the output, any data still in the memory buffer is gone forever.
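
The tuning options above can be sketched as a variant of the earlier configuration. The values are illustrative assumptions rather than recommendations (50MB and a 1-second flush are simply the examples mentioned in the list), and the host and log path are the same placeholders used in the original example.

```ini
[SERVICE]
    # Hand pending chunks to the outputs every second instead of every 5
    Flush      1
    Log_Level  info

[INPUT]
    Name           tail
    Path           /var/log/nginx/access.log
    Tag            nginx.access
    # Memory buffering is the default; stated explicitly here for clarity
    storage.type   memory
    # Illustrative cap: the input is paused once this much undelivered data is held
    Mem_Buf_Limit  50MB

[OUTPUT]
    Name         es
    Match        nginx.access
    Host         elasticsearch.example.com
    Port         9200
    Retry_Limit  5
```

A larger limit and a shorter flush interval pull in opposite directions on memory use, so it is usually worth changing one at a time and watching the process's resident memory.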
Filesystem Buffer (storage.path and storage.sync)
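When you set `storage.path` in the `[SERVICE]` section (e.g., `storage.path /var/fluentbit/buffer/`) and enable `storage.type filesystem` on an input, Fluent Bit writes that input's incoming log records to disk. Setting `storage.path` alone is not enough, because inputs default to memory buffering. Crucially, Fluent Bit does not discard the data from the disk buffer until it receives confirmation from the output plugin that the data has been successfully ingested by the destination.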
- Why it matters: This provides durability. Even if Fluent Bit crashes, restarts, or the output is temporarily unavailable, the data is preserved on disk and will be retried later.
- Diagnosis: Check the directory specified by `storage.path`. Chunks are stored as files under a per-input subdirectory (with a `.flb` extension in recent releases). Numerous files, especially ones that aren't being cleared over time, indicate data is being buffered to disk and is waiting for delivery. Exact file names and storage-related log messages vary between versions, so treat them as indicative rather than definitive.
- Fix/Tune:
    - Ensure `storage.path` is on a reliable disk: If you're using `storage.path`, make sure it points to an SSD or a disk that isn't prone to corruption or failure, e.g., `storage.path /mnt/fast_ssd/fluentbit_buffer/`.
    - Configure `storage.sync`: The accepted values are `normal` and `full`. `storage.sync full` requests stricter synchronization so that data is physically written to disk, which provides maximum durability but can impact performance. `storage.sync normal` (the default) relies on the OS's page cache, which is faster but carries a slight risk of data loss if the system crashes before the cache is flushed to disk.
    - Monitor disk space: The filesystem buffer can grow very large if the output is consistently failing or if the disk is too slow. Monitor disk usage and ensure sufficient space. If `storage.path` is `/var/log/fluentbit_buffer`, ensure you have ample free space there.
    - Tune `storage.max_chunks_up` and `storage.backlog.mem_limit`: `storage.max_chunks_up` (default 128) limits how many chunks are held in memory at once while the rest stay on disk, and `storage.backlog.mem_limit` (default 5M) bounds the memory used to reload undelivered chunks found on disk after a restart. Lower values generally reduce memory use at the cost of more disk I/O.
- The Trade-off: Performance. Disk I/O is inherently slower than memory operations. If your output is consistently fast and reliable, using the filesystem buffer adds unnecessary latency and overhead.
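
A minimal sketch of filesystem buffering for the same pipeline, assuming the buffer lives at `/var/fluentbit/buffer/` (the example path used above) and that the Fluent Bit user can write there. Note that `storage.path` only enables the storage layer; each input opts in with `storage.type filesystem`. The numeric values shown are the documented defaults, written out for visibility.

```ini
[SERVICE]
    Flush                      5
    Log_Level                  info
    # Root directory for on-disk chunks (assumed path)
    storage.path               /var/fluentbit/buffer/
    # "normal" relies on the OS page cache; "full" trades speed for durability
    storage.sync               normal
    # Maximum number of chunks kept in memory at once
    storage.max_chunks_up      128
    # Memory hint for reloading undelivered chunks after a restart
    storage.backlog.mem_limit  5M

[INPUT]
    Name          tail
    Path          /var/log/nginx/access.log
    Tag           nginx.access
    # Opt this input in to disk-backed buffering
    storage.type  filesystem

[OUTPUT]
    Name         es
    Match        nginx.access
    Host         elasticsearch.example.com
    Port         9200
    Retry_Limit  5
```

Because `storage.type` is set per input, a single instance can keep low-value inputs in memory while persisting the ones where loss is unacceptable.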
The fundamental difference lies in the persistence guarantee. Memory buffering offers speed at the cost of potential data loss during unexpected shutdowns. Filesystem buffering offers durability at the cost of performance. The choice depends on your tolerance for data loss versus your performance requirements.
A common pitfall is enabling filesystem buffering without dedicating sufficient disk space, leading to Fluent Bit stopping ingestion because the buffer disk is full.
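
One way to guard against this pitfall is to cap the on-disk backlog per output and expose storage metrics so the backlog can be watched. The sketch below assumes a 2G cap and the built-in HTTP server listening on 127.0.0.1:2020; both are illustrative values, not recommendations.

```ini
[SERVICE]
    storage.path     /var/fluentbit/buffer/
    # Publish storage-layer metrics through the built-in HTTP server
    storage.metrics  On
    HTTP_Server      On
    HTTP_Listen      127.0.0.1
    HTTP_Port        2020

[INPUT]
    Name          tail
    Path          /var/log/nginx/access.log
    Tag           nginx.access
    storage.type  filesystem

[OUTPUT]
    Name                      es
    Match                     nginx.access
    Host                      elasticsearch.example.com
    Port                      9200
    # Illustrative cap on the disk space this output's queued chunks may use
    storage.total_limit_size  2G
```

When the cap is reached, Fluent Bit discards the oldest queued chunks for that output, so the limit trades a full disk for bounded data loss. With the HTTP server enabled, the storage metrics should be readable at `/api/v1/storage`.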