Fluent Bit can get overwhelmed and drop data when downstream services can’t keep up with its ingestion rate.

Here’s how to prevent data loss by using Fluent Bit’s built-in flow control mechanisms: pause and resume.

Let’s say you’re shipping logs from a busy Kubernetes cluster to a Kafka topic. Your Fluent Bit pods are configured to collect logs from containerd and send them to Kafka.

apiVersion: fluentbit.fluentd.org/v1alpha2
kind: FluentBit
metadata:
  name: my-fluentbit
spec:
  config:
    filters:
      mem_buf_limit: "100MB"
    outputs:
      - name: kafka
        match: '*'
        brokers: kafka.example.com:9092
        topics: logs
        # ... other kafka options

The problem arises when Kafka is under heavy load, perhaps because of a sudden surge in traffic or a failing downstream consumer. By default, Fluent Bit tries to push data as fast as it can. If Kafka can’t acknowledge records quickly enough, Fluent Bit’s internal buffers fill up. Once they are full, Fluent Bit starts dropping records rather than crash or consume all available memory. This condition, where data arrives faster than it can be flushed, is known as backpressure.

The mem_buf_limit setting is crucial. It caps the memory Fluent Bit uses to buffer records that have been ingested but not yet delivered to an output. The example above places it under filters; in upstream Fluent Bit, Mem_Buf_Limit is an input plugin option, so check where your deployment tooling expects it. If the limit is hit while the output can’t accept data, records can be lost. The limit isn’t a flow control mechanism in itself, but a safety net that, when hit, indicates a problem downstream.
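To get a feel for how quickly that limit can be reached, a rough back-of-the-envelope calculation helps. The Python sketch below assumes constant ingest and flush rates, which real workloads rarely have; the function name and the rates are hypothetical and chosen only for illustration.

```python
def seconds_until_full(limit_bytes: int, ingest_bps: float, flush_bps: float) -> float:
    """Time until a buffer of limit_bytes fills, given constant rates in bytes/second.

    Returns float('inf') when the output keeps up and the buffer never fills.
    """
    net_fill = ingest_bps - flush_bps
    if net_fill <= 0:
        return float("inf")
    return limit_bytes / net_fill


MB = 1_000_000  # assumption: decimal megabytes, purely for the arithmetic

# Hypothetical rates: 5 MB/s of logs arriving, Kafka accepting only 1 MB/s.
print(seconds_until_full(100 * MB, 5 * MB, 1 * MB))  # 25.0
```

Under these made-up numbers a 100MB buffer lasts 25 seconds, so even a short Kafka slowdown is enough to reach the limit.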

To manage this actively, set the flowcontrol.pause and flowcontrol.resume options on the output plugin. Confirm that your Fluent Bit version and deployment tooling accept these keys before rolling the change out.

Set flowcontrol.pause to a value below your mem_buf_limit (80MB against a 100MB limit in the example that follows). This tells Fluent Bit to pause the input once the buffer reaches this threshold, giving the output a chance to catch up before the limit is hit and data is dropped.

apiVersion: fluentbit.fluentd.org/v1alpha2
kind: FluentBit
metadata:
  name: my-fluentbit
spec:
  config:
    filters:
      mem_buf_limit: "100MB"
    outputs:
      - name: kafka
        match: '*'
        brokers: kafka.example.com:9092
        topics: logs
        flowcontrol.pause: "80MB" # Pause input when buffer reaches 80MB
        flowcontrol.resume: "60MB" # Resume input when buffer drops to 60MB
        # ... other kafka options

The flowcontrol.resume value should be set below the flowcontrol.pause value. This creates a hysteresis effect: Fluent Bit resumes reading from its inputs only after the buffer has drained significantly, which prevents rapid on-off cycling (thrashing). The gap between the two values is the amount of data that must drain before ingestion resumes: 20MB in the example above.

When the flowcontrol.pause threshold is hit, Fluent Bit stops reading new records from its input sources (such as tail or systemd). It continues to send buffered records to the output (Kafka). Once the output has acknowledged enough records for the buffer size to drop below flowcontrol.resume, Fluent Bit resumes reading new records from its inputs.
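The pause and resume cycle described here can be sketched as a small simulation. This is a toy model in Python, not Fluent Bit’s implementation: the BufferSim class, the per-step rates, and the assumption that a paused input simply stops reading (rather than losing data) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BufferSim:
    """Toy model of a buffer with pause/resume thresholds (all sizes in MB)."""
    limit: float = 100.0     # stands in for mem_buf_limit
    pause_at: float = 80.0   # stands in for flowcontrol.pause
    resume_at: float = 60.0  # stands in for flowcontrol.resume
    level: float = 0.0
    paused: bool = False
    dropped: float = 0.0

    def tick(self, ingest: float, flush: float) -> None:
        """Advance one time step with the given ingest and flush amounts."""
        # Flushing to the output continues even while the input is paused.
        self.level = max(0.0, self.level - flush)
        if not self.paused:
            room = self.limit - self.level
            accepted = min(ingest, room)
            self.dropped += ingest - accepted  # non-zero only if the limit is hit
            self.level += accepted
        # Hysteresis: pause at the high mark, resume only at the low mark.
        if not self.paused and self.level >= self.pause_at:
            self.paused = True
        elif self.paused and self.level <= self.resume_at:
            self.paused = False


sim = BufferSim()
transitions = []
for step in range(60):
    was_paused = sim.paused
    sim.tick(ingest=10.0, flush=4.0)  # hypothetical: input outpaces output
    if sim.paused != was_paused:
        transitions.append((step, "pause" if sim.paused else "resume"))

print(transitions[:4])
print("dropped MB:", sim.dropped)
```

Because the input pauses at 80MB, the simulated buffer never reaches the 100MB limit, and the 20MB gap keeps the pause and resume events several steps apart instead of flapping on every tick.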

This mechanism ensures that Fluent Bit doesn’t overwhelm downstream systems. Instead of crashing or dropping data indiscriminately when the output is slow, it gracefully pauses ingestion, waits for the buffer to clear, and then resumes.

The key to tuning these values is understanding your typical ingestion rate, your downstream system’s processing capacity, and your acceptable latency. Start with flowcontrol.pause at around 70-80% of mem_buf_limit and flowcontrol.resume 10-20 percentage points lower (the example above uses 80% and 60%). Monitor your Fluent Bit logs and Kafka consumer lag, and adjust from there.
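As a starting point, the percentages can be turned into concrete values with a few lines of Python. This is only a sketch: parse_size and suggest_thresholds are hypothetical helpers, and the decimal unit table is an assumption made for the arithmetic.

```python
import re

# Assumption: decimal units; only used to normalize the input string.
_UNITS = {"KB": 1_000, "MB": 1_000_000, "GB": 1_000_000_000}


def parse_size(text: str) -> int:
    """Parse a size string such as '100MB' into bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(KB|MB|GB)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def suggest_thresholds(limit: str, pause_pct: int = 80, gap_pct: int = 20) -> tuple[str, str]:
    """Return (pause, resume) as MB strings, both as percentages of the limit."""
    if not 0 < gap_pct < pause_pct < 100:
        raise ValueError("need 0 < gap_pct < pause_pct < 100")
    limit_mb = parse_size(limit) // _UNITS["MB"]
    pause_mb = limit_mb * pause_pct // 100
    resume_mb = limit_mb * (pause_pct - gap_pct) // 100
    return f"{pause_mb}MB", f"{resume_mb}MB"


print(suggest_thresholds("100MB"))          # ('80MB', '60MB')
print(suggest_thresholds("256MB", 75, 15))  # ('192MB', '153MB')
```

Treat the output as a first guess to refine against observed buffer usage and consumer lag, not as a recommendation.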

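If you don’t set flowcontrol.pause and mem_buf_limit is reached, your Fluent Bit logs will report the overflow with a message along the lines of [error] Buffer overflow, dropping record. After implementing flow control, you should instead see the input being paused and later resumed, for example [info] Buffer reached pause threshold, pausing input. followed by [info] Buffer drained below resume threshold, resuming input. The exact wording depends on your Fluent Bit version; upstream releases log input pauses as [warn] [input] tail.0 paused (mem buf overflow) and resumes as [info] [input] tail.0 resume (mem buf overflow).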

The next challenge you’ll face is managing distributed tracing context across log events when dealing with intermittent pauses in data flow.
