Rate-Limit Fluentd Output with the Throttle Plugin (2026)

The throttle plugin for Fluentd doesn’t actually limit the rate at which Fluentd processes logs; it limits the rate at which logs are sent to a downstream destination.

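Let’s see it in action. Imagine you have a simple Fluentd configuration that tails a log file and forwards it to another Fluentd instance (a common scenario for buffering or aggregation). To keep the examples self-contained, the configurations below use stdout as a stand-in for the real forwarding output.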

# In your source Fluentd instance
<source>
  @type tail
  path /var/log/my_app.log
  pos_file /var/log/td-agent/my_app.log.pos
  tag myapp.log
  <parse>
    @type json
  </parse>
</source>

<match myapp.log>
  @type stdout
  # No throttle here: events are emitted as fast as they are read
</match>

Now, let’s say you want to prevent this source Fluentd from overwhelming the destination Fluentd (or a downstream API, or a database). You’d make throttle the type of the <match> block and nest the real output inside it:

<source>
  @type tail
  path /var/log/my_app.log
  pos_file /var/log/td-agent/my_app.log.pos
  tag myapp.log
  <parse>
    @type json
  </parse>
</source>

<match myapp.log>
  @type throttle
  # Allow a maximum of 100 events per window
  rate 100
  # Length of the window over which the rate is counted.
  # No separate burst allowance is set in this example, so the largest
  # burst that can pass is the 100 events of a single window.
  window 1s
  <sink>
    @type stdout
    # The actual output plugin is nested inside <sink>
  </sink>
</match>

When Fluentd processes events tagged myapp.log, they hit the throttle plugin first. The plugin checks whether passing the current batch to the sink would exceed the configured rate (100 events) within the current window (1 second). Events that fit under the limit go straight to the output nested in <sink>. Events that exceed it are handled according to overflow_action: with the default, drop, they are discarded; with wait, the plugin holds them until the next window opens.

The throttle plugin essentially implements a token bucket algorithm. The rate defines how many tokens are replenished per window. When an event arrives, it consumes a token; if no tokens are left, the event is either delayed or dropped. The window setting matters because it determines how large a burst can pass at once: for the same average rate, a larger window permits larger bursts. With window 1s and rate 100, the bucket refills 100 tokens every second. With window 10s and rate 1000, it refills 1000 tokens every 10 seconds. The average is still 100 events per second, but up to 1000 events can go out back to back, after which nothing more passes until the window ends.
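To make the effect of the window concrete, here is a minimal Python sketch of the refill behaviour described above. It is a toy model, not the plugin’s code: WindowedBucket and count_sent are hypothetical names, and the assumption that the bucket is reset to rate tokens at the start of each window is taken from the description in this article rather than from the plugin source.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WindowedBucket:
    """Toy model (assumption): the bucket is reset to `rate` tokens per window."""
    rate: int                # tokens granted per window
    window: float            # window length in seconds
    tokens: int = 0
    window_index: int = -1   # index of the window the current tokens belong to

    def allow(self, now: float) -> bool:
        """Return True if an event arriving at time `now` may be sent."""
        index = int(now // self.window)
        if index != self.window_index:
            # A new window has started: refill the bucket.
            self.window_index = index
            self.tokens = self.rate
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


def count_sent(bucket: WindowedBucket, arrivals: Iterable[float]) -> int:
    """Count how many of the arriving events the bucket lets through."""
    return sum(bucket.allow(t) for t in arrivals)


# A spike: 1000 events arriving within the first 100 ms.
spike = [i * 0.0001 for i in range(1000)]
# Sustained load: 1000 events per second for 10 seconds.
steady = [i / 1000 for i in range(10_000)]

spike_small = count_sent(WindowedBucket(rate=100, window=1.0), spike)
spike_large = count_sent(WindowedBucket(rate=1000, window=10.0), spike)
steady_small = count_sent(WindowedBucket(rate=100, window=1.0), steady)
steady_large = count_sent(WindowedBucket(rate=1000, window=10.0), steady)

print(spike_small, spike_large)    # 100 1000
print(steady_small, steady_large)  # 1000 1000
```

Over ten seconds of sustained load both settings pass the same 1000 events, but during the spike the 10-second window lets all 1000 events through at once while the 1-second window passes only 100.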

The purpose of the throttle plugin is often misunderstood. It is not meant to stop Fluentd itself from becoming a bottleneck when log volume is high; it is meant to protect downstream systems that have their own ingestion limits. If your Fluentd instance is struggling to keep up with log volume, you’d typically look at increasing the number of Fluentd worker processes, tuning the input and output plugins, or adjusting buffer settings, not at the throttle plugin. The throttle plugin assumes Fluentd can process the logs, but the destination cannot.

The overflow_action parameter is key here. By default, it’s drop, meaning any events that can’t be sent within the rate limit are discarded. If you set overflow_action to wait, Fluentd will block all processing for that tag until it can send the event. This can lead to significant back pressure and potentially block your entire Fluentd pipeline, which is usually undesirable unless you have a very specific, controlled scenario.
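The trade-off between the two settings can be sketched with a small simulation. This is a hedged toy model in Python, not the plugin’s implementation: simulate is a hypothetical function, times are integer milliseconds, and the assumption is that wait blocks the pipeline until the next window opens, as described above.

```python
def simulate(arrivals_ms, rate, window_ms, overflow_action):
    """Return (delivered, dropped, max_delay_ms) for a toy windowed limiter."""
    delivered = dropped = 0
    max_delay = 0
    clock = 0            # earliest time the pipeline can handle the next event
    window_index = -1    # window the current tokens belong to
    tokens = 0
    for arrival in arrivals_ms:
        # Events queue up behind a blocked pipeline.
        now = max(arrival, clock)
        index = now // window_ms
        if index != window_index:
            # A new window has started: refill.
            window_index, tokens = index, rate
        if tokens == 0:
            if overflow_action == "drop":
                dropped += 1
                continue
            # "wait": block until the next window opens, then refill.
            window_index += 1
            now = window_index * window_ms
            tokens = rate
        tokens -= 1
        delivered += 1
        max_delay = max(max_delay, now - arrival)
        clock = now
    return delivered, dropped, max_delay


# 300 events arriving 1 ms apart, limited to 100 events per 1000 ms window.
arrivals = range(300)
drop_result = simulate(arrivals, rate=100, window_ms=1000, overflow_action="drop")
wait_result = simulate(arrivals, rate=100, window_ms=1000, overflow_action="wait")

print(drop_result)  # (100, 200, 0)
print(wait_result)  # (300, 0, 1800)
```

In this model drop loses two thirds of the spike but adds no latency, while wait delivers everything at the cost of delaying the last events by 1.8 seconds. Under sustained overload that delay keeps growing, which is the back pressure described above.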

Once rate limiting is in place, the next question is usually what to do with the dropped events. That leads to strategies such as buffering with retries or routing overflow to a dead-letter queue.