Fluentd’s high availability isn’t just about having a backup; it’s about distributing the load and ensuring no single point of failure exists even under normal operation.

Let’s see this in action. Imagine you have two Fluentd aggregators, aggregator-1 and aggregator-2, running on separate machines. Your data sources, such as application servers, list both aggregators as forwarding destinations.

Application Server Configuration (Example for app-server-1):

<match **>
  @type forward
  <server>
    host aggregator-1.yourdomain.com
    port 24224
    weight 1
  </server>
  <server>
    host aggregator-2.yourdomain.com
    port 24224
    weight 1
  </server>
</match>

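In this setup, app-server-1 does not send a copy of every log to both aggregators. The forward output load-balances: each buffered chunk goes to exactly one server, chosen by weighted round-robin, so both aggregators carry live traffic at all times. If one aggregator goes down, the client detects it through heartbeats and routes all chunks to the other until the failed node recovers. The weight parameter sets each aggregator's share of the traffic; only the ratio matters, so equal weights (common for pure HA) give an even split.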

The core problem Fluentd HA solves is preventing data loss and service interruption when a log aggregation component fails. Traditional active-passive setups require a failover mechanism, which introduces latency and complexity. Active-active, on the other hand, means both aggregators are actively processing data at all times.
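For contrast, the active-passive arrangement described above can be expressed in the same forward output by marking one server as a standby. This is a minimal sketch that reuses the hostnames from the earlier example; a standby server receives traffic only while the non-standby servers are unavailable.

```
<match **>
  @type forward
  <server>
    host aggregator-1.yourdomain.com
    port 24224
  </server>
  <server>
    # Passive node: used only when aggregator-1 is unavailable
    host aggregator-2.yourdomain.com
    port 24224
    standby true
  </server>
</match>
```

Removing the standby line and giving both servers a weight turns this back into the active-active layout.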

Internally, the forward output plugin is the workhorse. When a client (such as your application server) is configured with multiple <server> blocks, it tracks the health of every listed destination with heartbeats and sends each buffered chunk to one available server, picked by weighted round-robin. If a send fails or a server stops responding, the client marks that server unavailable, retries the chunk with backoff, and keeps routing traffic to the remaining servers. The aggregators themselves usually buffer data (a <buffer> section with @type file or @type memory, which older documentation calls buffer_file and buffer_memory) before flushing it to downstream systems (databases, object storage, etc.). This buffering adds another layer of resilience, and a file buffer also survives a Fluentd restart, which a memory buffer does not.
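The failure detection and retry behavior described above is tunable on the client. The following is a sketch, not a recommended production profile: the parameter names are standard forward output and buffer options, but every value and both filesystem paths are illustrative assumptions.

```
<match **>
  @type forward
  # Wait for the aggregator to acknowledge each chunk (at-least-once delivery)
  require_ack_response true
  # How often to probe each server, and how long before an unresponsive one is dropped
  heartbeat_interval 1s
  hard_timeout 60s
  <server>
    host aggregator-1.yourdomain.com
    port 24224
    weight 1
  </server>
  <server>
    host aggregator-2.yourdomain.com
    port 24224
    weight 1
  </server>
  <buffer>
    # File buffer so queued chunks survive a restart (path is hypothetical)
    @type file
    path /var/log/fluent/buffer/forward
    flush_interval 5s
    retry_type exponential_backoff
    retry_wait 1s
    retry_max_interval 60s
    retry_timeout 1h
  </buffer>
  <secondary>
    # Last resort once retries are exhausted (directory is hypothetical)
    @type secondary_file
    directory /var/log/fluent/forward-failed
  </secondary>
</match>
```

Enabling require_ack_response trades some throughput for the guarantee that a chunk is dropped from the client buffer only after an aggregator confirms receipt.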

The key levers you control are:

  • Client Configuration: How your log sources are configured to send data. Do they know about multiple destinations? Are they using a load balancer or directly listing multiple aggregators?
  • Aggregator Configuration: The forward input on each aggregator (a <source> block with @type forward), including the bind address and port it listens on.
  • Downstream Configuration: How the aggregators send data to your final destination. If this downstream is a single point of failure, your HA is only partial. You’ll likely want to configure the aggregators to send to a clustered database, a redundant object storage endpoint, or another HA aggregation layer.
  • Network: Ensuring that network paths between clients and aggregators, and between aggregators and their destinations, are reliable and redundant.
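The aggregator and downstream levers might look like the sketch below, deployed identically on aggregator-1 and aggregator-2. It assumes the third-party fluent-plugin-elasticsearch output is installed and that the destination is an Elasticsearch cluster; the es-* hostnames and the buffer path are hypothetical, and any clustered destination would fill the same role.

```
# Accept forwarded events from the application servers
<source>
  @type forward
  bind 0.0.0.0
  port 24224
</source>

# Flush to a clustered downstream so it is not a single point of failure
<match **>
  @type elasticsearch
  # Hypothetical cluster nodes
  hosts es-1.yourdomain.com:9200,es-2.yourdomain.com:9200
  logstash_format true
  <buffer>
    @type file
    path /var/log/fluent/buffer/es
    flush_interval 10s
  </buffer>
</match>
```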

One aspect often overlooked in active-active setups is how downstream systems handle duplicate data. The forward output delivers each chunk to a single aggregator, but once acknowledgements are enabled (require_ack_response) delivery becomes at-least-once: if an acknowledgement is lost or times out, the client re-sends the chunk, possibly to the other aggregator, even though the first one already accepted it. An aggregator can likewise re-send a batch to the downstream system after a failed or timed-out flush. If your downstream storage (e.g., a database) does not make writes idempotent, you could end up with duplicate log entries. This is why understanding your data’s journey all the way to its final resting place is critical.
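One way to make downstream writes idempotent is to give every record a stable ID before it is buffered, so that a re-sent batch overwrites earlier copies instead of adding new ones. The sketch below assumes an Elasticsearch destination and the third-party fluent-plugin-elasticsearch plugin, which provides the elasticsearch_genid filter; the hostname is hypothetical. Other stores need their own mechanism, such as a unique key constraint.

```
# Attach a generated ID to each record before it enters the output buffer
<filter **>
  @type elasticsearch_genid
  hash_id_key _hash
</filter>

<match **>
  @type elasticsearch
  # Hypothetical cluster node
  hosts es-1.yourdomain.com:9200
  # Use the generated value as the document ID, then drop the helper field
  id_key _hash
  remove_keys _hash
</match>
```

The ID only protects against retries that happen after it is assigned. Placed on the aggregators, this filter covers re-sends from aggregator to Elasticsearch; to also cover re-sends from application servers to aggregators, the ID would have to be assigned on the application servers instead.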

The next challenge is ensuring your downstream storage can keep up with the combined throughput of your active-active aggregators.
