Fluentd doesn’t just collect logs; it can actually become a bottleneck if you’re not careful about how you deploy it.

Let’s see Fluentd in action. Imagine you have a web server writing access logs; once parsed, each line becomes a structured record like this:

{"remote":"192.168.1.100","user":"-","method":"GET","path":"/index.html","code":200,"size":1234,"referer":"-","agent":"Mozilla/5.0"}

In a single-node setup, one Fluentd instance on the web server reads this log, parses it, and ships it directly to a destination such as Elasticsearch. To keep the example easy to inspect, the configuration below prints each parsed record to stdout instead:

# fluent.conf
<source>
  @type tail
  path /var/log/apache2/access.log
  pos_file /var/log/td-agent/apache2.log.pos
  tag apache.access
  <parse>
    @type apache2
  </parse>
</source>

<match apache.access>
  @type stdout
</match>

The stdout output would look something like this:

2023-10-27 10:00:00.123456 +0000 apache.access: {"remote":"192.168.1.100","user":"-","method":"GET","path":"/index.html","code":200,"size":1234,"referer":"-","agent":"Mozilla/5.0"}
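To ship records straight to Elasticsearch instead, the stdout <match> can be swapped for an Elasticsearch output. This is a minimal sketch that assumes the fluent-plugin-elasticsearch plugin is installed; the hostname is a placeholder and the other values are illustrative.

# fluent.conf (Elasticsearch variant; hypothetical host)
<match apache.access>
  @type elasticsearch
  # Placeholder address for your Elasticsearch node
  host elasticsearch.example.com
  port 9200
  # Write records into daily logstash-YYYY.MM.DD indices
  logstash_format true
</match>

The <source> block stays the same; only the destination changes.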

This is simple and works great when you have only a few log sources.

The core problem Fluentd solves is normalizing and routing disparate log formats from many sources into a consistent stream for analysis or storage. It acts as a universal adapter.

In an aggregator topology, you have multiple "forwarder" Fluentd instances running on your application servers. These forwarders don’t store logs long-term; they just collect, maybe do some light filtering or parsing, and then send them over the network to a central "aggregator" Fluentd instance. The aggregator then processes and sends these logs to their final destinations.

Consider this configuration for a forwarder:

# fluentd-forwarder.conf
<source>
  @type tail
  path /var/log/myapp/application.log
  pos_file /var/log/td-agent/myapp.log.pos
  tag myapp.logs
  <parse>
    @type json
  </parse>
</source>

<match myapp.logs>
  @type forward
  <server>
    host aggregator.example.com
    port 24224
  </server>
</match>
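The "light filtering" a forwarder might do can be expressed as a <filter> block placed before the <match>. A minimal sketch using the built-in grep filter, assuming (hypothetically) that the application's JSON records carry a level field:

# fluentd-forwarder.conf (optional addition, placed before <match myapp.logs>)
<filter myapp.logs>
  @type grep
  # Drop debug-level records before they cross the network.
  # The "level" field is an assumption about the app's log format.
  <exclude>
    key level
    pattern /^debug$/
  </exclude>
</filter>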

And on the aggregator:

# fluentd-aggregator.conf
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match **>
  @type stdout
</match>

Here, the forward output plugin on the forwarder sends events over TCP to the forward input plugin on the aggregator. The ** pattern in the aggregator's <match> directive matches any tag, so all logs from all forwarders are processed.
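Fluentd checks <match> directives from top to bottom and routes each event to the first one that matches, so a catch-all ** normally goes last. A minimal sketch of an aggregator that gives myapp logs their own destination; the file path is a hypothetical placeholder:

# fluentd-aggregator.conf (routing variant)
# myapp.** matches myapp, myapp.logs, myapp.logs.web, and so on
<match myapp.**>
  @type stdout
</match>

# Anything not matched above falls through to this catch-all
<match **>
  @type file
  path /var/log/fluent/other
</match>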

The single-node topology is best for small deployments where the overhead of network hops and managing multiple Fluentd processes outweighs the benefits. Think development environments, a single critical server, or a handful of microservices where each has its own Fluentd sending directly to a single backend. The key here is simplicity and minimal latency for direct-to-destination shipping.

The aggregator topology shines when you have many log sources, potentially across different machines or containers. It centralizes the heavy lifting of buffering, retries, and complex routing logic on dedicated aggregator nodes, preventing your application servers from being bogged down by log shipping. It also allows for more sophisticated processing (like enrichment or filtering) to happen in one place, rather than on every single application instance.
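Centralized enrichment can be done with the built-in record_transformer filter. A minimal sketch that stamps every myapp record with the aggregator's hostname and a hypothetical environment field before it reaches the <match> blocks:

# fluentd-aggregator.conf (enrichment, placed before the <match> blocks)
<filter myapp.**>
  @type record_transformer
  <record>
    # Embedded Ruby, evaluated once when the config is loaded
    aggregator_host "#{Socket.gethostname}"
    # Hypothetical static field
    environment production
  </record>
</filter>

Because this runs only on the aggregator, changing the enrichment means redeploying one config instead of touching every application server.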

The most surprising thing is how much state Fluentd’s forward output maintains. When a forwarder sends data to an aggregator, it doesn’t just fire and forget. It tracks each aggregator's availability with periodic heartbeats, stages events in buffer chunks (in memory by default, or on disk with a file buffer), and keeps each chunk until it has been written to an aggregator; with require_ack_response enabled, it also waits for the aggregator to acknowledge each chunk. If the aggregator is slow or the network is flaky, chunks pile up while the forwarder retries with exponential backoff, which can consume significant memory or disk on the application server if the aggregator is unreachable for an extended period. This is why the retry parameters in the forward output's <buffer> section, such as retry_wait, retry_max_times, and retry_timeout, along with buffer size limits, are critical for tuning.
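A sketch of how these settings fit together on the forwarder. The values and the buffer path are illustrative assumptions, not recommendations. Once retry_max_times is exhausted, Fluentd gives up on the chunk, or hands it to a <secondary> output if one is configured.

# fluentd-forwarder.conf (tuned forward output; illustrative values)
<match myapp.logs>
  @type forward
  # Keep each chunk until the aggregator acknowledges it
  require_ack_response true
  <server>
    host aggregator.example.com
    port 24224
  </server>
  <buffer>
    # Stage chunks on disk so they survive a forwarder restart
    @type file
    path /var/log/td-agent/buffer/forward
    # Cap how much disk a backlog may use on the app server
    total_limit_size 512m
    # When the cap is hit, discard the oldest chunk instead of blocking
    overflow_action drop_oldest_chunk
    flush_interval 5s
    # Exponential backoff: start at 1s, never wait longer than 60s
    retry_wait 1s
    retry_max_interval 60s
    retry_max_times 10
  </buffer>
</match>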

When you move from an aggregator topology to a more distributed one with multiple aggregators, you’ll start thinking about how to load balance the incoming traffic to those aggregators and ensure data doesn’t get lost if an aggregator node fails.
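The forward output can already spread traffic across several aggregators and fail over between them. A minimal sketch with two weighted primaries, a standby node, and a secondary_file fallback; all hostnames and the directory are placeholders:

# fluentd-forwarder.conf (multiple aggregators; hypothetical hosts)
<match myapp.logs>
  @type forward
  # Equal weights split chunks evenly between the two primaries
  <server>
    name aggregator1
    host aggregator1.example.com
    port 24224
    weight 60
  </server>
  <server>
    name aggregator2
    host aggregator2.example.com
    port 24224
    weight 60
  </server>
  # Used only when the primaries are considered unavailable
  <server>
    name aggregator-standby
    host aggregator3.example.com
    port 24224
    standby true
  </server>
  # If retries are exhausted, write chunks to local files instead of dropping them
  <secondary>
    @type secondary_file
    directory /var/log/td-agent/forward-failed
  </secondary>
</match>

This covers client-side balancing and failover; an external TCP load balancer in front of the aggregators is another option, at the cost of one more component to operate.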
