Fluentd acts as a pipeline, collecting logs from various sources and forwarding them to Kafka.
Here’s what Fluentd looks like in action, acting as a Kafka producer:
# fluentd.conf
<source>
  @type tail
  path /var/log/app/*.log
  pos_file /var/log/td-agent/app.log.pos
  tag app.logs
  <parse>
    @type json
  </parse>
</source>

<match app.logs>
  @type kafka2
  brokers kafka-broker-1:9092,kafka-broker-2:9092
  default_topic fluentd-logs
  <format>
    @type json
  </format>
  <buffer>
    @type file
    path /var/log/td-agent/buffer/app.logs
    flush_interval 10s
    retry_max_times 5
    retry_wait 1s
  </buffer>
</match>
This configuration does a few key things:
- Source (`tail`): It tells Fluentd to watch the `/var/log/app/*.log` files for new log entries. It assumes these logs are in JSON format and assigns the tag `app.logs` to them. The `pos_file` is crucial for ensuring that Fluentd doesn’t re-read logs it has already processed after a restart.
- Match (`kafka2`): This is where the routing happens. When Fluentd sees an event tagged `app.logs`, it routes it to the `kafka2` output plugin.
  - `brokers`: This is a comma-separated list of your Kafka broker addresses.
  - `default_topic`: This is the Kafka topic where the logs will be sent. If an incoming event specifies its own topic (see the plugin’s `topic_key` option), that will be used instead.
  - `format json`: This ensures the log events are serialized into JSON format before being sent to Kafka, which is a common and flexible choice.
  - `buffer`: The `buffer` section is vital for reliability. Here, we’re using a file-based buffer. Events are staged on disk under `/var/log/td-agent/buffer/app.logs` before being sent, so if Fluentd can’t immediately reach Kafka (e.g., network issues, Kafka is down), they are kept and retried rather than dropped.
    - `flush_interval 10s`: Fluentd will attempt to send buffered logs to Kafka every 10 seconds.
    - `retry_max_times 5` and `retry_wait 1s`: If sending fails, it will retry up to 5 times. With Fluentd’s default exponential backoff, `retry_wait` is the wait before the first retry, and the wait roughly doubles on each subsequent attempt.
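To get a feel for how long the retry settings above keep a failing chunk alive, the schedule can be sketched in a few lines of Python. This is a simplified model, not Fluentd’s actual implementation: it assumes the default exponential backoff with a base of 2 and ignores the random jitter Fluentd adds to each wait. The function name `retry_waits` and its parameters are hypothetical, chosen only to mirror the config keys.

```python
def retry_waits(retry_wait=1.0, retry_max_times=5, base=2.0, max_interval=None):
    """Return the idealized wait (in seconds) before each retry attempt.

    Assumption: exponential backoff, wait_n = retry_wait * base**n,
    optionally capped at max_interval. Jitter is deliberately ignored.
    """
    waits = []
    for attempt in range(retry_max_times):
        wait = retry_wait * base ** attempt
        if max_interval is not None:
            wait = min(wait, max_interval)
        waits.append(wait)
    return waits


if __name__ == "__main__":
    waits = retry_waits(retry_wait=1.0, retry_max_times=5)
    print("waits before each retry:", waits)
    print("total time spent waiting:", sum(waits), "seconds")
```

Under these assumptions, the configuration above spends roughly half a minute retrying a chunk before giving up, which is worth comparing against how long a typical Kafka outage lasts in your environment.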
The core problem this solves is bridging the gap between applications generating logs and a robust, scalable log aggregation system like Kafka. Instead of applications directly writing to Kafka (which adds complexity to their code and requires them to handle retries, serialization, etc.), Fluentd acts as a dedicated, optimized log shipper. It can handle diverse log formats, buffer reliably, and send to Kafka efficiently.
What most people don’t realize is how much the buffer configuration impacts performance and reliability. A small buffer or an aggressive flush interval might seem good for low latency, but it sends many small requests to Kafka and leaves little room to absorb transient network issues, so events can be lost once the buffer fills. Conversely, a massive buffer can consume significant disk space and lead to high latency when Kafka recovers, as Fluentd tries to "catch up." The `retry_wait` and `retry_max_times` settings also play a critical role. Setting them too low means Fluentd gives up on a chunk too soon: once the retry limit is reached, the chunk is discarded unless a `<secondary>` output is configured. Setting them too high means chunks pile up in the buffer for extended periods if Kafka is truly unavailable.
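The buffer trade-off comes down to simple arithmetic, which a rough Python sketch can make concrete. The numbers below (2 MiB/s of incoming logs, a 10-minute outage, 5 MiB/s of send throughput after recovery) are made-up illustrations, and the helper names are hypothetical; substitute measurements from your own pipeline.

```python
def backlog_bytes(ingest_rate, outage_seconds):
    """Bytes that accumulate in the buffer while Kafka is unreachable."""
    return ingest_rate * outage_seconds


def catch_up_seconds(backlog, ingest_rate, drain_rate):
    """Seconds needed to drain the backlog once Kafka is back.

    New logs keep arriving at ingest_rate while the buffer drains at
    drain_rate, so only the difference reduces the backlog.
    """
    if drain_rate <= ingest_rate:
        return float("inf")  # the buffer never empties
    return backlog / (drain_rate - ingest_rate)


if __name__ == "__main__":
    MIB = 1024 * 1024
    ingest = 2 * MIB   # assumed incoming log volume, bytes per second
    drain = 5 * MIB    # assumed throughput to Kafka after recovery
    outage = 600       # assumed outage length, seconds

    backlog = backlog_bytes(ingest, outage)
    print("disk needed for the outage:", backlog / MIB, "MiB")
    print("time to catch up:", catch_up_seconds(backlog, ingest, drain), "seconds")
```

If the disk needed for a realistic outage exceeds the space available to the buffer path, or the catch-up time is longer than you can tolerate stale logs, that is a signal to revisit the buffer and retry settings.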
The next logical step is often to explore how to add message ordering guarantees or handle different Kafka partitioning strategies within Fluentd.