Fluent Bit’s Kafka output plugin doesn’t generate logs itself; it acts as a Kafka producer, forwarding the log records it receives from input plugins to Kafka topics.
Here’s Fluent Bit shipping logs to Kafka, using the tail input plugin to read from a file and the kafka output plugin to send them:
[SERVICE]
    Flush         1
    Daemon        off
    Log_Level     info
    Parsers_File  parsers.conf

[INPUT]
    Name    tail
    Path    /var/log/myapp.log
    Tag     myapp.logs
    Parser  json

[OUTPUT]
    Name               kafka
    Match              myapp.logs
    Brokers            kafka-broker-1:9092,kafka-broker-2:9092
    Topics             myapp-topic
    Message_Key_Field  event_id
This configuration tells Fluent Bit to:
- Flush buffered records every second (Flush 1).
- Run in the foreground (Daemon off).
- Log at the info level.
- Use the parsers.conf file for parsing log lines.
The [INPUT] section defines a tail input plugin that monitors /var/log/myapp.log. It tags these incoming logs with myapp.logs and expects them to be in JSON format, as specified by the Parser json directive.
The [OUTPUT] section uses the kafka output plugin.
- It only processes records tagged with myapp.logs (Match myapp.logs).
- It connects to Kafka brokers at kafka-broker-1:9092 and kafka-broker-2:9092.
- It sends records to the myapp-topic topic.
- It uses the event_id field from the log record as the Kafka message key. The plugin option for this is Message_Key_Field; a fixed key for every message would use Message_Key instead.
- It produces messages asynchronously, which is how the underlying librdkafka producer always works; the plugin has no separate Async switch.
When Fluent Bit starts, it tails /var/log/myapp.log. Each new line is parsed as JSON and tagged myapp.logs. Because that tag matches the output’s Match rule, the record is routed to the kafka output plugin, which serializes it (JSON by default), uses the value of event_id as the message key, and produces it to myapp-topic on the specified brokers.
The most surprising thing about Fluent Bit’s Kafka output is that it doesn’t require a separate Kafka client library to be installed on the host; the necessary Kafka producer logic is built directly into the plugin itself, leveraging librdkafka.
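Because the producer is librdkafka, the plugin can also forward librdkafka properties written with an rdkafka. prefix. The following is a minimal sketch of that passthrough; the two properties and their values are illustrative choices, not settings taken from the configuration above.

```text
[OUTPUT]
    Name     kafka
    Match    myapp.logs
    Brokers  kafka-broker-1:9092,kafka-broker-2:9092
    Topics   myapp-topic
    # Anything after the "rdkafka." prefix is handed to librdkafka as-is.
    # client.id value is a hypothetical name chosen for this example.
    rdkafka.client.id             fluent-bit-myapp
    rdkafka.log.connection.close  false
```

Which properties are worth setting depends on the cluster; the librdkafka configuration reference lists the accepted names and value ranges.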
Consider a scenario where you have application logs in /var/log/app/service.log and you want to send them to a Kafka topic named app-logs on kafka-01:9092.
[INPUT]
    Name    tail
    Path    /var/log/app/service.log
    Tag     app.service
    # Assuming logs are Docker-formatted JSON
    Parser  docker

[OUTPUT]
    Name     kafka
    Match    app.service
    Brokers  kafka-01:9092
    Topics   app-logs
    # No Message_Key or Message_Key_Field is set, so messages carry no key
    # and the producer's partitioner spreads them across partitions.
This configuration sets up another input, this time for Docker-formatted JSON logs in /var/log/app/service.log, tagged as app.service. These logs are then directed to the app-logs Kafka topic. Since no message key is configured, messages are sent without a key and the producer’s partitioner distributes them across the topic’s partitions.
The Message_Key_Field parameter in the Kafka output plugin is powerful because it lets you control partitioning. If you point it at a field that is common across related log events (like a user ID or a request ID), all logs with the same value are hashed to the same Kafka partition, as long as the topic’s partition count does not change. This is crucial for ordered processing of related events downstream, because Kafka only guarantees ordering within a partition.
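If your Kafka brokers use TLS encryption, you’ll need to configure TLS in the output plugin. The kafka output is built on librdkafka, so TLS is set through librdkafka properties (written with an rdkafka. prefix) rather than through Tls and Tls_Ca_File directives. For example, to use TLS with a CA certificate at /etc/ssl/certs/ca.crt:
[OUTPUT]
    Name                       kafka
    Match                      myapp.logs
    Brokers                    kafka-tls-broker:9092
    Topics                     myapp-topic-tls
    Message_Key_Field          event_id
    rdkafka.security.protocol  ssl
    rdkafka.ssl.ca.location    /etc/ssl/certs/ca.crt
The rdkafka.security.protocol ssl property enables TLS, and rdkafka.ssl.ca.location points to the certificate authority file used to verify the broker’s identity.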
The acks setting controls how many acknowledgments the producer requires from brokers before considering a produce request successful. In Fluent Bit it is passed to librdkafka as rdkafka.request.required.acks. A value of 0 means Fluent Bit won’t wait for any acknowledgment, offering the highest throughput but with the risk of silent data loss if the broker never receives or persists the message. A value of 1 waits for an acknowledgment from the partition leader, providing a good balance. A value of -1 (all) waits for acknowledgment from all in-sync replicas, ensuring maximum durability at the cost of higher latency. The default has differed across client versions (recent librdkafka releases default to -1), so it is safer to set the value explicitly than to rely on it.
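As a sketch of how the acknowledgment level might be pinned in Fluent Bit, the output below passes the setting to librdkafka as rdkafka.request.required.acks. The broker and topic names are reused from the first example, and the chosen value is only an illustration of the most durable option.

```text
[OUTPUT]
    Name     kafka
    Match    myapp.logs
    Brokers  kafka-broker-1:9092,kafka-broker-2:9092
    Topics   myapp-topic
    # -1 waits for all in-sync replicas; 1 waits for the leader only;
    # 0 does not wait for any acknowledgment.
    rdkafka.request.required.acks  -1
```

Waiting for all in-sync replicas only adds durability if the topic has a replication factor above one; on a single-replica topic it behaves like leader-only acknowledgment.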
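Buffering happens in two places. On the Fluent Bit side, Flush in the [SERVICE] section sets how often buffered records are handed to outputs, and Mem_Buf_Limit on an input caps how much memory its unflushed records may use. On the Kafka side, the plugin enqueues records into librdkafka’s internal producer queue, whose size can be tuned with properties such as rdkafka.queue.buffering.max.messages, while queue_full_retries controls how many times the plugin retries when that queue is full. For high-volume scenarios, larger buffers can improve throughput through better batching, but they also increase memory usage and the potential for data loss if Fluent Bit crashes before the records are delivered.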
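One way buffering limits might be expressed is sketched below, using Mem_Buf_Limit on the input, the kafka plugin's queue_full_retries option, and librdkafka's queue.buffering.max.messages property. All numeric values are illustrative assumptions, not tuned recommendations.

```text
[INPUT]
    Name           tail
    Path           /var/log/myapp.log
    Tag            myapp.logs
    Parser         json
    # Upper bound on memory held by this input's unflushed records.
    Mem_Buf_Limit  50MB

[OUTPUT]
    Name     kafka
    Match    myapp.logs
    Brokers  kafka-broker-1:9092,kafka-broker-2:9092
    Topics   myapp-topic
    # Retries when librdkafka's producer queue is full (0 means retry forever).
    queue_full_retries                    10
    # Maximum number of messages librdkafka keeps queued before delivery.
    rdkafka.queue.buffering.max.messages  100000
```

When the input reaches Mem_Buf_Limit it pauses reading until buffered data is flushed, so the tail plugin falls behind in the file instead of growing memory without bound.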
For example, if each parsed JSON record carries a session identifier in an id field, setting Message_Key_Field id uses that value as the Kafka message key. All messages for a given user session then go to the same partition, so downstream consumers process them in order.
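A minimal sketch of keying by such a field follows. The field name id and the fallback key value are hypothetical; Message_Key acts as the fallback when a record lacks the field named by Message_Key_Field.

```text
[OUTPUT]
    Name               kafka
    Match              myapp.logs
    Brokers            kafka-broker-1:9092,kafka-broker-2:9092
    Topics             myapp-topic
    # Use the value of each record's "id" field as the message key.
    Message_Key_Field  id
    # Fixed key used for records that have no "id" field (hypothetical value).
    Message_Key        unknown-session
```

Records that fall back to the fixed key all share one partition, so a large share of records without the field can create a hot partition.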
The next logical step after reliably sending logs to Kafka is to consume and process them, which often involves setting up Kafka consumers or using stream processing frameworks like Apache Flink or Spark Streaming.