Fluent Bit can process logs faster by using coroutines to handle I/O operations concurrently.
Here’s Fluent Bit processing logs from a file, parsing them, and sending them to stdout:
# Fluent Bit configuration (fluentbit.conf)
[SERVICE]
    Daemon       Off
    Log_Level    Info
    Parsers_File parsers.conf

[INPUT]
    Name   tail
    Path   /var/log/my_app.log
    Tag    myapp.log
    # Assumes parsers.conf defines a parser named "json"
    Parser json

[OUTPUT]
    Name   stdout
    Match  myapp.log
    Format json
# Sample log file (/var/log/my_app.log)
{"timestamp": "2023-10-27T10:00:00Z", "level": "INFO", "message": "Application started"}
{"timestamp": "2023-10-27T10:00:01Z", "level": "DEBUG", "message": "Processing request ID 123"}
# Running Fluent Bit
fluent-bit -c fluentbit.conf
This setup has Fluent Bit read from /var/log/my_app.log, tag each record with myapp.log, parse it with a parser loaded from parsers.conf (a JSON parser fits this sample; the tail input selects one through its Parser key), and print the result to the console as JSON. The "magic" here is that no single slow step stalls the rest: while one batch of records is waiting on a write, Fluent Bit can keep reading and parsing new lines. Within a single thread these steps interleave rather than run at the same instant, and that interleaving is achieved through coroutines.
The problem Fluent Bit solves is the I/O-bound nature of log processing. A naive pipeline reads a line, processes it, writes it, then moves to the next. If any of those steps has to wait (for example, on a network response in an output plugin), the entire pipeline stalls. Fluent Bit's coroutine model, built on the libco library, lets it switch to another task whenever an I/O operation is pending. While one coroutine waits for a network write to complete, the event loop can resume another coroutine or go back to collecting and parsing new log lines.
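The difference between a blocking pipeline and a coroutine-based one can be sketched in a few lines. This is an analogy only: Fluent Bit implements its coroutines in C on top of libco, while the sketch below uses Python's asyncio, and the names send, flush_sequential, flush_concurrent, and timed are hypothetical helpers invented for the illustration. asyncio.sleep stands in for a pending network write.

```python
import asyncio
import time


async def send(record: str, delay: float) -> str:
    # asyncio.sleep stands in for a network write that is still pending.
    # While this coroutine waits, the event loop is free to run others.
    await asyncio.sleep(delay)
    return record


async def flush_sequential(records, delay):
    # Blocking style: each write must finish before the next one starts.
    return [await send(r, delay) for r in records]


async def flush_concurrent(records, delay):
    # Coroutine style: all writes are in flight at once on a single thread.
    return list(await asyncio.gather(*(send(r, delay) for r in records)))


def timed(coro_fn, records, delay):
    # Run one of the flush functions and return (results, elapsed seconds).
    start = time.perf_counter()
    out = asyncio.run(coro_fn(records, delay))
    return out, time.perf_counter() - start


if __name__ == "__main__":
    records = [f"line {i}" for i in range(20)]
    _, t_seq = timed(flush_sequential, records, 0.01)
    _, t_con = timed(flush_concurrent, records, 0.01)
    print(f"sequential: {t_seq:.3f}s, concurrent: {t_con:.3f}s")
```

Both versions run on one thread and deliver the same records in the same order. The concurrent one finishes sooner only because the waits overlap, which is the effect the coroutine model is after.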
The primary lever you control for parallel I/O is the Workers option, which is set per [OUTPUT] section rather than in [SERVICE]. Without workers, an output's flush coroutines run on the main engine thread alongside everything else. Setting Workers gives that output its own pool of OS-level threads, each with its own event loop and its own coroutines, allowing for true parallelism across multiple CPU cores. The default number of workers varies by plugin and Fluent Bit version.
Consider this configuration:
[SERVICE]
    Daemon       Off
    Log_Level    Info
    Parsers_File parsers.conf

[OUTPUT]
    Name    stdout
    Match   myapp.log
    Format  json
    Workers 4
Here, Workers 4 tells Fluent Bit to start 4 operating system threads for this output. Each of these threads can independently schedule and run flush coroutines. If you have 4 CPU cores, this is often a sweet spot. An output like stdout gains little from it; the benefit shows up with a very high volume of logs and slow or numerous network destinations (e.g., multiple Kafka topics, Elasticsearch clusters, etc.), where you might raise the value toward the number of CPU cores available on your system. The underlying libco library manages the switching between coroutines within each thread, and the OS scheduler manages the switching between the threads themselves.
The surprising part is how many coroutines can be managed by a single thread. A single OS thread, running a Fluent Bit event loop, can juggle hundreds or even thousands of coroutines concurrently. The Workers setting doesn't increase the number of coroutines directly; it increases the number of threads that run those coroutines. This distinction is crucial: you're not creating more tasks, you're giving more workers (threads) to execute those tasks in parallel. Each coroutine carries its own small stack, sized by Coro_Stack_Size in the [SERVICE] section (24576 bytes by default), which makes it far cheaper than a traditional thread-per-request model.
When the total number of worker threads across your outputs exceeds the number of available CPU cores, you'll likely see diminishing returns and potentially even performance degradation due to increased context switching overhead at the OS level. The optimal number is a balance between I/O concurrency needs and CPU availability.
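The split between threads and coroutines can be made concrete with a small sketch: a handful of OS threads, each running a private event loop that juggles a thousand coroutines. Again this is a Python asyncio analogy, not Fluent Bit's C implementation, and flush, event_loop_main, worker, and run_workers are hypothetical names. Note also that Python's global interpreter lock means these threads do not give CPU parallelism the way native threads do; the sketch only shows how the work is distributed.

```python
import asyncio
import threading


async def flush(record_id: int) -> str:
    # Simulated pending I/O; the coroutine yields to its thread's event loop.
    await asyncio.sleep(0.01)
    # Report which OS thread this coroutine was running on.
    return threading.current_thread().name


async def event_loop_main(n_coroutines: int) -> set:
    # One event loop schedules all of its coroutines on a single thread.
    names = await asyncio.gather(*(flush(i) for i in range(n_coroutines)))
    return set(names)


def worker(n_coroutines: int, results: list, index: int) -> None:
    # Each worker thread owns a private event loop created by asyncio.run.
    results[index] = asyncio.run(event_loop_main(n_coroutines))


def run_workers(n_threads: int, n_coroutines: int) -> list:
    results = [set() for _ in range(n_threads)]
    threads = [
        threading.Thread(
            target=worker, args=(n_coroutines, results, i), name=f"worker-{i}"
        )
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for i, names in enumerate(run_workers(4, 1000)):
        print(f"thread {i}: 1000 coroutines ran on {sorted(names)}")
```

Adding threads changes how many event loops exist, not how many coroutines each one can hold. Every set in the result contains exactly one thread name, because all of the coroutines started by a given loop stay on that loop's thread.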