Fluent Bit’s parsers are the secret sauce for turning a chaotic log stream into something actionable, and the regex, JSON, and logfmt parsers are your primary tools.
Let’s see how Fluent Bit can take raw, messy log lines and make them structured. Imagine we have a simple web server spitting out logs like this:
2023-10-27T10:00:01Z INFO user_id=123 request="/api/v1/users" method="GET" status=200 response_time_ms=45
2023-10-27T10:00:05Z ERROR user_id=456 request="/admin" method="POST" error_message="permission denied"
2023-10-27T10:00:10Z INFO user_id=789 request="/health" method="GET" status=200
This is great for humans to read, but terrible for machines to query. We want to be able to easily filter by user_id, status, or method.
Here’s a Fluent Bit configuration snippet that tackles this:
[SERVICE]
    Flush        5
    Daemon       off
    Log_Level    info
    Parsers_File parsers.conf
[INPUT]
    Name   tail
    Path   /var/log/my_webserver.log
    Tag    webserver.*
    # First pass: split each line into time, level, and message
    Parser my_webserver_parser
[FILTER]
    Name         parser
    Match        webserver.*
    # Second pass: parse the message field produced by the first pass
    Key_Name     message
    Parser       my_kv_parser
    Reserve_Data true
    Preserve_Key true
[OUTPUT]
    Name   stdout
    Match  webserver.*
    Format json
And in our parsers.conf file:
[PARSER]
    Name        my_webserver_parser
    Format      regex
    Regex       ^(?<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+(?<level>\w+)\s+(?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%SZ
    # Keep the time field in the record after it has set the event timestamp
    Time_Keep   On
[PARSER]
    # logfmt is the built-in format for key=value pairs, including quoted values
    Name   my_kv_parser
    Format logfmt
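If you would rather run the second pass in a script, a Lua filter can take the place of the parser filter. Like any [FILTER] section it goes in the main configuration file, not in parsers.conf, and it needs a call option naming the function to run:
[FILTER]
    Name   lua
    Match  webserver.*
    script lua/parse_message.lua
    call   parse_log_message
And the lua/parse_message.lua script. The Lua environment does not expose Fluent Bit’s parsers, so this simplified version splits the key-value pairs itself with Lua patterns:
-- Split record["message"] into key=value pairs and merge them into the record.
function parse_log_message(tag, timestamp, record)
    local message = record["message"]
    if type(message) ~= "string" then
        return 0, timestamp, record          -- 0: keep the record unchanged
    end
    for key, value in message:gmatch('([%w_]+)="([^"]*)"') do
        record[key] = value                  -- key="quoted value"
    end
    for key, value in message:gmatch('([%w_]+)=([^%s"]+)') do
        record[key] = value                  -- key=bare_value
    end
    return 1, timestamp, record              -- 1: record was modified
end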
When Fluent Bit processes our log file with this configuration, the tail input reads each line and my_webserver_parser (regex) extracts the timestamp, the log level, and the raw message. The parser filter then takes that extracted message field and applies my_kv_parser (logfmt) to break out the key-value pairs; a Lua filter is an alternative way to run this second pass. With Time_Keep enabled on the regex parser and the original message preserved by the filter, the records sent to stdout will look roughly like this (pretty-printed, and leaving out the date key that the JSON output format adds):
{
"time": "2023-10-27T10:00:01Z",
"level": "INFO",
"user_id": "123",
"request": "/api/v1/users",
"method": "GET",
"status": "200",
"response_time_ms": "45",
"message": "user_id=123 request=\"/api/v1/users\" method=\"GET\" status=200 response_time_ms=45"
}
{
"time": "2023-10-27T10:00:05Z",
"level": "ERROR",
"user_id": "456",
"request": "/admin",
"method": "POST",
"error_message": "permission denied",
"message": "user_id=456 request=\"/admin\" method=\"POST\" error_message=\"permission denied\""
}
{
"time": "2023-10-27T10:00:10Z",
"level": "INFO",
"user_id": "789",
"request": "/health",
"method": "GET",
"status": "200",
"message": "user_id=789 request=\"/health\" method=\"GET\" status=200"
}
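Notice how the original message is now broken down into individual fields, making it trivial to query: WHERE status = '200' or WHERE user_id = '123'. Every value arrives as a string unless the parser casts it with a Types directive (for example, Types status:integer).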
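To check both passes against sample lines before deploying, a rough offline model can help. The following is a minimal Python sketch, not Fluent Bit code: the first pattern mirrors the my_webserver_parser regex (Python spells named groups `(?P<name>...)`), and the second is a simplified stand-in for logfmt parsing. The names `LINE_RE`, `PAIR_RE`, and `parse_line` are made up for this illustration.

```python
import re

# Pass 1: same shape as the my_webserver_parser regex, in Python syntax.
LINE_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+(?P<level>\w+)\s+(?P<message>.*)$"
)

# Pass 2: simplified logfmt reader, key="quoted value" or key=bare_value.
PAIR_RE = re.compile(r'(?P<key>[\w.]+)=(?:"(?P<quoted>[^"]*)"|(?P<bare>\S*))')


def parse_line(line: str) -> dict:
    """Return the structured record for one log line, or {} if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return {}
    record = match.groupdict()
    for pair in PAIR_RE.finditer(record["message"]):
        value = pair.group("quoted")
        if value is None:  # the bare alternative matched instead
            value = pair.group("bare")
        record[pair.group("key")] = value
    return record


sample = ('2023-10-27T10:00:01Z INFO user_id=123 request="/api/v1/users" '
          'method="GET" status=200 response_time_ms=45')
record = parse_line(sample)
print(record["level"], record["user_id"], record["request"], record["status"])
```

Lines that do not match the outer pattern come back as an empty dict here, which makes it easy to spot the log lines your regex would miss.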
The power here is chaining parsers. The regex parser is excellent for the overall structure of a log line, but often the payload of that line is itself structured, like key-value pairs or JSON. The logfmt parser is built for exactly that key=value layout. If your logs embed a JSON object in the message instead, point the second pass at a parser defined with Format json.
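The JSON variant of the same chain can be sketched the same way. This is an illustrative Python model only: the log line below is hypothetical (it is not one of the sample lines above), and `json.loads` stands in for a Fluent Bit parser with Format json applied to the message field.

```python
import json
import re

# Hypothetical log line whose payload is a JSON object.
line = '2023-10-27T10:00:01Z INFO {"user_id": 123, "method": "GET", "status": 200}'

# Pass 1: split the line into time, level, and the raw payload.
LINE_RE = re.compile(r"^(?P<time>\S+)\s+(?P<level>\w+)\s+(?P<message>.*)$")

match = LINE_RE.match(line)
record = match.groupdict() if match else {}

# Pass 2: decode the payload and merge its keys into the record.
try:
    payload = json.loads(record.get("message", ""))
except json.JSONDecodeError:
    payload = None  # not JSON: leave the record as pass 1 produced it
if isinstance(payload, dict):
    record.update(payload)

print(record["level"], record["status"])
```

One difference from the key-value case is that JSON carries its own types, so `status` comes through as a number here rather than a string.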
The Key_Name and Parser directives in the FILTER section are crucial. Key_Name specifies which field in the incoming record contains the data to be parsed. It is required: the parser filter works on that one field, never on the record as a whole. Parser then points to the name of the parser defined in parsers.conf that you want to apply. Reserve_Data true keeps the other fields already in the record (here time and level) alongside the newly parsed ones; without it, only the parsed fields survive. To also keep the original, unparsed message field, which is useful for debugging or for referring back to the raw input, set Preserve_Key true as well.
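The interaction between these options is easier to see as plain dictionary operations. The sketch below is a toy Python model, not Fluent Bit's implementation: `apply_parser_filter` is a made-up name, `parsed` stands in for whatever the parser extracted, and the model assumes that Preserve_Key only has an effect when Reserve_Data is on, which is worth verifying against your Fluent Bit version.

```python
def apply_parser_filter(record, key_name, parsed, reserve_data=False, preserve_key=False):
    """Toy model of what the parser filter emits for one successfully parsed record.

    `parsed` stands in for the fields the named parser extracted from record[key_name].
    """
    out = {}
    if reserve_data:
        # Reserve_Data: carry over the fields that were not parsed.
        out.update({k: v for k, v in record.items() if k != key_name})
        if preserve_key:
            # Preserve_Key: also carry over the original, unparsed field.
            # Assumption: it only has an effect together with Reserve_Data.
            out[key_name] = record[key_name]
    out.update(parsed)
    return out


record = {"time": "2023-10-27T10:00:10Z", "level": "INFO", "message": "user_id=789"}
parsed = {"user_id": "789"}

print(apply_parser_filter(record, "message", parsed))
print(apply_parser_filter(record, "message", parsed, reserve_data=True))
print(apply_parser_filter(record, "message", parsed, reserve_data=True, preserve_key=True))
```

The three calls correspond to neither option set, Reserve_Data alone, and both options together.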
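The regex parser’s Time_Key and Time_Format are vital for ensuring your timestamps are correctly parsed and can be used for time-series analysis. Time_Key names the captured field that holds the timestamp, and Time_Format describes it using strptime format specifiers, of which Fluent Bit supports a wide range. By default the Time_Key field is removed from the record once it has set the event timestamp; set Time_Keep On if you want it to remain as a regular field.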
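A format string is easy to get subtly wrong, so it can be worth checking it against a real timestamp first. Here is a small Python sketch that reuses the Time_Format value from the parser; Python's `strptime` is close to, but not identical with, the C implementation Fluent Bit relies on, so treat it as a sanity check rather than a guarantee. The helper name `to_epoch` is made up for this example.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # same string as Time_Format in parsers.conf


def to_epoch(raw: str) -> float:
    """Parse a timestamp with TIME_FORMAT and return UTC epoch seconds."""
    parsed = datetime.strptime(raw, TIME_FORMAT)  # raises ValueError on mismatch
    # The trailing Z is matched literally, so attach UTC explicitly.
    return parsed.replace(tzinfo=timezone.utc).timestamp()


print(to_epoch("2023-10-27T10:00:01Z"))
```

A timestamp that does not fit the format, such as one with a space instead of the `T`, raises `ValueError`, which is the offline equivalent of Fluent Bit failing to parse the time.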
One subtle but powerful aspect is how the logfmt parser handles quoted values. When a value contains spaces or special characters, enclosing it in double quotes ("permission denied") lets the parser capture the entire string as a single value instead of splitting it at the space.
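The difference quoting makes shows up clearly when a naive split is compared with a quote-aware one. In this Python sketch, `shlex.split` is used purely as a convenient quote-aware tokenizer; it is not a logfmt implementation and its rules differ in edge cases, so read it as an illustration of the idea only.

```python
import shlex

message = 'user_id=456 request="/admin" method="POST" error_message="permission denied"'

# Naive approach: splitting on every space cuts the quoted value in two.
naive_tokens = message.split(" ")

# Quote-aware approach: a quoted run stays together and the quotes are stripped.
tokens = shlex.split(message)
fields = dict(token.split("=", 1) for token in tokens)

print(naive_tokens[-2:])
print(fields["error_message"])
```

The naive split leaves `error_message` holding only the first word plus a stray quote, while the quote-aware split recovers the full value.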
The next step is often to route these structured logs to different backends based on their content, perhaps sending ERROR level logs to a more urgent alerting system.