Fluent Bit’s parsers are the secret sauce for turning a chaotic log stream into something actionable, and the regex, JSON, and logfmt parsers are your primary tools.

Let’s see how Fluent Bit can take raw, messy log lines and make them structured. Imagine we have a simple web server spitting out logs like this:

2023-10-27T10:00:01Z INFO user_id=123 request="/api/v1/users" method="GET" status=200 response_time_ms=45
2023-10-27T10:00:05Z ERROR user_id=456 request="/admin" method="POST" error_message="permission denied"
2023-10-27T10:00:10Z INFO user_id=789 request="/health" method="GET" status=200

This is great for humans to read, but terrible for machines to query. We want to be able to easily filter by user_id, status, or method.

Here’s a Fluent Bit configuration snippet that tackles this:

[SERVICE]
    Flush        5
    Daemon       off
    Log_Level    info
    Parsers_File parsers.conf

[INPUT]
    Name        tail
    Path        /var/log/my_webserver.log
    Tag         webserver.*
    Parser      my_webserver_parser

[FILTER]
    Name         parser
    Match        webserver.*
    Key_Name     message
    Parser       my_kv_parser
    Reserve_Data true
    Preserve_Key true

[OUTPUT]
    Name        stdout
    Match       webserver.*
    Format      json

And in our parsers.conf file:

[PARSER]
    Name        my_webserver_parser
    Format      regex
    Regex       ^(?<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+(?<level>\w+)\s+(?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%SZ
    Time_Keep   On

[PARSER]
    Name        my_kv_parser
    Format      logfmt

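A Lua filter can do the same second pass when you need custom logic. Filters are not valid in parsers.conf, so this section goes in the main configuration file, and it needs a call option naming the function to run:

[FILTER]
    Name        lua
    Match       webserver.*
    script      lua/parse_message.lua
    call        parse_log_message

And the lua/parse_message.lua script:

-- The Lua filter has no API for calling a named parser, so this
-- function splits the key=value pairs itself (no escaped quotes).
function parse_log_message(tag, timestamp, record)
    local message = record["message"]
    if type(message) ~= "string" then
        return 0, timestamp, record   -- 0: leave the record unchanged
    end
    -- Take quoted pairs (key="a b") first and strip them from the text.
    local rest = message:gsub('([%w_]+)="([^"]*)"', function(k, v) record[k] = v; return "" end)
    -- What remains are bare pairs (key=value).
    for k, v in rest:gmatch("([%w_]+)=(%S+)") do record[k] = v end
    return 1, timestamp, record       -- 1: record was modified
end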

When Fluent Bit processes our log file with this configuration, the tail input reads each line and my_webserver_parser (regex) extracts the timestamp, the log level, and the rest of the line as message. The parser filter then takes that message field and applies my_kv_parser (logfmt, that is, key=value parsing) to break out the individual pairs. The Lua filter is an alternative way to do this second pass when you need custom logic; you would normally enable only one of the two.

The output on stdout will look roughly like this (pretty-printed here, and omitting the date field that the stdout JSON format adds):

{
  "time": "2023-10-27T10:00:01Z",
  "level": "INFO",
  "user_id": "123",
  "request": "/api/v1/users",
  "method": "GET",
  "status": "200",
  "response_time_ms": "45",
  "message": "user_id=123 request=\"/api/v1/users\" method=\"GET\" status=200 response_time_ms=45"
}
{
  "time": "2023-10-27T10:00:05Z",
  "level": "ERROR",
  "user_id": "456",
  "request": "/admin",
  "method": "POST",
  "error_message": "permission denied",
  "message": "user_id=456 request=\"/admin\" method=\"POST\" error_message=\"permission denied\""
}
{
  "time": "2023-10-27T10:00:10Z",
  "level": "INFO",
  "user_id": "789",
  "request": "/health",
  "method": "GET",
  "status": "200",
  "message": "user_id=789 request=\"/health\" method=\"GET\" status=200"
}

Notice how the key-value pairs inside message are now individual fields, making it trivial to query: WHERE status = '200' or WHERE user_id = '123'. Every parsed value is a string at this point, hence the quotes.
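
As a minimal sketch, assuming numeric comparisons on status and response_time_ms are wanted downstream, the parser-level Types option can cast fields at parse time instead of leaving them as strings. The choice of fields here is illustrative and not part of the original setup:

```ini
# parsers.conf: the same logfmt parser, with selected fields cast to integers.
# Assumption: these fields always hold whole numbers in the logs.
[PARSER]
    Name        my_kv_parser
    Format      logfmt
    Types       status:integer response_time_ms:integer
```

With this in place, status would be emitted as the number 200 rather than the string "200". user_id is left as a string on purpose, since identifiers are rarely used arithmetically.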

The power here is chaining parsers. The regex parser is excellent for the overall structure of a log line, but often the payload of that line is itself structured, like key-value pairs or JSON. Fluent Bit’s logfmt parser handles key=value payloads like ours (there is no separate kv format), and the json parser does the same job when the message embeds a JSON object.
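
If the payload after the level were a JSON object instead of key=value pairs, the same chain might look like the sketch below. The parser name my_json_parser is hypothetical, and the filter assumes the first-stage regex still captures the payload as message:

```ini
# parsers.conf: hypothetical parser for a JSON payload.
[PARSER]
    Name        my_json_parser
    Format      json

# Main configuration: second-stage filter applied to the message field.
[FILTER]
    Name         parser
    Match        webserver.*
    Key_Name     message
    Parser       my_json_parser
    Reserve_Data true
```

The two sections live in different files; they are shown together only for brevity.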

The Key_Name and Parser directives in the FILTER section are crucial. Key_Name specifies which field in the incoming record contains the data to be parsed. It is required: the parser filter does not fall back to parsing the entire record. Parser then points to the name of the parser defined in parsers.conf that you want to apply. Reserve_Data true keeps the other fields of the original record (here, level) alongside the newly parsed ones; without it, the parsed fields replace the record entirely. To also keep the raw field named by Key_Name, set Preserve_Key true, which is useful for debugging or if you need to refer back to the raw input.

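The regex parser’s Time_Key and Time_Format are vital for ensuring your timestamps are correctly parsed and can be used for time-series analysis. Time_Key names the captured field that holds the timestamp, and Time_Format describes its layout using strptime format specifiers, of which Fluent Bit supports a wide range. By default the Time_Key field is removed from the record once it has become the record’s timestamp; set Time_Keep On to keep it as a field too.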
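
To illustrate other specifiers, here are two hedged sketches for timestamp shapes that do not appear in our sample logs. The parser names and the example timestamps in the comments are made up for illustration, and the regexes are deliberately loose. %L is Fluent Bit’s extension for fractional seconds:

```ini
# Timestamp such as [27/Oct/2023:10:00:01 +0000] (access-log style)
[PARSER]
    Name        example_clf_time
    Format      regex
    Regex       ^\[(?<time>[^\]]+)\]\s+(?<message>.*)$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z

# Timestamp such as 2023-10-27T10:00:01.123+0000 (millisecond precision)
[PARSER]
    Name        example_ms_time
    Format      regex
    Regex       ^(?<time>\S+)\s+(?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z
```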

One subtle but powerful aspect is how the logfmt parser handles quoted values. When a value contains spaces or special characters, enclosing it in double quotes ("permission denied") lets the parser capture the entire string as a single value instead of splitting it at the first space.

The next step is often to route these structured logs to different backends based on their content, perhaps sending ERROR level logs to a more urgent alerting system.
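
One possible sketch of that routing uses the rewrite_tag filter to re-emit ERROR records under a new tag. The alerts prefix, the emitter name, and the second stdout output standing in for a real alerting backend are all assumptions for illustration:

```ini
# Re-tag records whose level field is exactly ERROR.
# Rule syntax: $KEY  REGEX  NEW_TAG  KEEP_ORIGINAL
[FILTER]
    Name         rewrite_tag
    Match        webserver.*
    Rule         $level ^ERROR$ alerts.$TAG false
    Emitter_Name error_emitter

# Placeholder destination for the re-tagged records.
[OUTPUT]
    Name         stdout
    Match        alerts.*
```

Because the last Rule argument is false, ERROR records leave the webserver.* stream and appear only under alerts.*; set it to true to deliver them to both.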
