Fluent Bit’s splunk output plugin, which ships records to Splunk’s HTTP Event Collector (HEC), is surprisingly flexible. Its default configuration, however, often produces events that Splunk does not break into searchable fields.

Let’s see Fluent Bit in action, shipping logs to Splunk HEC.

Here’s a minimal fluent-bit.conf:

[SERVICE]
    Flush        1
    Daemon       off
    Log_Level    info

[INPUT]
    Name        tail
    Path        /var/log/app.log
    Tag         my_app.logs

[OUTPUT]
    Name        splunk
    Match       my_app.logs
    Host        splunk.example.com
    Port        8088
    Splunk_Token your_splunk_hec_token
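    # HEC often runs over HTTPS; set TLS On if yours does
    TLS         Off

This setup tails a local file, tags each line as my_app.logs, and sends it to a Splunk HEC endpoint. No parser is configured, so each record holds the raw line as a string under the log key. By default, the plugin sends that whole record as the HEC event.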
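The following minimal Python sketch shows the HEC event that results from one unparsed tail record. It assumes the plugin's default packaging: the record timestamp goes in time and the whole record map goes under event. `to_hec_event` is a hypothetical helper. The real plugin is written in C and serializes from MessagePack.

```python
import json

def to_hec_event(timestamp: float, record: dict) -> str:
    """Hypothetical helper: wrap one record the way the splunk output
    plugin does by default (record map under "event")."""
    return json.dumps({"time": timestamp, "event": record})

# Without a parser, tail stores the whole line as a string under "log".
line = '{"level": "info", "message": "User logged in"}'
record = {"log": line}

# 1698400800.0 is 2023-10-27T10:00:00Z in epoch seconds.
payload = to_hec_event(1698400800.0, record)
print(payload)
```

The application's JSON ends up escaped inside a string value, not as fields of the event.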

The core problem Fluent Bit solves here is efficiently collecting, buffering, and routing logs from diverse sources to a centralized logging system like Splunk. It’s lightweight and resource-efficient, which makes it a good fit for containerized environments and edge devices. Internally, it uses a plugin architecture: input plugins gather data, filter plugins modify it, and output plugins send it to destinations. The splunk output plugin acts as a bridge. It translates Fluent Bit’s internal record (a timestamp plus a key/value map, stored as MessagePack) into the JSON payload that Splunk’s HEC understands.
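The input → filter → output chain can be pictured as composed functions. The Python sketch below is a toy model, not Fluent Bit's API. `tail_input`, `add_env_filter`, and `splunk_output` are hypothetical stand-ins for the three plugin types. Tag routing is reduced to an exact equality check.

```python
import json
from typing import Iterable, Iterator, Tuple

# A record is (tag, timestamp, fields): a loose stand-in for Fluent Bit's
# internal representation, not its real data structure.
Record = Tuple[str, float, dict]

def tail_input(lines: Iterable[str], tag: str) -> Iterator[Record]:
    """Hypothetical input: emit each line under the default "log" key."""
    for i, line in enumerate(lines):
        yield (tag, 1698400800.0 + i, {"log": line})

def add_env_filter(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical filter: add a static "env" field to every record."""
    for tag, ts, fields in records:
        yield (tag, ts, {**fields, "env": "prod"})

def splunk_output(records: Iterable[Record], match: str) -> list:
    """Hypothetical output: serialize records whose tag matches as HEC events."""
    return [
        json.dumps({"time": ts, "event": fields})
        for tag, ts, fields in records
        if tag == match  # exact match only; real Match also supports wildcards
    ]

events = splunk_output(
    add_env_filter(tail_input(["started", "ready"], "my_app.logs")),
    match="my_app.logs",
)
for event in events:
    print(event)
```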

By default, the splunk output plugin turns each record into one HEC event:

- The record’s timestamp goes into the time field, as epoch seconds.
- The record’s key/value map is serialized as a JSON object under event.

HEC expects exactly this: one JSON object per event. The plugin sends a batch of these objects in each request. The catch is that the plugin serializes whatever the record holds; it does not look inside string values for more JSON.

That is what bites the minimal setup above. Take an application line such as {"time": "2023-10-27T10:00:00.000Z", "message": "User logged in"}. With no parser, it becomes the record {"log": "<that line>"}. Splunk therefore receives an event whose only key is log, holding the string "{\"time\": \"2023-10-27T10:00:00.000Z\", \"message\": \"User logged in\"}". Even with JSON extraction enabled, Splunk sees a single field, log, instead of time and message. This isn’t usually what you want for structured searching.

Proper field extraction takes work on both sides. On the Fluent Bit side, parse the log in the input (for example, tail with a JSON parser). The application’s keys then become record keys and arrive as a nested JSON object rather than an escaped string. Two other options cover special cases:

- If you only want one key’s value as the event, the plugin’s event_key option (for example, event_key $message) selects it.
- If your records are already shaped as HEC payloads, Splunk_Send_Raw On sends them without the extra wrapping.

On the Splunk side, the event needs a sourcetype with JSON extraction enabled. Either use the built-in _json sourcetype, or create one such as my_fluentbit_json with JSON extraction turned on. Make it the HEC token’s default sourcetype, along with an index such as main, or set it from Fluent Bit with event_sourcetype.

The most surprising thing is that the plugin doesn’t "send the JSON from my logs". It serializes the entire Fluent Bit record as the event, so the record itself has to contain the fields you want Splunk to see.
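The difference shows up once Splunk tries to extract fields. The sketch below is a rough Python model of search-time JSON extraction. It assumes a JSON-aware sourcetype that turns the top-level keys of the event object into fields. `extract_fields` is hypothetical and ignores how Splunk handles nested objects and arrays.

```python
import json

def extract_fields(hec_event: dict) -> dict:
    """Hypothetical model of JSON extraction: _raw is the event
    serialized as JSON, and each top-level key becomes a field."""
    raw = json.dumps(hec_event["event"])
    return {"_raw": raw, **json.loads(raw)}

unparsed = {"time": 1698400800.0,
            "event": {"log": '{"level": "info", "message": "User logged in"}'}}
parsed = {"time": 1698400800.0,
          "event": {"level": "info", "message": "User logged in"}}

print(sorted(extract_fields(unparsed)))  # ['_raw', 'log']
print(sorted(extract_fields(parsed)))    # ['_raw', 'level', 'message']
```

Only the parsed record yields level and message as fields. The unparsed one leaves them buried in the log string.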

Consider a scenario where your application logs JSON directly, like: {"timestamp": "2023-10-27T10:05:00Z", "level": "info", "message": "Processing request", "request_id": "abc123"}

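If you use the tail input with the json parser, Fluent Bit turns each line into a record whose keys are the JSON keys: { "timestamp": "2023-10-27T10:05:00Z", "level": "info", "message": "Processing request", "request_id": "abc123" }.

Fluent Bit does not add a time field to the record. Each record carries its timestamp separately, as metadata. That timestamp comes from the key named by the parser’s Time_Key, or from the time the line was read when that key is absent. The json parser in the stock parsers.conf uses Time_Key time, so here timestamp stays an ordinary field.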
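The Python sketch below models what the json parser does to one line. `parse_json_line` is hypothetical. Its default Time_Format was chosen to match the example line; it is an assumption, not the stock parser's value. Removing the time key once it is used mirrors the parser's default Keep_Time_Key Off behavior.

```python
import json
import time
from datetime import datetime, timezone

def parse_json_line(line: str, time_key: str = "time",
                    time_format: str = "%Y-%m-%dT%H:%M:%SZ"):
    """Hypothetical model of a json parser: returns (timestamp, fields).
    If time_key is present it is removed and used as the record timestamp;
    otherwise the current (read) time is used."""
    fields = json.loads(line)
    if time_key in fields:
        parsed = datetime.strptime(fields.pop(time_key), time_format)
        return parsed.replace(tzinfo=timezone.utc).timestamp(), fields
    return time.time(), fields

line = ('{"timestamp": "2023-10-27T10:05:00Z", "level": "info", '
        '"message": "Processing request", "request_id": "abc123"}')

# Stock-style Time_Key "time": "timestamp" stays an ordinary field.
ts, fields = parse_json_line(line)
print(sorted(fields))  # ['level', 'message', 'request_id', 'timestamp']

# With Time_Key "timestamp", the value becomes the record timestamp instead.
ts, fields = parse_json_line(line, time_key="timestamp")
print(ts, sorted(fields))  # 1698401100.0 ['level', 'message', 'request_id']
```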

With the default packaging, Fluent Bit sends this record as the JSON object under event. In Splunk, _raw holds that JSON. To get level, message, and request_id as separate fields, the event’s sourcetype must have JSON extraction enabled.

Here’s the configuration for parsing JSON logs and sending them to Splunk HEC:

Fluent Bit Configuration (fluent-bit.conf)

[SERVICE]
    Flush        1
    Daemon       off
    Log_Level    info
    # Load the stock parser definitions, which include the json parser
    Parsers_File parsers.conf

[INPUT]
    Name        tail
    Path        /var/log/my_app_logs.json
    Tag         my_app.json
    # Crucial: parse each JSON line into record fields
    Parser      json

[OUTPUT]
    Name        splunk
    Match       my_app.json
    Host        your-splunk-hec.example.com
    Port        8088
    Splunk_Token YOUR_HEC_TOKEN_HERE
    # HEC often runs over HTTPS; set TLS On if yours does
    TLS         Off
    # Optional: override the token's default index and sourcetype
    # Event_Index       main
    # Event_Sourcetype  my_fluentbit_json

Splunk HEC Setup

  1. Create a New HEC Token: In Splunk, go to Settings -> Data Inputs -> HTTP Event Collector. Create a new token (e.g., my_fluentbit_token).
  2. Configure Input Settings:
    • Source type: Choose _json (built-in JSON parsing) or create a custom sourcetype (e.g., my_fluentbit_json) and set its parsing to JSON.
    • Index: Select the index where you want logs to go (e.g., main).
  3. Enable the HEC: In Global Settings, make sure All Tokens is set to Enabled, and that the Enable SSL setting matches Fluent Bit’s TLS option.
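Before pointing Fluent Bit at the new token, it can help to send one event by hand. The Python sketch below only builds the request, using the standard library. It assumes the standard HEC event endpoint (/services/collector/event) and the `Authorization: Splunk <token>` header. The host and token are the placeholders from the config above. `build_hec_request` is a hypothetical helper.

```python
import json
import urllib.request

def build_hec_request(host: str, token: str, event: dict,
                      sourcetype: str = "_json", port: int = 8088,
                      scheme: str = "http") -> urllib.request.Request:
    """Build (but do not send) a POST of one event to the HEC event endpoint."""
    body = json.dumps({"event": event, "sourcetype": sourcetype}).encode("utf-8")
    return urllib.request.Request(
        f"{scheme}://{host}:{port}/services/collector/event",
        data=body,
        headers={"Authorization": f"Splunk {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_hec_request("your-splunk-hec.example.com", "YOUR_HEC_TOKEN_HERE",
                        {"level": "info", "message": "HEC smoke test"})
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req). A successful response is
# typically {"text":"Success","code":0}.
```

Use scheme="https" if Enable SSL is on for your HEC.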

How it Works Internally

When Fluent Bit processes a record that has been parsed into fields (thanks to Parser json in the input), the splunk output plugin builds one HEC event from it:

- The record’s timestamp becomes the time field, in epoch seconds rather than ISO 8601.
- The record’s fields become a JSON object under event.
- The tag is used for routing (Match) and is not added to the event by default.

Take the log line {"timestamp": "2023-10-27T10:05:00Z", "level": "info", "message": "Processing request", "request_id": "abc123"}, read by the tail input with the json parser. The resulting record carries the fields timestamp, level, message, and request_id, plus its own timestamp as metadata.

The plugin serializes that record into a payload along these lines, where <epoch seconds> stands for the record timestamp: {"time": <epoch seconds>, "event": {"timestamp":"2023-10-27T10:05:00Z","level":"info","message":"Processing request","request_id":"abc123"}}

This payload is sent in the body of an HTTP POST to the HEC endpoint and authenticated with the Splunk_Token. Events from one flush are batched into the same request.
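The plugin can package a record in more than one way, and that choice decides what lands in event. The Python sketch below contrasts the default with the event_key and Splunk_Send_Raw options, as we understand their documented behavior. `package` is hypothetical. It takes a plain key name instead of a $key record accessor, and it omits metadata fields and batching.

```python
import json
from typing import Optional

def package(ts: float, record: dict, event_key: Optional[str] = None,
            send_raw: bool = False) -> str:
    """Hypothetical model of how a record becomes an HEC payload."""
    if send_raw:
        # Splunk_Send_Raw On: the record is sent as-is, so it must
        # already carry HEC keys such as "event".
        return json.dumps(record)
    if event_key is not None:
        # event_key: only that key's value becomes the event.
        return json.dumps({"time": ts, "event": record[event_key]})
    # Default: the whole record map becomes the event object.
    return json.dumps({"time": ts, "event": record})

record = {"level": "info", "message": "Processing request",
          "request_id": "abc123"}

print(package(1698401100.0, record))
print(package(1698401100.0, record, event_key="message"))
print(package(1698401100.0, {"event": record, "sourcetype": "_json"},
              send_raw=True))
```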

When HEC receives this payload and the sourcetype is configured for JSON, Splunk uses time as the event’s _time. It then parses the event object, extracting timestamp, level, message, and request_id as distinct fields.

The one thing most people don’t realize is that the plugin serializes the Fluent Bit record itself. It neither passes the application’s JSON through untouched nor parses anything for you. If your application logs raw text and you want fields in Splunk, use a different Fluent Bit parser (e.g., regex, logfmt) to extract them. The plugin then sends those extracted fields as a JSON object for Splunk to parse.
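For plain-text logs, a regex parser does the field extraction. Fluent Bit's regex parsers use named capture groups in Onigmo syntax, such as (?<level>...). Python spells these (?P<level>...). The sketch below assumes a hypothetical line format and shows how the named groups become the record fields that end up under event. Its fallback of keeping unmatched lines under log is a choice made for this sketch.

```python
import json
import re

# Hypothetical line format: "<ISO time> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")

def parse_text_line(line: str) -> dict:
    """Return the named groups as record fields, or keep the raw line
    under "log" when the pattern does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else {"log": line}

record = parse_text_line("2023-10-27T10:05:00Z INFO Processing request abc123")
print(json.dumps({"event": record}))
```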

The next challenge is handling multi-line log events or complex nested JSON structures within your application logs.
