Fluentd does not automatically descend into JSON that is embedded as a string inside another field. Instead, you chain filters: the parser filter decodes the embedded string, and record_transformer restructures the result and extracts the values you need.

Here’s how a typical log record with nested JSON might look when it first arrives in Fluentd, assuming it’s already in JSON format:

{
  "message": "{\"user\": {\"id\": 123, \"name\": \"Alice\", \"address\": {\"street\": \"123 Main St\", \"city\": \"Anytown\"}}, \"action\": \"login\", \"timestamp\": \"2023-10-27T10:00:00Z\"}",
  "log_level": "info",
  "host": "server1.example.com"
}

Notice that the payload of interest is embedded in the message field as an escaped JSON string. The json parser in the source decodes only the outer object, so message arrives as a single string value rather than as a nested structure.
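The effect of decoding only the outer object can be reproduced outside Fluentd. The following is a minimal Ruby sketch, not Fluentd code: the variable names and the shortened payload are assumptions made for illustration.

```ruby
require "json"

# One log line as the application might write it: the outer object is JSON,
# and "message" holds a second JSON document encoded as a string.
inner = { "user" => { "id" => 123, "name" => "Alice" }, "action" => "login" }
line  = JSON.generate("message" => JSON.generate(inner), "log_level" => "info")

# Decoding the line once yields the outer fields only.
record = JSON.parse(line)

puts record["message"].class   # String, not Hash
puts record["log_level"]       # info
```

A second decoding pass over record["message"] is what turns the string into a hash, and that is the job the parser filter takes on.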

The unpacking is done by the parser filter plugin, usually followed by the record_transformer filter to reshape the decoded result.

Let’s say you have a configuration like this:

<source>
  @type tail
  path /var/log/app.log
  pos_file /var/log/app.log.pos
  tag app.logs
  <parse>
    @type json
  </parse>
</source>

<filter app.logs>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>

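<filter app.logs>
  @type record_transformer
  enable_ruby true
  # The nested hash and timestamp are dropped once the values are promoted.
  remove_keys user,timestamp
  <record>
    # The parser filter merged the decoded keys into the top level,
    # so lookups start at record['user'], not record['message'].
    user_id ${record['user']['id']}
    user_name ${record['user']['name']}
    user_street ${record['user']['address']['street']}
    user_city ${record['user']['address']['city']}
    # action is already a top-level field and needs no entry here.
  </record>
</filter>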

<match app.logs>
  @type stdout
</match>

When Fluentd processes the log above with this configuration, the tail source with the json parser initially creates a record like this:

{
  "message": "{\"user\": {\"id\": 123, \"name\": \"Alice\", \"address\": {\"street\": \"123 Main St\", \"city\": \"Anytown\"}}, \"action\": \"login\", \"timestamp\": \"2023-10-27T10:00:00Z\"}",
  "log_level": "info",
  "host": "server1.example.com"
}

The first filter block, using @type parser with key_name message and an inner @type json, decodes the string held in the message field. Because reserve_data true is set, the original fields (message, log_level, host) are kept and the decoded keys are merged into the top level of the record alongside them; without it, only the decoded keys would survive. The record now looks like:

{
  "message": "{\"user\": {\"id\": 123, \"name\": \"Alice\", \"address\": {\"street\": \"123 Main St\", \"city\": \"Anytown\"}}, \"action\": \"login\", \"timestamp\": \"2023-10-27T10:00:00Z\"}",
  "log_level": "info",
  "host": "server1.example.com",
  "user": {
    "id": 123,
    "name": "Alice",
    "address": {
      "street": "123 Main St",
      "city": "Anytown"
    }
  },
  "action": "login",
  "timestamp": "2023-10-27T10:00:00Z"
}
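The merge performed by this step can be approximated in plain Ruby. The sketch below is an illustration under stated assumptions, not the plugin's implementation: parse_field is a hypothetical helper, and the message payload is shortened.

```ruby
require "json"

# Hypothetical helper that mimics the filter's effect on a single record:
# decode the string stored under key_name, then either merge the decoded
# keys into the original record or return the decoded keys alone.
def parse_field(record, key_name, reserve_data: true)
  parsed = JSON.parse(record.fetch(key_name))
  reserve_data ? record.merge(parsed) : parsed
end

record = {
  "message"   => '{"user": {"id": 123, "name": "Alice"}, "action": "login"}',
  "log_level" => "info",
  "host"      => "server1.example.com"
}

merged = parse_field(record, "message")
puts merged.keys.inspect      # ["message", "log_level", "host", "user", "action"]
puts merged["user"]["name"]   # Alice
puts merged["message"].class  # String

# Without reserve_data, only the decoded keys remain.
puts parse_field(record, "message", reserve_data: false).keys.inspect  # ["user", "action"]
```

The decoded keys land next to message rather than inside it, which is why later lookups start from the user key.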

The second filter block, @type record_transformer with enable_ruby true, then uses Ruby expressions to pluck specific values from the decoded structure and promote them to top-level fields. With enable_ruby true, record is exposed as a Ruby hash, so nested elements are reachable with standard hash notation. Note that the decoded keys sit at the top level of the record while message is still a string, so lookups must start from record['user'] (for example ${record['user']['id']}) rather than from record['message'].

The final output, after the record_transformer, would be:

{
  "message": "{\"user\": {\"id\": 123, \"name\": \"Alice\", \"address\": {\"street\": \"123 Main St\", \"city\": \"Anytown\"}}, \"action\": \"login\", \"timestamp\": \"2023-10-27T10:00:00Z\"}",
  "log_level": "info",
  "host": "server1.example.com",
  "user_id": 123,
  "user_name": "Alice",
  "user_street": "123 Main St",
  "user_city": "Anytown",
  "action": "login"
}

The original message field is still there because reserve_data true was used in the parser filter, but the extracted fields are now top-level and easily queryable.
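The flattening step amounts to a handful of hash lookups followed by a merge. Here is a plain Ruby sketch of that idea; the record literal stands in for the output of the parser filter, the message value is a placeholder, and dropping user and timestamp is an assumption chosen to mirror the output shown above.

```ruby
# Record as it might look after the parser filter.
record = {
  "message"   => "(original JSON string)",
  "log_level" => "info",
  "host"      => "server1.example.com",
  "user"      => {
    "id"      => 123,
    "name"    => "Alice",
    "address" => { "street" => "123 Main St", "city" => "Anytown" }
  },
  "action"    => "login",
  "timestamp" => "2023-10-27T10:00:00Z"
}

# Promote selected nested values to top-level keys.
promoted = {
  "user_id"     => record["user"]["id"],
  "user_name"   => record["user"]["name"],
  "user_street" => record["user"]["address"]["street"],
  "user_city"   => record["user"]["address"]["city"]
}

# Merge the new keys in, then drop the nested originals.
flat = record.merge(promoted).reject { |key, _| %w[user timestamp].include?(key) }

puts flat["user_id"]    # 123
puts flat["user_city"]  # Anytown
puts flat.key?("user")  # false
```

Values keep their original types in this sketch, so user_id remains an integer rather than becoming a string.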

The key point is that Fluentd never interprets the message field on its own: it stays a plain string until a parser filter explicitly decodes it. Only then do the decoded keys exist as a real nested hash that a subsequent record_transformer can traverse and flatten.

The actual mechanism you’re interacting with is the parser plugin’s ability to take a specified field (key_name) and apply another parsing strategy (like json) to its content, creating a new set of fields (often nested) from that content. The record_transformer then acts as a manipulator, allowing you to select specific paths within these nested hashes and promote them to the top level.

The critical insight is that the record_transformer plugin, when enable_ruby is true, provides a powerful, albeit sometimes verbose, way to navigate and manipulate complex data structures that have been unpacked by preceding parser filters. You can access any field that has been parsed or created up to that point in the filter chain.

The next logical step after flattening is to handle cases where the nested structure might be missing or inconsistent, which often involves more advanced Ruby expressions within the record_transformer or using conditional logic in your Fluentd configuration.
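One common way to tolerate missing branches in Ruby is Hash#dig, which could be used inside a record_transformer expression in place of chained lookups, for example ${record.dig('user', 'address', 'city')}. The sketch below shows the difference in plain Ruby; both records are hypothetical.

```ruby
# Two hypothetical records: one complete, one without an address.
complete = { "user" => { "id" => 123, "address" => { "city" => "Anytown" } } }
partial  = { "user" => { "id" => 456 } }

# Hash#dig returns nil as soon as an intermediate key is missing.
puts complete.dig("user", "address", "city")         # Anytown
puts partial.dig("user", "address", "city").inspect  # nil

# Chained [] raises instead, because nil does not respond to [].
begin
  partial["user"]["address"]["city"]
rescue NoMethodError => e
  puts "chained lookup failed: #{e.class}"
end

# A fallback value can be supplied with ||.
city = partial.dig("user", "address", "city") || "unknown"
puts city  # unknown
```

Whether a missing value should become nil, a default, or cause the field to be skipped depends on what the downstream store expects.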
