Fluentd’s record_transformer plugin is a surprisingly powerful way to reshape logs in flight, before they ever reach storage.

Let’s see it in action. Imagine you’re collecting logs from multiple services, and you want to add a consistent service_name field and perhaps extract a timestamp from a custom field.

Here’s a snippet from a Fluentd configuration:

<source>
  @type forward
  port 24224
  <transport tls>
    cert_path /etc/fluent/certs/server.crt
    private_key_path /etc/fluent/certs/server.key
  </transport>
</source>

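<filter **>
  @type record_transformer
  enable_ruby true
  # renew_time_key reads a Unix time, so it points at the helper field built below
  renew_time_key event_epoch
  # Drop the helper field once the event time has been reset
  remove_keys event_epoch
  <record>
    service_name "my_app"
    # Format the incoming event time (the value before renew_time_key applies)
    extracted_timestamp ${time.strftime('%Y-%m-%dT%H:%M:%S%z')}
    # Helper field: parse the record's own timestamp string into epoch seconds
    event_epoch ${Time.strptime(record["timestamp"], '%Y-%m-%dT%H:%M:%S%z').to_f}
  </record>
</filter>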

<match **>
  @type stdout
</match>

When a log record comes in, say:

{
  "message": "User logged in",
  "timestamp": "2023-10-27T10:30:00+0000",
  "user_id": 123
}

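After the <filter **> block runs, the record that reaches stdout looks like this. It is pretty-printed here; the stdout plugin writes each event on one line, prefixed with its time and tag. The extracted_timestamp value assumes the event arrived with that event time and that Fluentd runs in UTC:

{
  "message": "User logged in",
  "timestamp": "2023-10-27T10:30:00+0000",
  "user_id": 123,
  "service_name": "my_app",
  "extracted_timestamp": "2023-10-27T10:30:00+0000"
}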

Notice how service_name and extracted_timestamp are added. The renew_time_key parameter is particularly neat: it names a record field whose value replaces Fluentd’s internal event time. That value has to be a Unix time (seconds since the epoch), so a formatted string such as the timestamp field here must be converted first. Resetting the event time this way keeps your logs ordered and timestamped by when they actually occurred, not just when Fluentd received them.
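Because renew_time_key reads its field as a Unix time, a formatted string like the one in the sample record has to become epoch seconds at some point. The plain-Ruby sketch below runs outside Fluentd and shows only that conversion; the variable names are illustrative and not part of any Fluentd API.

```ruby
require "time" # adds Time.strptime and Time#iso8601

record = {
  "message"   => "User logged in",
  "timestamp" => "2023-10-27T10:30:00+0000",
  "user_id"   => 123
}

# Parse the string with the exact format it was written in.
parsed = Time.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%S%z")

# Epoch seconds: the form in which an event time can be handed over.
epoch = parsed.to_f

puts epoch                       # 1698402600.0
puts Time.at(epoch).utc.iso8601  # 2023-10-27T10:30:00Z
```

The same Time.strptime call can be placed inside a ${...} expression when enable_ruby is on.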

The problem this solves is the heterogeneity of log data. Different applications emit logs in different formats, with different field names for timestamps or identifiers. record_transformer acts as a universal translator, normalizing these fields before they get indexed or stored, which makes analysis and querying far easier.

Internally, record_transformer inspects each incoming record in turn. The <record> block defines fields to add or overwrite. Static values are written as-is; dynamic values use ${...} placeholders, and enable_ruby true lets those placeholders hold arbitrary Ruby expressions. renew_time_key is an ordinary parameter, not a block: it names one record field, and when that field is present its value is read as a Unix time and becomes the event’s time. It takes a single field name and no format option, so a formatted string has to be turned into epoch seconds first, for example with a Ruby expression inside <record>.
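The per-record flow can be sketched in plain Ruby. This is a hypothetical stand-in written for illustration, not Fluentd’s actual implementation: the transform method, its keyword arguments, and the event_epoch helper field are all assumed names, and lambdas play the role of ${...} expressions.

```ruby
require "time"

# Hypothetical stand-in for one pass of the filter; not Fluentd's own code.
# fields:         name => static value, or a lambda standing in for ${...}
# renew_time_key: name of a field holding a Unix time, or nil
# remove_keys:    fields to drop before the record moves on
def transform(time, record, fields:, renew_time_key: nil, remove_keys: [])
  new_record = record.dup
  fields.each do |key, value|
    # Expressions see the original record and the incoming event time.
    new_record[key] = value.respond_to?(:call) ? value.call(record, time) : value
  end

  # Swap in the record's own time if the named field is present.
  if renew_time_key && new_record.key?(renew_time_key)
    time = Time.at(new_record[renew_time_key].to_f)
  end

  remove_keys.each { |key| new_record.delete(key) }
  [time, new_record]
end

received_at = Time.now
record = { "message" => "User logged in", "timestamp" => "2023-10-27T10:30:00+0000" }

time, out = transform(
  received_at, record,
  fields: {
    "service_name" => "my_app",
    "event_epoch"  => ->(r, _t) { Time.strptime(r["timestamp"], "%Y-%m-%dT%H:%M:%S%z").to_f }
  },
  renew_time_key: "event_epoch",
  remove_keys: ["event_epoch"]
)

puts time.utc.iso8601 # 2023-10-27T10:30:00Z
p out.keys
```

The order matters in this sketch: the helper field has to exist when the time is renewed, and it is removed only afterwards so it never reaches the output.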

The core levers you control are:

  • <record> block: For adding static key-value pairs or dynamically generated values using Ruby.
  • enable_ruby true: Required for arbitrary Ruby expressions inside ${...}; simple placeholders such as ${tag} or ${record["field"]} work without it.
  • renew_time_key: To synchronize Fluentd’s event time with a field in your log record. You name the field, and its value must be a Unix time.
  • remove_keys: To strip out fields you don’t want to carry forward.
  • ${tag}, ${tag_parts[N]} and ${hostname} placeholders: To copy the tag, parts of it, or the host name into the record. As a filter, record_transformer can read the tag but cannot change it.

A common pitfall is assuming the Ruby expressions within <record> have access to the full Fluentd API or context. They see only the current event: the record, its tag (as tag and tag_parts), the time object holding the current event time, and hostname. Timestamp parsing can also be tricky: the format string you use to parse a timestamp must match the input data exactly, or the parse fails. If you need to handle several candidate timestamp fields, you can chain record_transformer filters or, if you are parsing JSON out of a single field, use the parser filter with a json <parse> section and its time_key and time_format options.
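To make the format-matching point concrete, here is a small plain-Ruby sketch of trying several candidate timestamp fields in turn. The field names, the formats, and the event_epoch helper are assumptions made up for illustration; Time.strptime raises ArgumentError when a value does not match its format, and the helper relies on that.

```ruby
require "time"

# Hypothetical candidates: field name => the strptime format expected in it.
candidates = {
  "timestamp" => "%Y-%m-%dT%H:%M:%S%z",
  "ts"        => "%Y-%m-%d %H:%M:%S"
}

# Return epoch seconds from the first candidate field that parses, else nil.
def event_epoch(record, candidates)
  candidates.each do |key, format|
    value = record[key]
    next unless value.is_a?(String)
    begin
      return Time.strptime(value, format).to_f
    rescue ArgumentError
      next # the value did not match this format; try the next field
    end
  end
  nil
end

p event_epoch({ "timestamp" => "2023-10-27T10:30:00+0000" }, candidates) # 1698402600.0
p event_epoch({ "timestamp" => "27/10/2023 10:30" }, candidates)         # nil
```

Returning nil instead of raising lets the caller decide what to do with an unparseable record, such as keeping the time at which Fluentd received it.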

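One thing that surprises people is that record_transformer cannot rewrite a record’s tag: Fluentd filters pass events along with the tag unchanged. To route logs to different outputs based on field values, use record_transformer to normalize the field you want to route on, then re-tag the event with an output plugin such as rewrite_tag_filter, which keeps that logic out of your final output plugins.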

If you find yourself needing to conditionally add fields based on logic that goes beyond simple Ruby expressions, you might explore writing a small custom filter plugin in Ruby for more intricate record manipulation.
