Fluentd’s timestamp normalization is a lot less about making timestamps look pretty and more about preventing you from losing data silently.

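Let’s see it in action. Imagine you’ve got logs coming in from different services, some with timestamps like 2023-10-27T10:30:00Z and others with 1698393000 (a Unix epoch timestamp in seconds). If you just dump these into Elasticsearch, Fluentd won’t go looking for either value: every event carries its own event time, set by the input or parser plugin, and a timestamp inside the record is just another field until you tell a parser which key holds it and how to read it. Left unconfigured, events are typically stamped with the time they were ingested rather than the time they occurred, so logs that were chronologically ordered at the source can end up scattered across your time-series database.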
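
To make the ordering problem concrete, here is a small sketch in plain Ruby (not Fluentd code; the records, arrival counters, and the occurred_at helper are all made up for illustration). It sorts the same three events once by arrival order and once by the time each one reports.

```ruby
require "time"

# Made-up events: :arrived is the order they reached the collector,
# :timestamp is what the service wrote, in one of the two formats above.
events = [
  { arrived: 100, timestamp: "2023-10-27T10:30:00Z", msg: "c" },
  { arrived: 101, timestamp: 1698393000,             msg: "a" },
  { arrived: 102, timestamp: "2023-10-27T09:00:00Z", msg: "b" },
]

# Hypothetical helper: interpret either format as a Time.
def occurred_at(ts)
  ts.is_a?(Numeric) ? Time.at(ts).utc : Time.iso8601(ts)
end

by_arrival  = events.sort_by { |e| e[:arrived] }.map { |e| e[:msg] }
by_occurred = events.sort_by { |e| occurred_at(e[:timestamp]) }.map { |e| e[:msg] }

puts by_arrival.join(",")   # c,a,b
puts by_occurred.join(",")  # a,b,c
```

The two orderings disagree, which is the scattering you see downstream when events are indexed by ingestion time.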

Here’s a typical setup where you might want to normalize:

<source>
  @type forward
  port 24224
  send_keepalive_packet true # TCP keepalive for accepted connections
</source>

<filter **>
  @type parser
  key_name log # Assuming your log message is in a field named 'log'
  <parse>
    @type json
    time_key timestamp # The field in your JSON log containing the timestamp
    time_format %Y-%m-%dT%H:%M:%S%z # Or whatever format your timestamp is in
  </parse>
</filter>

<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    # Add the event time to the record as milliseconds since the epoch.
    # With enable_ruby, `time` is the event time as a Ruby Time object, so this
    # works for any source format the parser above understood.
    # For sub-second precision, use (time.to_f * 1000).round instead.
    # Fields set in <record> are ordinary record fields; they never change the
    # event time. That comes from time_key in the parser above, or from
    # record_transformer's renew_time_key (Unix time in seconds).
    time_in_ms ${time.to_i * 1000}
  </record>
</filter>

<match **>
  @type stdout
</match>

In this configuration:

  1. Source: We’re receiving logs via the forward protocol.
  2. Parser Filter: The first filter attempts to parse a log field. Crucially, time_key timestamp tells it to look for a field named timestamp within the parsed log, and time_format tells it how to interpret that field’s value. If successful, Fluentd will use this parsed timestamp as the event’s time. The format shown only matches ISO8601-style strings; epoch values need time_type unixtime instead, and recent Fluentd releases also offer time_type mixed with time_format_fallbacks for streams that contain both.
  3. Record Transformer Filter: This adds a normalized copy of the timestamp to the record. enable_ruby true allows us to use Ruby code. time_in_ms ${time.to_i * 1000} takes the event time, which enable_ruby exposes as a Ruby Time object named time, and writes it into the record as milliseconds since the epoch. Note that time_in_ms is an ordinary record field: setting it does not change the event’s time, which Fluentd stores separately as seconds since the epoch with nanosecond precision, not as milliseconds. The event time comes from the parser in step 2; if you need to overwrite it from a record field, record_transformer’s renew_time_key option does that and expects Unix time in seconds. The value is derived from time rather than record["timestamp"] for two reasons: by default the parser removes the time_key field from the record once it has consumed it (set keep_time_key true to retain it), and calling .to_i on an ISO8601 string does not parse it ("2023-10-27T10:30:00Z".to_i is just 2023).
  4. Match: We’re sending it to stdout for demonstration.
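
The conversion in step 3 can be tried outside Fluentd. The following is a minimal sketch in plain Ruby, assuming timestamps arrive either as ISO8601 strings or as epoch seconds; to_epoch_ms is a hypothetical helper, not a Fluentd API.

```ruby
require "time"

# Hypothetical helper (not part of Fluentd): turn an ISO8601 string or an
# epoch-seconds value into integer milliseconds since the Unix epoch.
def to_epoch_ms(value)
  case value
  when Integer
    value * 1000
  when Float
    (value * 1000).round
  when /\A\d+\z/                  # digits only: epoch seconds sent as a string
    value.to_i * 1000
  else
    t = Time.iso8601(value.to_s)  # raises ArgumentError on malformed input
    t.to_i * 1000 + t.nsec / 1_000_000
  end
end

puts to_epoch_ms("2023-10-27T10:30:00Z")  # 1698402600000
puts to_epoch_ms(1698393000)              # 1698393000000
puts "2023-10-27T10:30:00Z".to_i          # 2023 (String#to_i stops at the first non-digit)
```

The last line is the reason a bare .to_i is not a safe way to normalize a field that may hold an ISO8601 string: it returns a number without complaint, just not the right one.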

The problem this solves is that Fluentd only knows an event’s real time if you tell it where to look. When an input plugin parses a record and finds no time_key field, it stamps the event with the current time, and a parser filter keeps whatever time the event already had. Either way, delayed or replayed events get an ingestion-time timestamp, so historical data appears to have happened when it arrived. Worse, when the field is present but doesn’t match time_format, parsing fails and the record is handled as a parse error instead of flowing to your output, which looks like data loss unless you watch the error stream. By using the parser plugin with time_key and time_format, you tell Fluentd how to extract and interpret timestamps from your various log formats. record_transformer can then add a consistent representation (e.g., milliseconds since the epoch) to the record itself for downstream consumers, alongside an event time that reflects when the event actually occurred.
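
The two failure modes are worth keeping apart: a missing timestamp quietly gets a substitute time, while a malformed one is an error. The sketch below is a toy model of that distinction in plain Ruby. It is not Fluentd’s implementation, and resolve_event_time and its parameters are hypothetical names.

```ruby
require "time"

# Toy model only; this is not Fluentd's code.
# - timestamp field absent  -> use the fallback time
# - timestamp field present -> parse it, and raise if it is malformed
def resolve_event_time(record, time_key: "timestamp", fallback: Time.now)
  return fallback unless record.key?(time_key)
  Time.iso8601(record[time_key].to_s)
rescue ArgumentError => e
  raise ArgumentError, "invalid time in #{time_key.inspect}: #{e.message}"
end

p resolve_event_time({ "timestamp" => "2023-10-27T10:30:00Z" })
p resolve_event_time({ "msg" => "no timestamp here" }, fallback: Time.at(0).utc)
```

Feeding this model an epoch value such as "1698393000" raises as well, because it only understands ISO8601. A parser configured with a single time_format is strict in the same way.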

The most surprising thing about Fluentd’s timestamp handling is that it doesn’t have a global, mandatory "timestamp field" setting; it relies on each input or parser being configured (via time_key) to find and parse the timestamp in its own log format, and the event time comes from whichever plugin set it last. If no plugin parses one, the event keeps the time it was given at ingestion, typically the current time, which is often not what you want for historical log analysis.

If you’re dealing with timestamps that carry no timezone offset and you haven’t specified one during parsing, Fluentd will by default interpret them in the local timezone of the host it runs on, not in the timezone of the system that wrote them. Set utc true or timezone in the <parse> section to make the assumption explicit. Otherwise you get subtle but significant shifts in your timestamps when you later query your data, especially if your logs originate from systems in different geographical locations.
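
The size of that shift is easy to check in plain Ruby. This sketch parses the same wall-clock reading under two different zone assumptions and compares the resulting epoch values; the +09:00 offset is an arbitrary example.

```ruby
require "time"

wall_clock = "2023-10-27 10:30:00"
format     = "%Y-%m-%d %H:%M:%S %z"

# Same wall-clock reading, two different assumptions about its zone.
as_utc      = Time.strptime("#{wall_clock} +0000", format)
as_plus_nine = Time.strptime("#{wall_clock} +0900", format)

shift_seconds = as_utc.to_i - as_plus_nine.to_i
puts shift_seconds  # 32400, i.e. nine hours
```

An event written at 10:30 in a +09:00 zone but read as UTC lands nine hours in the future relative to when it actually happened.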

The next thing you’ll likely encounter is handling timezone-aware timestamps when your source logs aren’t consistently UTC.
