Fluentd’s timestamp normalization is less about making timestamps look pretty and more about keeping you from silently losing data.
Let’s see it in action. Imagine you’ve got logs coming in from different services, some with timestamps like 2023-10-27T10:30:00Z and others with 1698393000 (a Unix epoch timestamp). If you just dump these into Elasticsearch, Fluentd won’t reconcile them for you: each event’s time is whatever the input or parser plugin assigned, and a timestamp field it was not told how to parse is passed through as an ordinary value. Logs that happened in order can then end up scattered across your time-series database.
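To make the mismatch concrete, here is a minimal Ruby sketch of what "normalizing" means for those two values. The helper name normalize_timestamp is made up for this illustration and is not part of Fluentd; it also assumes epoch values are in seconds.

```ruby
require "time"

# Hypothetical helper (not a Fluentd API): turn either an ISO 8601 string
# or a Unix epoch in seconds (number or numeric string) into a UTC Time.
def normalize_timestamp(value)
  case value
  when Integer, Float
    Time.at(value).utc
  when /\A\d+(\.\d+)?\z/
    # Numeric string such as "1698393000"
    Time.at(value.to_f).utc
  else
    Time.iso8601(value).utc
  end
end

iso   = normalize_timestamp("2023-10-27T10:30:00Z")
epoch = normalize_timestamp(1698393000)

# Once both are Time objects they can be compared and sorted directly.
puts iso.iso8601
puts epoch.iso8601
puts [iso, epoch].min == epoch
```

Once both values are real Time objects, ordering and formatting stop depending on how each service happened to write its timestamp.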
Here’s a typical setup where you might want to normalize:
<source>
  @type forward
  port 24224
</source>
<filter **>
  @type parser
  # Assuming your log message is in a field named 'log'
  key_name log
  <parse>
    @type json
    # The field in your JSON log containing the timestamp
    time_key timestamp
    # The JSON parser expects an epoch number unless told the value is a string
    time_type string
    # Or whatever format your timestamp is in
    time_format %Y-%m-%dT%H:%M:%S%z
  </parse>
</filter>
<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    # Entries here only add fields to the record; they do not change the
    # event time, which the parser filter above has already set.
    # 'time' is the event time as a Ruby Time object (requires enable_ruby).
    # Reading record["timestamp"] here is fragile: the parser removes that
    # field unless keep_time_key true is set.
    # To stamp records with the current time instead, use:
    # time_in_ms ${(Time.now.to_f * 1000).to_i}
    time_in_ms ${(time.to_f * 1000).to_i}
  </record>
</filter>
<match **>
  @type stdout
</match>
In this configuration:
- Source: We’re receiving logs via the forward protocol.
- Parser Filter: The first filter parses the log field as JSON. Crucially, time_key timestamp tells it to look for a field named timestamp within the parsed log, and time_format tells it how to interpret that field’s value. For the JSON parser this also requires time_type string, because it otherwise expects an epoch number. If parsing succeeds, Fluentd uses this value as the event’s time and, unless keep_time_key true is set, removes the timestamp field from the record.
- Record Transformer Filter: enable_ruby true allows Ruby expressions inside ${...}. An entry under <record> only adds or overwrites a field in the record; it does not change the event’s time, which the parser already set. Fluentd stores that time internally as seconds plus nanoseconds since the epoch, not as milliseconds. A time_in_ms field is therefore a normalized copy for downstream consumers, and it is safer to derive it from the event time (the time variable, a Ruby Time object available when enable_ruby is on) than from record["timestamp"]: that field is usually gone after parsing, and calling .to_i on an ISO8601 string returns only its leading digits (2023), not an epoch. If you do need to set the event time from a record field at this stage, record_transformer’s renew_time_key option does that, and it expects a Unix time value.
- Match: We’re sending it to stdout for demonstration.
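The conversion to milliseconds is easy to get subtly wrong, so here is a small plain-Ruby sketch of the arithmetic, outside Fluentd. The sample values are illustrative; the point is what String#to_i does with an ISO8601 string compared with a real Time object.

```ruby
require "time"

iso_string   = "2023-10-27T10:30:00Z"
epoch_string = "1698402600"

# String#to_i reads leading digits only, so an ISO 8601 string yields the year.
naive_ms = iso_string.to_i * 1000

# Parsing first gives a Time, from which milliseconds can be derived safely.
parsed    = Time.iso8601(iso_string)
parsed_ms = (parsed.to_f * 1000).round

# A string holding an epoch in seconds converts directly.
epoch_ms = epoch_string.to_i * 1000

puts naive_ms    # 2023000
puts parsed_ms   # 1698402600000
puts epoch_ms    # 1698402600000
```

This is why the millisecond value should be derived from a Time object rather than from the raw string field.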
The problem this solves is that without explicit time parsing, an event’s time is whatever the input plugin assigned: often the moment Fluentd read the line, or, for forward, the time the sender attached. Delayed or replayed events then carry a "current" timestamp, making historical data look as if it happened when it arrived. Worse, a record whose timestamp does not match time_format fails to parse and, by default, is diverted to Fluentd’s error stream instead of continuing down the pipeline, which looks like data loss if nothing handles those errors. By using the parser plugin with time_key and time_format, you tell Fluentd how to extract and interpret timestamps from your various log formats. The record_transformer can then expose that time in one consistent representation (e.g., as milliseconds since the epoch) for downstream systems, so both the event time and the stored field reflect when the event actually occurred.
The most surprising thing about Fluentd’s timestamp handling is that it doesn’t have a global, mandatory "timestamp field" setting; each input or parser plugin decides how to extract the time from its own time_key and format settings. When no time is extracted, the event keeps the time it was given on ingestion, which is often not what you want for historical log analysis.
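The difference between a missing timestamp and a malformed one can be modeled in a few lines of Ruby. This is only a rough sketch of the decision, not Fluentd’s code: event_time_for and FORMAT are hypothetical names, and here both failure cases simply fall back to the receive time while reporting why.

```ruby
require "time"

FORMAT = "%Y-%m-%dT%H:%M:%S%z"

# Hypothetical helper (not Fluentd's API): returns the time to use for an
# event plus a label saying where that time came from.
def event_time_for(raw, received_at)
  return [received_at, :missing] if raw.nil?

  [Time.strptime(raw, FORMAT), :parsed]
rescue ArgumentError
  # Time.strptime raises ArgumentError when the value does not match FORMAT.
  [received_at, :invalid]
end

received_at = Time.now

p event_time_for("2023-10-27T10:30:00Z", received_at).last  # :parsed
p event_time_for(nil, received_at).last                      # :missing
p event_time_for("yesterday-ish", received_at).last          # :invalid
```

Keeping the reason alongside the time makes it possible to count or route the :missing and :invalid cases instead of letting them blend in with correctly timestamped events.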
If your timestamps carry no zone offset and you haven’t told the parser how to interpret them, Fluentd assumes the local time zone of the host it runs on (localtime true is the default); set utc true or timezone (for example, timezone +09:00) in the <parse> section to override that. Getting this wrong leads to subtle but significant shifts in your timestamps when you later query your data, especially if your logs originate from systems in different geographical locations.
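How large that shift can be is easy to check in plain Ruby. The sketch below takes one zone-less timestamp and interprets it twice, once as UTC and once as +09:00; the sample value and the two offsets are assumptions chosen for illustration.

```ruby
require "time"

# A log line timestamp with no zone information.
naive  = "2023-10-27 10:30:00"
format = "%Y-%m-%d %H:%M:%S %z"

# Interpret the same wall-clock value under two different assumed zones.
as_utc   = Time.strptime("#{naive} +0000", format)
as_tokyo = Time.strptime("#{naive} +0900", format)

# Subtracting two Time objects gives the difference in seconds as a Float.
shift_hours = (as_utc - as_tokyo) / 3600

puts as_utc.utc.iso8601     # 2023-10-27T10:30:00Z
puts as_tokyo.utc.iso8601   # 2023-10-27T01:30:00Z
puts shift_hours            # 9.0
```

The same string lands nine hours apart depending on the assumed zone, which is the kind of offset that shows up later as events appearing in the wrong query window.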
The next thing you’ll likely encounter is handling timezone-aware timestamps when your source logs aren’t consistently UTC.