Fluentd’s routing is driven by tags rather than by a fixed pipeline: every event passes through the filters that match its tag, in order, and is then handed to the first <match> block whose pattern fits.
Let’s see how this plays out. Imagine you have a web server that logs in JSON format. You want to capture these logs, parse them, and send them to two different destinations: one for real-time monitoring and another for long-term archival.
Here’s a simplified fluentd.conf that accomplishes this:
# Input section: Listen for logs on port 24224
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
# Filter section: Parse incoming JSON logs
<filter **>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>
# Output: fan each web access event out to two destinations
<match web.access.**>
  @type copy
  # Store 1: Real-time monitoring (e.g., Elasticsearch)
  <store>
    @type elasticsearch
    host localhost
    port 9200
    logstash_format true
    logstash_prefix web-access-logs
    include_tag_key true
    tag_key @log_name
  </store>
  # Store 2: Archival (e.g., S3)
  <store>
    @type s3
    aws_key_id YOUR_ACCESS_KEY_ID
    aws_secret_access_key YOUR_SECRET_ACCESS_KEY
    s3_bucket your-log-archive-bucket
    s3_region us-east-1
    path logs/web/
    <buffer>
      @type file
      path /var/log/fluentd-buffers/s3
      flush_interval 5m
    </buffer>
  </store>
</match>
When a log event arrives, it already carries a tag, say web.access.log. With the forward input, that tag is set by the sender, not by the receiving Fluentd instance. The <filter **> directive matches every tag, so the parser filter runs on each event. It parses the JSON string in the message field and merges the parsed fields into the record; reserve_data true keeps the original fields as well.
Now the event, with its modified record and still tagged web.access.log, proceeds to the matching phase. Fluentd checks the <match> blocks from top to bottom and delivers the event to the first one whose pattern matches the tag.
The pattern web.access.** matches web.access.log, because ** matches zero or more tag parts. The event is therefore handed to the copy output.
Crucially, routing stops there. An event goes to the first matching <match> block only, so a second block further down would never see it, and a pattern such as web.archive.** would not match web.access.log in any case. This is why both destinations are wrapped in a single copy output: copy hands each event to every <store> it contains, so the same event reaches both Elasticsearch and S3.
This is the core routing mechanism: filters whose patterns match the tag are applied in the order they appear, and then the first matching <match> block receives the event. To send one event to several destinations, use copy rather than several <match> blocks with overlapping patterns. The wildcards (* for exactly one tag part, ** for zero or more parts) let you define broad or specific routing rules.
The real power comes from how you craft these tags and match directives. For instance, you might have a <match web.access.error> that goes to an alerting system, a <match web.access.slow> that goes to a performance analysis tool, and a general <match web.access.**> that catches everything else for basic logging. Order matters here: an event is delivered only to the first <match> block whose pattern fits its tag, so specific patterns must appear before general ones. If the general block came first, it would swallow the error and slow events too.
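The ordering rule can be sketched as a config fragment. This is a minimal illustration rather than a production setup: the built-in stdout output is used as a stand-in for all three destinations (an assumption), since the actual alerting, analysis, and logging plugins are not specified here.

```
# Specific routes first: an event stops at the first <match> that fits its tag.
<match web.access.error>
  # Stand-in for an alerting output (assumption)
  @type stdout
</match>

<match web.access.slow>
  # Stand-in for a performance analysis output (assumption)
  @type stdout
</match>

# General catch-all last. Placed first, it would also capture
# web.access.error and web.access.slow.
<match web.access.**>
  # Stand-in for basic logging (assumption)
  @type stdout
</match>
```

With this layout, an event tagged web.access.error reaches only the first block, while web.access.log falls through to the catch-all.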
One subtle but critical point concerns tag manipulation. The record_transformer filter can add or change fields in a record, but a filter cannot change an event’s tag. To re-tag events based on their content, use the rewrite_tag_filter output plugin (from the fluent-plugin-rewrite-tag-filter gem), which re-emits each event under a new tag so that it is routed again from the top. For example, if you wanted to route all events originating from a specific Kubernetes pod to a separate S3 bucket, you could add something like this (the pod. prefix is only an illustrative choice):
<match kubernetes.**>
  @type rewrite_tag_filter
  <rule>
    key $.kubernetes.pod_name
    pattern /^(.+)$/
    tag pod.$1
  </rule>
</match>
# Use a pattern under the prefix chosen above
<match YOUR_POD_NAME_TAG_PATTERN>
  @type s3
  # ... s3 config ...
</match>
The rule reads the kubernetes.pod_name field and, if it is present, re-emits the event with a tag built from the pod name. Events without that field match no rule and are not re-emitted. This gives you granular control over where events end up based on their content, not just their initial tag. One caveat: the new tag must not match the rewrite block’s own pattern (kubernetes.** here), otherwise the event would loop back into the same block.
Understanding how tags are matched, how filters transform records, and how events can be re-tagged and routed again is key to building robust and flexible data pipelines with Fluentd.
The next step is understanding how to manage buffering and retries when outputs are temporarily unavailable.