Fluentd can help you trim your log volume by selectively dropping logs. The goal is not to discard data outright, but to reduce the noise intelligently so you can focus on what matters.
Let’s see this in action. Imagine a web server handling a flood of requests, most of them successful but uninteresting. You want to capture every error and perhaps a small percentage of successful requests for debugging, but not every single one.
Here’s a simplified Fluentd configuration that uses a record_transformer filter to add fields to each record and a grep filter to do the sampling:
```xml
<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.log.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>

# Runs first: add a unique ID and a per-record sampling decision.
<filter nginx.access>
  @type record_transformer
  enable_ruby true
  <record>
    # Unique identifier for later analysis (assumes Ruby's SecureRandom is loaded)
    uuid ${SecureRandom.uuid}
    # true for roughly 1% of records: rand returns a float in [0.0, 1.0)
    sampled ${rand < 0.01}
  </record>
</filter>

# Runs second: keep errors, plus the sampled share of successful requests.
<filter nginx.access>
  @type grep
  <or>
    # Any status code other than exactly 200
    <regexp>
      key code
      pattern /^(?!200$)/
    </regexp>
    # Records that won the 1% sampling draw
    <regexp>
      key sampled
      pattern /^true$/
    </regexp>
  </or>
</filter>

<match **>
  @type stdout
</match>
```
In this configuration, the record_transformer filter runs first. With `enable_ruby true`, each `${...}` expression is evaluated as Ruby for every record. It adds a `uuid` field (Fluentd has no `Memory.uuid`; Ruby’s `SecureRandom.uuid` is a common substitute, assuming `SecureRandom` is loaded in your Fluentd process) and a `sampled` field holding the result of `rand < 0.01`. Because `rand` returns a float from 0.0 (inclusive) up to 1.0 (exclusive), `sampled` is true for roughly 1% of records.
The grep filter that follows is where the sampling happens. Its `<or>` block keeps a record if either of its `<regexp>` rules matches:
- `code`, the status field produced by the nginx parser, matches `/^(?!200$)/`, that is, anything other than exactly `200`. Every 4xx and 5xx error passes through, along with any other non-200 response.
- `sampled` is `true`, which keeps the roughly 1% of successful requests that won the draw.

Three details are easy to get wrong here. First, a filter only sees fields added by filters that run before it, so the sampling field must be created before grep evaluates it. Second, a top-level `<exclude>` on `200` drops every successful request outright, and grep has no `<inline>` directive; the conditions inside `<and>` and `<or>` are written as `<regexp>` or `<exclude>` blocks. Third, matching the raw output of `rand` against a pattern such as `/^[0-0]\.[0-9]{2}$/` does not give 1% sampling: `rand` returns floats with many decimal places, so a pattern that demands exactly two decimals almost never matches. Comparing against a threshold in Ruby, as in `rand < 0.01`, gives the intended rate.
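If you would rather keep the whole sampling rule in one place, a variation is to let Ruby make the keep-or-drop decision and have grep test a single field. The sketch below assumes the `nginx.access` tag and the nginx parser’s `code` field; `keep` is a hypothetical helper field that a second record_transformer removes with `remove_keys` once grep has used it. Treat it as an untested outline rather than a drop-in configuration.

```xml
# Decide per record: keep every non-200 response and roughly 1% of 200s.
# "keep" is a hypothetical helper field introduced for this sketch.
<filter nginx.access>
  @type record_transformer
  enable_ruby true
  <record>
    keep ${record["code"].to_s != "200" || rand < 0.01}
  </record>
</filter>

# Drop every record whose decision was not true.
<filter nginx.access>
  @type grep
  <regexp>
    key keep
    pattern /^true$/
  </regexp>
</filter>

# Strip the helper field before records reach the output.
<filter nginx.access>
  @type record_transformer
  remove_keys keep
</filter>
```

Because `||` short-circuits, `rand` is only called for 200 responses, and changing the rate is a one-number edit. The trade-off is that the rule lives inside a Ruby expression rather than in grep’s declarative rules.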
The problem this solves is the overwhelming log volume of high-traffic applications. Instead of storing every event, you keep the critical errors plus a much smaller, representative sample of normal operations. That cuts storage costs substantially and makes log analysis more manageable.
Internally, Fluentd applies the filters that match a tag in the order they appear in the configuration. The record_transformer therefore has to come first, so that the sampling field it adds exists by the time the grep filter evaluates each record against its rules. If a record satisfies any of the conditions (it is either an error or a sampled success), it passes to the next stage; otherwise, it is dropped.
The levers you control are the key and pattern parameters within the grep filter, and the logic that generates the sampling field in the record_transformer. You can sample on any field your parser extracts: response time, user agent, specific URLs, and so on.
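To sketch what moving that lever looks like, the fragment below varies the sampling rate by URL: successful requests under a hypothetical `/api/` prefix are sampled at 10%, everything else at 1%. It assumes the nginx parser’s `path` field; the prefix and both rates are placeholders, and a grep filter downstream would still need to keep records whose `sampled` field is true.

```xml
# Hypothetical per-path sampling: 10% under /api/, 1% everywhere else.
# A later grep filter keeps records whose "sampled" field is true.
<filter nginx.access>
  @type record_transformer
  enable_ruby true
  <record>
    sampled ${rand < (record["path"].to_s.start_with?("/api/") ? 0.10 : 0.01)}
  </record>
</filter>
```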
A common misconception is that grep filters are only for inclusion. They are just as useful for exclusion, and combining inclusion and exclusion rules allows for sophisticated filtering.
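For example, a grep filter can drop pure noise outright before any sampling happens. The fragment below discards requests to a hypothetical `/healthz` endpoint and any request whose user agent mentions a bot, using the nginx parser’s `path` and `agent` fields. Top-level `<exclude>` rules are combined with OR, so a record matching either one is dropped.

```xml
# Drop health checks and self-identified crawlers before sampling.
# "/healthz" is a hypothetical endpoint; adjust both patterns to your traffic.
<filter nginx.access>
  @type grep
  <exclude>
    key path
    pattern /^\/healthz/
  </exclude>
  <exclude>
    key agent
    pattern /[Bb]ot/
  </exclude>
</filter>
```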
The next concept you’ll likely explore is how to handle distributed tracing context or session IDs to ensure that all logs for a single problematic request, even if sampled, are kept together.