Fluentd can help you trim down your log volume by selectively dropping logs, but it’s not about discarding data outright – it’s about intelligently reducing the noise to focus on what matters.

Let’s see this in action. Imagine you have a web server generating a ton of requests, many of which are successful but uninteresting. You want to capture errors and maybe a small percentage of successful requests for debugging, but not every single one.

Here’s a simplified Fluentd configuration that uses a record_transformer filter to attach a random number to each record, then a grep filter to decide which records to keep:

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.log.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>

<filter nginx.access>
  @type record_transformer
  # enable_ruby lets ${...} hold a Ruby expression
  enable_ruby true
  <record>
    # Integer from 0 to 99, used below as the sampling key
    random_number ${rand(100)}
  </record>
</filter>

<filter nginx.access>
  @type grep
  # A record is kept if ANY <regexp> inside <or> matches
  <or>
    # Keep every record whose status is not exactly 200
    <regexp>
      key code
      pattern /^(?!200$)/
    </regexp>
    # Keep about 1 in 100 of everything else
    <regexp>
      key random_number
      pattern /^0$/
    </regexp>
  </or>
</filter>

<match **>
  @type stdout
</match>

In this example, the record_transformer filter runs first and injects a random_number field. With enable_ruby true, ${rand(100)} is evaluated as Ruby for each record and yields an integer from 0 to 99. (record_transformer has no built-in UUID placeholder, so attaching a unique ID to each record would need a Ruby expression or a separate plugin; it’s left out here.) The status field is named code because that is the key the built-in nginx parser emits; if you use a custom log format, substitute your own field name.

The grep filter that follows is where the sampling happens. Its <or> block holds two conditions, and a record is kept if either one matches:

  1. Status check: the pattern /^(?!200$)/ uses a negative lookahead to match any code value other than exactly "200", so all 3xx, 4xx, and 5xx responses pass through.
  2. Sample check: the pattern /^0$/ matches when random_number is 0, which happens for about 1 in 100 records. Since every non-200 record is already kept by the first condition, the net effect is a roughly 1% sample of successful requests. To change the rate, change the range: ${rand(1000)} with the same pattern keeps about 0.1%.

Two details are easy to get wrong here. First, top-level <regexp> and <exclude> sections in grep are not OR-ed together: a record must match every <regexp> and no <exclude>, so "errors OR sampled successes" needs an explicit <or> block. Second, matching a float from ${rand} against a regular expression is fragile: a pattern like /^[0-0]\.[0-9]{2}$/ only matches a value with exactly two decimal places, which rand practically never produces. Comparing in Ruby (rand < 0.01) or drawing an integer, as above, avoids the problem.
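As a sketch of the rand < 0.01 approach mentioned above, the keep/drop decision can be made entirely in Ruby and stored in a flag that grep then checks. The field name sample_keep is hypothetical, and the status key is assumed to be code (the name the built-in nginx parser emits); adjust both to your own records.

```
<filter nginx.access>
  @type record_transformer
  enable_ruby true
  <record>
    # "true" for every non-200 response and for roughly 1% of 200s.
    # sample_keep is an illustrative field name, not a Fluentd built-in.
    sample_keep ${(record["code"].to_s != "200" || rand < 0.01).to_s}
  </record>
</filter>

<filter nginx.access>
  @type grep
  # Keep only the records flagged above
  <regexp>
    key sample_keep
    pattern /^true$/
  </regexp>
</filter>
```

Because the threshold is an ordinary number, this form handles fractional rates such as 0.5% (rand < 0.005) without any changes to the pattern.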

The problem this solves is the overwhelming volume of logs generated by high-traffic applications. Instead of storing every single event, you can filter down to critical errors and a statistically significant, but much smaller, sample of normal operations. This drastically reduces storage costs and makes log analysis more manageable.

Internally, Fluentd applies the filters that match a tag in the order they appear in the configuration file. The record_transformer adds the random number first, which is why it has to come before the grep filter that reads it. The grep filter then evaluates each record against its rules. If a record satisfies any of the conditions (either it’s a non-200 response or it’s a sampled success), it passes to the next stage; otherwise, it’s dropped.
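Because filters run in sequence, a further filter placed after grep only sees the records that survived sampling. A minimal sketch, assuming the helper field is named random_number and is no longer needed downstream, strips it with record_transformer’s remove_keys parameter:

```
<filter nginx.access>
  @type record_transformer
  # Runs after grep, so only kept records reach this point.
  # Drops the helper field before the record is written out.
  remove_keys random_number
</filter>
```

This filter has to sit below the grep filter in the file; placed above it, the field would be gone before grep could read it.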

The levers you control are the key and pattern directives within the grep filter, and the logic you use to generate the sampling value in the record_transformer. You can filter or sample on any field present in the parsed record: response time (if your log format includes it), user agent, specific URLs, and so on.
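As an illustration of filtering on another field, the sketch below adds a third condition that keeps every request under one URL prefix regardless of status. The /checkout prefix is hypothetical, the code and path keys are assumed to come from the built-in nginx parser, and random_number is assumed to hold an integer from 0 to 99 (for example from ${rand(100)}).

```
<filter nginx.access>
  @type grep
  <or>
    # All non-200 responses
    <regexp>
      key code
      pattern /^(?!200$)/
    </regexp>
    # Every request under a path you always want to see (illustrative prefix)
    <regexp>
      key path
      pattern /^\/checkout/
    </regexp>
    # About 1% of whatever is left
    <regexp>
      key random_number
      pattern /^0$/
    </regexp>
  </or>
</filter>
```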

A common misconception is that grep filters are purely about inclusion. They are, in fact, very powerful for exclusion as well, and combining inclusion/exclusion logic allows for sophisticated filtering.
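For instance, a pure exclusion filter can remove known noise before any sampling takes place. In this sketch the /healthz path and the kube-probe user agent are illustrative values, and the path and agent keys are assumed to come from the built-in nginx parser; a record is dropped if any <exclude> section matches.

```
<filter nginx.access>
  @type grep
  # Drop health-check requests (illustrative path)
  <exclude>
    key path
    pattern /^\/healthz$/
  </exclude>
  # Drop probe traffic by user agent (illustrative value)
  <exclude>
    key agent
    pattern /kube-probe/
  </exclude>
</filter>
```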

The next concept you’ll likely explore is how to handle distributed tracing context or session IDs to ensure that all logs for a single problematic request, even if sampled, are kept together.
