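Prometheus can only scrape targets that serve metrics over HTTP in its text exposition format, and Fluentd does not do that out of the box. The fluent-plugin-prometheus plugin acts as the translator.

Here’s how to get Fluentd to serve metrics in a format Prometheus understands.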

Here’s a sample Fluentd configuration snippet that exposes metrics:

<source>
  @type prometheus
  port 24231
  metrics_path /metrics
</source>

<source>
  @type prometheus_monitor
  <labels>
    instance ${hostname}
    job fluentd
  </labels>
</source>

<match **>
  @type stdout
</match>

This setup does a few key things. The prometheus source plugin starts an HTTP server on port 24231, and metrics_path /metrics sets the path where the metrics endpoint is served. The prometheus_monitor plugin, on the other hand, exposes Fluentd’s internal state: buffer queue length, total buffered bytes, and retry counts for output plugins. Fluentd is a Ruby process, so you will not find Go runtime metrics such as goroutine counts here. The <labels> section attaches extra labels to those metrics, here an instance label (the hostname of the Fluentd node) and a job label (fluentd). Note that instance and job are also the label names Prometheus assigns to every scrape target, which matters when the two collide. The <match **> section sends every event to stdout so that the pipeline has an output to report on; in a real deployment, your existing inputs and outputs take its place.

Once Fluentd is running with this configuration, you can verify the metrics are being served by curling the endpoint:

curl http://localhost:24231/metrics

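You’ll see output along these lines (the exact metric names and labels depend on the plugin version and on which monitor sources are enabled):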

# HELP fluentd_input_records_total Total number of records received by an input plugin.
# TYPE fluentd_input_records_total counter
fluentd_input_records_total{plugin_id="in_tail_1",tag="my.tag"} 12345
# HELP fluentd_output_records_total Total number of records emitted by an output plugin.
# TYPE fluentd_output_records_total counter
fluentd_output_records_total{plugin_id="out_stdout_1",tag="my.tag"} 12345
# HELP fluentd_buffer_queue_length_bytes Current number of bytes in a buffer queue.
# TYPE fluentd_buffer_queue_length_bytes gauge
fluentd_buffer_queue_length_bytes{plugin_id="out_file_1",tag="my.tag"} 0

This output uses the Prometheus text exposition format. Counters are for values that only increase (like records processed), and gauges are for values that can go up and down (like buffer queue length). The labels (plugin_id, tag, instance, job) are what make these metrics useful for filtering and aggregation in Prometheus.
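To make the format concrete, here is a minimal Python sketch that parses exposition text into typed samples. It is a simplified illustration, not a complete parser: it ignores timestamps and does not unescape label values. The names parse_exposition and SAMPLE are made up for this example, and the sample text reuses the illustrative metric names shown above.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: name="value"
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Illustrative payload, copied from the sample output above.
SAMPLE = """\
# HELP fluentd_input_records_total Total number of records received by an input plugin.
# TYPE fluentd_input_records_total counter
fluentd_input_records_total{plugin_id="in_tail_1",tag="my.tag"} 12345
# TYPE fluentd_buffer_queue_length_bytes gauge
fluentd_buffer_queue_length_bytes{plugin_id="out_file_1",tag="my.tag"} 0
"""


def parse_exposition(text):
    """Return (types, samples) parsed from exposition text.

    types maps metric name -> declared type ("counter", "gauge", ...).
    samples is a list of (name, labels_dict, float_value) tuples.
    """
    types = {}
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            parts = line.split()
            if len(parts) == 4:
                types[parts[2]] = parts[3]
            continue
        if line.startswith("#"):
            continue  # HELP lines and other comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this sketch does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return types, samples


if __name__ == "__main__":
    types, samples = parse_exposition(SAMPLE)
    for name, labels, value in samples:
        print(types.get(name, "untyped"), name, labels, value)
```

In practice you would rarely parse this by hand, since Prometheus does it for you, but seeing the structure helps when reading raw curl output.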

To get Prometheus to collect these, add a scrape_configs entry to your prometheus.yml:

scrape_configs:
  - job_name: 'fluentd'
    static_configs:
      - targets: ['localhost:24231']
        labels:
          instance: 'fluentd-instance-1'
          job: 'fluentd'

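This tells Prometheus to scrape the listed targets as part of the fluentd job. If you have multiple Fluentd instances, list them all here or use service discovery. Be careful with the instance and job labels: Prometheus attaches its own instance and job labels to every target, and by default (honor_labels: false) those win over labels of the same name in the scraped data, which are renamed to exported_instance and exported_job. If you want the values set in Fluentd’s <labels> section to take precedence, set honor_labels: true on the scrape job, or use a non-conflicting label name such as host in Fluentd.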
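The way Prometheus handles a label that appears both on the scrape target and in the scraped data can be modeled in a few lines of Python. This is a simplified sketch of the documented honor_labels behavior, not Prometheus’s actual code; the function name resolve_labels and the example label values are assumptions for illustration.

```python
def resolve_labels(target_labels, scraped_labels, honor_labels=False):
    """Merge target labels with labels found in the scraped payload.

    honor_labels=False (the Prometheus default): the target label wins and
    the conflicting scraped label is kept under an "exported_" prefix.
    honor_labels=True: the scraped label wins and the target label is ignored.
    """
    merged = dict(scraped_labels)
    for key, value in target_labels.items():
        if key in scraped_labels:
            if honor_labels:
                continue  # keep the scraped value as-is
            merged["exported_" + key] = scraped_labels[key]
        merged[key] = value
    return merged


if __name__ == "__main__":
    # Target labels as set in the scrape config; scraped labels as set by Fluentd.
    target = {"instance": "fluentd-instance-1", "job": "fluentd"}
    scraped = {"instance": "fluentd-node-1", "job": "fluentd", "plugin_id": "out_file_1"}
    print(resolve_labels(target, scraped))
    print(resolve_labels(target, scraped, honor_labels=True))
```

With the default setting, the hostname set by Fluentd ends up in exported_instance, so queries that group by instance see the value from the scrape config instead.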

The problem Fluentd’s metrics solve is visibility into its own operation. Without them, you’re flying blind, unsure if it’s processing logs, buffering them, or choking. This setup gives you real-time insights into throughput, latency (indirectly via buffer sizes), resource consumption, and potential bottlenecks. You can alert on buffer queues growing too large, or on input/output rates dropping unexpectedly.
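The two alert conditions mentioned here, a dropping record rate and a growing buffer queue, can be sketched in Python as follows. In a real deployment you would express them as Prometheus alerting rules; this sketch only illustrates the arithmetic, and the function names and thresholds are hypothetical.

```python
def counter_rate(prev_value, curr_value, elapsed_seconds):
    """Per-second increase of a counter between two scrapes.

    If the counter went down, assume the process restarted and the counter
    reset to zero, so the whole current value counts as the increase.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    increase = curr_value - prev_value
    if increase < 0:
        increase = curr_value
    return increase / elapsed_seconds


def check_health(rate, queue_bytes, min_rate, max_queue_bytes):
    """Return a list of problems; an empty list means nothing tripped."""
    problems = []
    if rate < min_rate:
        problems.append(f"throughput {rate:.1f}/s is below {min_rate}/s")
    if queue_bytes > max_queue_bytes:
        problems.append(f"buffer queue of {queue_bytes} bytes exceeds {max_queue_bytes}")
    return problems


if __name__ == "__main__":
    # Two scrapes of a records counter taken 15 seconds apart (made-up values).
    rate = counter_rate(12000, 12345, 15)
    print(rate, check_health(rate, 0, min_rate=1.0, max_queue_bytes=1_000_000))
```

The reset handling matters because a Fluentd restart sets its counters back to zero, which would otherwise look like a large negative rate.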

The most surprising thing about Fluentd’s metrics exposure is how granular you can get with custom tags and labels. The prometheus source plugin (part of the separately installed fluent-plugin-prometheus gem, not built into Fluentd) only serves the endpoint. The prometheus_monitor plugin is what lets you inject custom labels based on Fluentd’s runtime environment, like the hostname or Kubernetes pod names, directly into the metrics themselves. This means you can correlate Fluentd’s performance with the specific services or hosts it’s processing data for, without needing separate enrichment steps.

This setup allows you to monitor Fluentd’s health and performance using the same tools you use for your applications. You can create dashboards in Grafana, build alerts in Alertmanager, and gain a comprehensive view of your logging pipeline’s status. The next step is often to integrate this with your application metrics for end-to-end tracing and debugging.
