Telegraf is designed to be the Swiss Army knife of metric collection, but its true power lies in its ability to seamlessly pipe those metrics into a time-series database like InfluxDB, acting as the central nervous system of your observability stack.

Let’s say you want to keep an eye on the machine that runs your web server. Here’s how you’d configure Telegraf to collect its CPU, memory, disk, and network metrics and send them to InfluxDB.

First, you need Telegraf installed. The configuration file is usually located at /etc/telegraf/telegraf.conf. You’ll want to edit this file to tell Telegraf what to collect and where to send it.

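The core of the configuration is a set of plugin tables: [[inputs.*]] entries define what to collect, and [[outputs.*]] entries define where to send it. A global [agent] table holds settings shared by all plugins.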
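A minimal sketch of that layout looks like the following. The [agent] values match those in a stock telegraf.conf; per-plugin interval settings, like the ones in the snippet that follows, override the agent-wide default for that plugin.

```toml
# Global settings shared by every plugin
[agent]
  interval = "10s"        # default collection interval for all inputs
  flush_interval = "10s"  # how often buffered metrics are written to outputs

# Input plugins: one [[inputs.<name>]] table per source
[[inputs.cpu]]

# Output plugins: one [[outputs.<name>]] table per destination
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
```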

Here’s a snippet of a telegraf.conf file to get you started, focusing on collecting basic system metrics and sending them to InfluxDB:

# Read metrics about cpu usage every 10s
[[inputs.cpu]]
  interval = "10s"
  percpu = true
  totalcpu = true

# Read metrics about memory usage every 10s
[[inputs.mem]]
  interval = "10s"

# Read metrics about disk usage every 10s
[[inputs.disk]]
  interval = "10s"
  ignore_fs = ["tmpfs", "devtmpfs", "vboxsf", "vmhgfs", "fuse.gvfs-fuse-daemon", "overlay", "aufs"]

# Read metrics about network throughput every 10s
[[inputs.net]]
  interval = "10s"

# Write metrics to InfluxDB
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"] # Replace with your InfluxDB host and port
  database = "telegraf"          # The database to write to
  # username = "telegraf_user"   # Uncomment and set if authentication is enabled
  # password = "telegraf_password" # Uncomment and set if authentication is enabled

In this example:

  • [[inputs.cpu]], [[inputs.mem]], [[inputs.disk]], and [[inputs.net]] are input plugins. They tell Telegraf what kind of data to collect.
    • interval = "10s" means Telegraf will poll these metrics every 10 seconds.
    • percpu = true and totalcpu = true for the cpu input mean you’ll get metrics for each individual CPU core as well as the aggregate total.
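    • ignore_fs in the disk input keeps Telegraf from reporting usage for temporary, virtual, and container overlay filesystems, which would otherwise clutter your data without reflecting real disk capacity.
  • [[outputs.influxdb]] is the output plugin. This is where you specify how and where to send the collected metrics. It targets the InfluxDB 1.x write API; for InfluxDB 2.x, the usual choice is [[outputs.influxdb_v2]], which authenticates with a token and writes to an organization and bucket instead of a database.
    • urls = ["http://127.0.0.1:8086"] is the address of your InfluxDB instance. 8086 is InfluxDB’s default HTTP port, so if InfluxDB runs on the same machine as Telegraf, this value works as-is. If it’s remote, use its IP address or hostname instead.
    • database = "telegraf" is the InfluxDB database where Telegraf will write its metrics. If this database doesn’t exist, the output plugin tries to create it when Telegraf starts (unless skip_database_creation = true), which only succeeds if the configured user has permission to create databases.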
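For a remote InfluxDB with authentication enabled, the output table might look like the sketch below. The hostname and the INFLUX_USER / INFLUX_PASSWORD environment variable names are hypothetical. Telegraf substitutes ${VAR} references from its environment when it loads the config, which keeps credentials out of the file itself; under systemd, those variables would need to be set in the service’s environment (for example via an EnvironmentFile).

```toml
[[outputs.influxdb]]
  # Hypothetical remote host; 8086 is InfluxDB's default HTTP port
  urls = ["http://influxdb.example.internal:8086"]
  database = "telegraf"
  # The database was created ahead of time, so don't ask Telegraf to create it
  skip_database_creation = true
  # Credentials read from hypothetical environment variables at startup
  username = "${INFLUX_USER}"
  password = "${INFLUX_PASSWORD}"
```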

To make this configuration active, you’ll need to restart the Telegraf service:

sudo systemctl restart telegraf

Once Telegraf is running, you can verify it’s sending data by querying InfluxDB. In the influx command-line interface (CLI), select the database and check that the cpu measurement is arriving (Chronograf or Grafana can run the same SHOW and SELECT queries against the telegraf database):

> USE telegraf
> SHOW MEASUREMENTS
> SELECT * FROM cpu LIMIT 10
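If the queries come back empty, it can help to see exactly what Telegraf is producing. One option, sketched below, is to add a second output temporarily that prints every metric to standard output in InfluxDB line protocol; when Telegraf runs under systemd, that output ends up in the journal (journalctl -u telegraf). Remove it once you’re done, since it logs every point.

```toml
# Temporary debugging output: print every metric alongside the InfluxDB output
[[outputs.file]]
  files = ["stdout"]       # write to Telegraf's standard output
  data_format = "influx"   # InfluxDB line protocol
```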

Telegraf’s plugin architecture isn’t just about collecting and sending; it’s also about transforming and enriching. You can run multiple input and output plugins side by side, add processor plugins that modify metrics in flight, and use aggregator plugins such as aggregators.basicstats to compute averages or sums over a time window before the data reaches InfluxDB. Writing one aggregate per window instead of every raw point can substantially reduce the load on your database and make your queries faster and simpler.
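As a sketch of the aggregation idea, the table below uses aggregators.basicstats to turn raw cpu samples into one mean and one max per minute. The window length and statistics are illustrative choices, not recommendations. With drop_original = true, only the aggregates (fields such as usage_idle_mean and usage_idle_max) are sent on to InfluxDB; metrics outside the namepass filter are unaffected.

```toml
[[aggregators.basicstats]]
  period = "60s"            # aggregate over 60-second windows
  drop_original = true      # forward only the aggregates, not the raw samples
  stats = ["mean", "max"]   # statistics to emit for each numeric field
  namepass = ["cpu"]        # only the cpu measurement enters this aggregator
```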

For example, imagine you have a custom application that exposes metrics via a Prometheus endpoint. You can configure Telegraf to scrape that endpoint with the [[inputs.prometheus]] plugin, then transform those metrics with [[processors.regex]] or [[processors.converter]] before sending them to InfluxDB. This lets you consolidate metrics from diverse sources into a single, unified backend.
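A minimal sketch of that pipeline follows. The endpoint URL and the path and code label names are hypothetical stand-ins for whatever your application actually exposes; the explicit order settings make the two processors run one after the other.

```toml
# Scrape a (hypothetical) application metrics endpoint
[[inputs.prometheus]]
  urls = ["http://localhost:8080/metrics"]

# Step 1: collapse per-user URL paths into one tag value to limit cardinality
[[processors.regex]]
  order = 1
  [[processors.regex.tags]]
    key = "path"                 # hypothetical label on the scraped metrics
    pattern = '^/users/\d+$'     # TOML literal string, so no escaping needed
    replacement = "/users/:id"

# Step 2: move the HTTP status label from a tag to an integer field
[[processors.converter]]
  order = 2
  [processors.converter.tags]
    integer = ["code"]           # hypothetical label name
```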

Consider Telegraf’s built-in metric filtering. Many systems emit a firehose of metrics, and you only care about a subset. Instead of filtering in InfluxDB or your dashboarding tool, you can tell Telegraf to drop specific metrics at the source. Every plugin accepts selectors such as namepass/namedrop (by measurement name), fieldpass/fielddrop (by field name), and tagpass/tagdrop (by tag value). For instance, you might drop the usage_user field from the cpu input if you’re only interested in system-level load, or use a tag filter to exclude metrics from specific processes or mount points. Because the data is discarded before it leaves the host, this pre-filtering saves both network bandwidth and InfluxDB storage.
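As a sketch, the tables below are modified versions of the cpu and disk inputs from the earlier example (replace those tables rather than adding these alongside them, or Telegraf will run both). The dropped field and mount points are illustrative choices. Tag-filter tables such as tagdrop must come last in a plugin’s block, because TOML assigns any keys after a subtable header to that subtable.

```toml
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  # Drop the usage_user field; the other usage_* fields are still reported
  fielddrop = ["usage_user"]

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "overlay", "aufs"]
  # Drop any disk metric whose path tag (the mount point) matches these globs
  [inputs.disk.tagdrop]
    path = ["/boot", "/snap/*"]
```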

The next step you’ll likely encounter is setting up retention policies in InfluxDB to manage how long your data is stored, or exploring more advanced input plugins like inputs.docker or inputs.kubernetes to gather container-specific metrics.
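As a starting point for container metrics, a minimal inputs.docker table might look like this. It assumes Docker is listening on its default local socket and that the telegraf user is allowed to read it (commonly by adding that user to the docker group); the container name pattern is a hypothetical example.

```toml
# Collect per-container CPU, memory, network, and block I/O metrics
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"   # Docker's default local socket
  # Hypothetical glob: only report containers whose names start with "web"
  container_name_include = ["web*"]
```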
