InfluxDB alert checks don’t send notifications themselves; they only decide when a notification should be sent.
Here’s a simplified, illustrative JSON representation of a few example alerts that are firing. It is not the exact payload of any particular InfluxDB API:
[
  {
    "id": "alert-cpu-usage",
    "name": "High CPU Usage",
    "query": "SELECT mean(\"usage_user\") FROM \"cpu\" WHERE time >= now() - 5m",
    "options": {
      "every": "1m",
      "period": "5m",
      "threshold": 80,
      "mode": "any",
      "message": "CPU usage is {{ .Value }}% which is above the 80% threshold!",
      "severity": "critical"
    },
    "data": [
      {
        "fields": [
          {"name": "time", "type": "time"},
          {"name": "mean", "type": "float"}
        ],
        "values": [
          [1678886400000000000, 85.5],
          [1678886460000000000, 82.1],
          [1678886520000000000, 88.9]
        ]
      }
    ],
    "status": "firing"
  },
  {
    "id": "alert-disk-free",
    "name": "Low Disk Space",
    "query": "SELECT (1 - mean(\"free\")) * 100 AS \"percent_used\" FROM \"disk\" WHERE time >= now() - 15m",
    "options": {
      "every": "5m",
      "period": "15m",
      "threshold": 90,
      "mode": "all",
      "message": "Disk usage is {{ .Value }}% which is above the 90% threshold!",
      "severity": "warning"
    },
    "data": [
      {
        "fields": [
          {"name": "time", "type": "time"},
          {"name": "percent_used", "type": "float"}
        ],
        "values": [
          [1678886400000000000, 92.3],
          [1678887300000000000, 91.5]
        ]
      }
    ],
    "status": "firing"
  }
]
This JSON represents two active alerts. The status field shows "firing", meaning the alert condition has been met. The message field contains a template that will be filled with the actual data when the alert triggers. Notice how the query for each alert is a standard InfluxQL query that returns a single value. The options define the alert’s behavior: every is how often the query runs, period is the lookback window for the query, threshold is the value that triggers the alert, and mode (any or all) determines how multiple data points within the period affect the trigger.
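To make the threshold and mode semantics concrete, here is a minimal Python sketch of how one evaluation over the points in a single lookback window could work. It assumes the comparison is "greater than" and that mode is either "any" or "all" as described above. The function name evaluate_window is hypothetical and not part of any InfluxDB API.

```python
from typing import Sequence


def evaluate_window(values: Sequence[float], threshold: float, mode: str) -> bool:
    """Decide whether the alert condition holds for the points in one period.

    mode="any": at least one point must exceed the threshold.
    mode="all": every point must exceed the threshold.
    An empty window never fires (no data, no decision).
    """
    if not values:
        return False
    above = [v > threshold for v in values]
    if mode == "any":
        return any(above)
    if mode == "all":
        return all(above)
    raise ValueError(f"unknown mode: {mode!r}")


# Values taken from the example JSON above.
print(evaluate_window([85.5, 82.1, 88.9], 80, "any"))  # True
print(evaluate_window([92.3, 91.5], 90, "all"))        # True
print(evaluate_window([92.3, 85.0], 90, "all"))        # False: one point is below 90
```

The "every" option would simply decide how often a scheduler calls this function. Each call would be passed the points from the last "period".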
The core problem InfluxDB alerts solve is turning time-series data into actionable insights without constant manual inspection. You define a condition based on your data, and InfluxDB watches for it. When the condition is met, the alert state changes. This state change is what other systems consume.
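A check itself has no notion of where a notification should go. Delivery is a separate layer. In InfluxDB 2.x that layer can be the built-in notification rules and endpoints described below, or an external service. The typical flow is: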
- InfluxDB Alert Rule: You define a query and conditions in InfluxDB.
- Alert State Change: InfluxDB evaluates the rule and changes its state (e.g., from OK to FIRING).
- External Notification System: A separate service (like Kapacitor, Alertmanager, PagerDuty, Slack integrations, or custom scripts) polls InfluxDB for alert state changes or receives webhooks from InfluxDB (if configured).
- Notification Sent: The external system then formats and sends the actual notification (email, Slack message, PagerDuty incident, etc.).
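The important detail in this flow is that downstream systems react to state changes, not to every evaluation. Here is a minimal Python sketch of that idea. It assumes each evaluation yields a boolean and uses the state names "OK" and "FIRING" from the flow above. The StateTracker class is hypothetical and not an InfluxDB API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StateTracker:
    """Track one alert's state and record only the transitions."""
    state: str = "OK"
    transitions: List[Tuple[str, str]] = field(default_factory=list)

    def observe(self, condition_met: bool) -> Optional[Tuple[str, str]]:
        new_state = "FIRING" if condition_met else "OK"
        if new_state == self.state:
            return None  # No change: nothing for downstream systems to consume.
        change = (self.state, new_state)
        self.state = new_state
        self.transitions.append(change)
        return change


tracker = StateTracker()
for met in [False, True, True, True, False]:
    change = tracker.observe(met)
    if change:
        # This is the point where an external notifier would format and send a message.
        print(f"state change {change[0]} -> {change[1]}")
```

Five evaluations produce only two events: OK -> FIRING and then FIRING -> OK. That is why a condition that stays true does not flood the notification system.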
Let’s break down how to set this up.
Setting Up Alert Rules in InfluxDB (v2.x)
In InfluxDB v2.x, alerts are managed through the UI or the API. You’ll define a "check", which is essentially your alert rule.
- Navigate to Alerts: In the InfluxDB UI, go to the "Alerting" tab.
- Create a New Check: Click "Checks" and then "New check."
- Choose a Check Type: The UI offers threshold and deadman checks; more complex logic can be written as a custom check in Flux (for example, as a task). For this example, we’ll use "Threshold."
- Configure the Query:
  - Data Source: Select your bucket.
  - Query: Define the Flux query. Checks in v2.x are Flux-based, and the UI’s query builder generates the Flux for you. For example, to check whether average CPU usage is above 80% over the last 5 minutes:
      from(bucket: "my-metrics")
        |> range(start: -5m)
        |> filter(fn: (r) => r["_measurement"] == "cpu" and r["_field"] == "usage_user")
        |> mean()
  - Check Every: How often InfluxDB should run the query (e.g., 1m).
  - For: How long the condition must be true before triggering (e.g., 5m). This plays a role similar to period in the JSON example above.
- Configure the Condition:
  - Operator: Select the comparison operator (e.g., greater than).
  - Threshold: Enter the value (e.g., 80).
  - If: Choose any or all for how multiple points in the time window are evaluated. all means every data point in the window must meet the condition; any means at least one data point must meet it.
  - Message: Define your notification message. In v2.x this status message template uses Flux string interpolation rather than the {{ .Value }} syntax shown in the JSON example. For instance, ${ r._check_name } inserts the check name and ${ r._level } inserts the status level. Other columns of the status row can be referenced the same way.
- Set Severity: Choose a severity level (e.g., critical, warning, info).
- Save the Check: Give your check a name and save it.
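The threshold check configured above boils down to three steps. It keeps the points from the last five minutes (range(start: -5m)), averages them (mean()), and compares the result with the threshold using the chosen operator. The Python sketch below mirrors that logic on in-memory (timestamp, value) pairs. It illustrates the semantics under those assumptions and is not how InfluxDB executes Flux. The function names and the OPERATORS table are hypothetical.

```python
import operator
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[datetime, float]

# Hypothetical mapping from the UI's operator labels to Python comparisons.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "greater than": operator.gt,
    "less than": operator.lt,
}


def window_mean(points: List[Point], now: datetime, lookback: timedelta) -> Optional[float]:
    """Mimic range(start: -lookback) |> mean(): average points in [now - lookback, now)."""
    start = now - lookback
    in_range = [value for ts, value in points if start <= ts < now]
    return mean(in_range) if in_range else None


def threshold_check(points: List[Point], now: datetime, threshold: float,
                    op: str = "greater than",
                    lookback: timedelta = timedelta(minutes=5)) -> bool:
    """Return True when the windowed mean satisfies the operator against the threshold."""
    value = window_mean(points, now, lookback)
    return value is not None and OPERATORS[op](value, threshold)


now = datetime(2023, 3, 15, 13, 25, tzinfo=timezone.utc)
# (minutes ago, usage_user) pairs; the 9-minute-old point falls outside the window.
points = [(now - timedelta(minutes=m), v) for m, v in [(9, 40.0), (4, 85.5), (3, 82.1), (2, 88.9)]]
print(threshold_check(points, now, threshold=80))  # True: the 5-minute mean (about 85.5) is above 80
```

The half-open [start, now) interval follows Flux’s range(), whose start bound is inclusive and whose stop bound is exclusive.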
Setting Up Notification Endpoints
InfluxDB v2.x uses "Notification Endpoints" to define where alerts should be sent. These are configured separately from the checks.
- Navigate to Endpoints: In the InfluxDB UI, go to "Alerting" -> "Notification endpoints."
- Create a New Endpoint: Click "New notification endpoint."
- Choose a Type: Select the type of endpoint (e.g., Slack, PagerDuty, or HTTP). Email is not a built-in endpoint type; it usually requires an HTTP-based mail service or a custom handler.
- Configure the Endpoint:
  - Slack Example:
    - Name: Slack-Team-Alerts
    - Slack URL: Your Slack incoming webhook URL (e.g., https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX)
    - Message Template: Customize the message sent to Slack.
  - HTTP Example:
    - Name: My-Custom-Webhook
    - URL: The URL of your custom handler (e.g., http://my-alert-handler.example.com/receive)
    - Method: POST
    - Auth Token: If your handler requires authentication.
    - Message Template: Customize the payload sent.
- Save the Endpoint.
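For the HTTP endpoint, something has to be listening at the configured URL. Below is a minimal receiver sketch that uses only Python’s standard http.server module. The /receive path matches the example URL above. Several details are assumptions, not documented InfluxDB behavior: the bearer-token header format, the token value "change-me", and the payload field names _level and _message. Log a real request before relying on them.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Dict, List

# Stand-in for real processing (queue, database, pager); purely illustrative.
RECEIVED: List[Dict[str, Any]] = []
# Hypothetical token; it must match whatever you configured on the HTTP endpoint.
EXPECTED_TOKEN = "change-me"


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read the body first so the request is fully consumed even when rejected.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        if self.path != "/receive":
            self.send_error(404)
            return
        if self.headers.get("Authorization") != f"Bearer {EXPECTED_TOKEN}":
            self.send_error(401)
            return
        try:
            payload = json.loads(body or b"{}")
        except ValueError:  # covers JSONDecodeError and undecodable bytes
            self.send_error(400)
            return
        if not isinstance(payload, dict):
            self.send_error(400)
            return
        # Assumed field names; inspect the raw payload to see what your rule sends.
        RECEIVED.append({"level": payload.get("_level"), "message": payload.get("_message")})
        self.send_response(204)
        self.end_headers()

    def log_message(self, format: str, *args: Any) -> None:
        pass  # silence default per-request logging for the sketch


def serve(port: int = 8080) -> ThreadingHTTPServer:
    """Start the receiver on localhost in a background thread and return the server."""
    server = ThreadingHTTPServer(("127.0.0.1", port), AlertHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The sketch binds to 127.0.0.1 for local experiments. If InfluxDB runs on another machine, the server would need to listen on a reachable address, ideally behind TLS.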
Connecting Checks to Endpoints
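Finally, you link your checks to your notification endpoints. In v2.x this is done with a notification rule, which is a separate object rather than part of the check. The rule runs on its own schedule and looks at the statuses your checks have recorded. When a status matches its conditions, it sends a notification to an endpoint.
- Create a Notification Rule: In the "Alerting" tab, open "Notification rules" and create a new rule.
- Configure the Rule:
  - Trigger: Choose when to send a notification, expressed as a status condition. For example, a change from OK to CRIT acts as an alert trigger, and a change from CRIT to OK acts as a recovery trigger. You can also restrict the rule to checks with specific tags.
  - Endpoint: Select the notification endpoint you created (e.g., Slack-Team-Alerts).
  - Message Template: This is the final message template that gets sent to the endpoint. It can override or supplement the check’s message.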
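Here is a minimal Python sketch of the matching a notification rule performs. A rule trigger can be phrased as "on alert" and "on recovery" or as status changes such as OK to CRIT. Either way, it reduces to checking which transition a status represents and whether its tags match. The lowercase level names ("ok", "warn", "crit") are assumed to match what InfluxDB 2.x records. The classes and functions are hypothetical illustrations, not InfluxDB APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Transition = Tuple[str, str]  # (previous level, current level)


@dataclass
class NotificationRule:
    name: str
    endpoint: str
    transitions: FrozenSet[Transition]  # e.g. {("ok", "crit")} behaves like "on alert"
    tags: Dict[str, str] = field(default_factory=dict)  # every listed tag must match


@dataclass
class Status:
    check_name: str
    previous_level: str
    level: str
    tags: Dict[str, str]


def rule_matches(rule: NotificationRule, status: Status) -> bool:
    """A rule fires when it listens for this transition and all of its tags match."""
    if (status.previous_level, status.level) not in rule.transitions:
        return False
    return all(status.tags.get(key) == value for key, value in rule.tags.items())


def endpoints_for(rules: List[NotificationRule], status: Status) -> List[str]:
    return [rule.endpoint for rule in rules if rule_matches(rule, status)]


rules = [
    NotificationRule("cpu-alert", "Slack-Team-Alerts", frozenset({("ok", "crit")}), {"team": "infra"}),
    NotificationRule("cpu-recovery", "Slack-Team-Alerts", frozenset({("crit", "ok")}), {"team": "infra"}),
    NotificationRule("all-alerts", "My-Custom-Webhook", frozenset({("ok", "crit"), ("ok", "warn")})),
]
status = Status("High CPU Usage", "ok", "crit", {"team": "infra"})
print(endpoints_for(rules, status))  # ['Slack-Team-Alerts', 'My-Custom-Webhook']
```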
The Counterintuitive Part: Alerting is a Two-Tiered System
Many people assume that setting up an alert in InfluxDB means notifications will magically appear. The reality is that InfluxDB’s alerting mechanism is designed for detection and state management, not direct delivery. It’s a signal generator. The actual act of sending an email, posting to Slack, or creating a PagerDuty incident is handled by a separate, often external, system. This separation provides flexibility, allowing you to decouple your monitoring from your notification logic and use specialized tools for each. You might have one InfluxDB check triggering alerts, but route those alerts to different endpoints based on severity or time of day using a routing layer like Alertmanager.
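To illustrate the kind of decision a routing layer makes, here is a small Python stand-in. Alertmanager itself is configured declaratively in YAML, not written in Python. The sketch routes by severity and, for warnings, by time of day. The business-hours window and the endpoint names "pagerduty", "slack" and "ticket-queue" are hypothetical placeholders.

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical on-call policy; endpoint names are placeholders.
BUSINESS_HOURS = (time(9, 0), time(18, 0))


def pick_endpoint(severity: str, at: datetime) -> Optional[str]:
    """Route by severity, and for warnings also by time of day."""
    start, end = BUSINESS_HOURS
    in_hours = start <= at.time() < end
    if severity == "critical":
        return "pagerduty"  # critical alerts page someone at any hour
    if severity == "warning":
        return "slack" if in_hours else "ticket-queue"  # no paging at night for warnings
    return None  # info and anything else: not routed to a human


print(pick_endpoint("warning", datetime(2023, 3, 15, 22, 30)))  # ticket-queue
```

Keeping this policy outside the check means the check stays the same when the on-call arrangement changes.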
The next step after getting notifications working is often implementing more sophisticated routing and silencing strategies within your notification system to avoid alert fatigue.
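Silencing and repeat suppression can be sketched in a few lines of Python. The sketch is loosely modeled on Alertmanager’s repeat interval and silences, but the Silencer class and its methods are hypothetical. It assumes alerts are identified by a string ID such as the "id" values in the JSON example.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


class Silencer:
    """Suppress repeat notifications and honor explicit silences."""

    def __init__(self, repeat_interval: timedelta) -> None:
        self.repeat_interval = repeat_interval
        self.last_sent: Dict[str, datetime] = {}
        self.silences: List[Tuple[str, datetime, datetime]] = []  # (alert_id, start, end)

    def silence(self, alert_id: str, start: datetime, end: datetime) -> None:
        """Mute one alert for the half-open interval [start, end)."""
        self.silences.append((alert_id, start, end))

    def should_send(self, alert_id: str, now: datetime) -> bool:
        if any(a == alert_id and s <= now < e for a, s, e in self.silences):
            return False  # explicitly silenced, e.g. during maintenance
        last = self.last_sent.get(alert_id)
        if last is not None and now - last < self.repeat_interval:
            return False  # already notified recently
        self.last_sent[alert_id] = now
        return True


silencer = Silencer(timedelta(hours=1))
t0 = datetime(2023, 3, 15, 12, 0)
print(silencer.should_send("alert-cpu-usage", t0))                          # True
print(silencer.should_send("alert-cpu-usage", t0 + timedelta(minutes=10)))  # False: repeat suppressed
```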