Grafana’s Unified Alerting system isn’t just a new UI. It rethinks how alerts are defined, managed, and routed, consolidating Grafana’s previously separate alerting engines into a single interface.
Let’s see it in action. Imagine you have a critical service whose error rate you want to monitor.
First, you’d create a "Contact Point" to define where alerts should go. This could be an email address, a Slack channel, a PagerDuty service, or a webhook.
# Example Grafana configuration (grafana.ini)
[unified_alerting]
# Enable Unified Alerting (the [alerting] section controls the legacy engine, not this one)
enabled = true
# Alertmanager configuration (optional, for advanced routing/deduplication)
# Grafana ships with a built-in Alertmanager, so no separate setup is needed for basic routing.
# Contact points and notification policies for it are managed in Grafana's alerting UI.
# Example Contact Point configuration (configured via Grafana UI)
# Name: 'My Slack Channel'
# Type: 'Slack'
# Settings:
# URL: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
# Recipient: '#alerts'
Next, you’d define a "Notification Policy." This is the routing logic. It determines which alerts go to which contact points based on labels.
# Example Notification Policy (configured via Grafana UI)
# Matcher: 'severity=critical'
# Contact Point: 'My Slack Channel'
# Continue matching subsequent sibling nodes: false
# Group By: 'alertname', 'job'
# Group Wait: '30s'
# Group Interval: '5m'
# Repeat Interval: '4h'
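To make the Group By setting concrete, here is a minimal Python sketch of how alerts could be batched by their `alertname` and `job` labels. It is an illustration under simplifying assumptions, not Grafana's implementation: the `group_alerts` helper and the sample label values are hypothetical, and timing settings such as Group Wait are ignored.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Batch alerts that share the same values for the group_by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        # The group key is the tuple of values for the grouping labels;
        # a missing label is treated as an empty string in this sketch.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical alerts, each represented only by its label set.
alerts = [
    {"alertname": "HighErrorRate", "job": "api", "method": "GET"},
    {"alertname": "HighErrorRate", "job": "api", "method": "POST"},
    {"alertname": "HighErrorRate", "job": "worker", "method": "GET"},
]

batches = group_alerts(alerts, ["alertname", "job"])
for key, members in batches.items():
    print(key, "->", len(members), "alert(s) in one notification")
```

With these inputs, the two `api` alerts share a group key and would arrive as one notification, while the `worker` alert forms its own group.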
Finally, you create the "Alert Rule" itself. This is where you define the condition, often using a PromQL or LogQL query, and assign labels to it.
# Example Alert Rule (configured via Grafana UI)
# Title: 'High Error Rate on MyService'
# Query:
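# Expression: 'sum by (service, method) (rate(http_requests_total{status=~"5..", service="myservice"}[5m])) / sum by (service, method) (rate(http_requests_total{service="myservice"}[5m]))'
# Datasource: 'Prometheus'
# Condition: 'is above 0.1' (threshold on the query result; 0.1 = 10% of requests failing)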
# Evaluate every: '1m'
# For: '5m'
# Labels:
# severity: 'critical'
# team: 'backend'
# Annotations:
# summary: 'High error rate detected for {{ $labels.service }}'
# description: 'The error rate for {{ $labels.service }} has been above 10% for the last 5 minutes.'
The core problem Unified Alerting solves is the fragmentation of alerting logic. Before, you had Grafana’s legacy alerting (tied to dashboard panels), Alertmanager (for Prometheus-style alerts), and potentially other systems, each with its own rules and notification setup. Now these capabilities live in one place. You define your data source query (e.g., against Prometheus or Loki), set the evaluation interval and the For duration (how long the condition must hold before the alert fires), and then attach labels and annotations. These labels are the key to routing.
The system works by having a central alerting engine within Grafana. This engine evaluates your alert rules against your configured data sources. When a rule’s condition has been met continuously for the specified For duration, it fires an alert. This alert, with its attached labels, is then passed to the notification policy tree. The tree compares the alert’s labels against each policy’s matchers, starting at the root and descending into the first child policy that matches. The deepest matching policy directs the alert to its contact point; if no child matches, the parent policy handles the alert, which ultimately means the default policy at the root. If a matching policy has continue matching enabled, the following sibling policies are checked as well, so one alert can be sent to multiple contact points. The Group By labels determine how alerts are batched together into single notifications, reducing noise.
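The traversal described above can be sketched in a few lines of Python. This is a simplified model in the style of Alertmanager routing, not Grafana's actual code: the `Policy` class, the `route` function, and the contact point names `default-email` and `backend-pagerduty` are hypothetical, and only equality matchers are supported.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    matchers: dict                  # label name -> required value (equality only)
    contact_point: str
    continue_matching: bool = False
    children: list = field(default_factory=list)


def matches(policy, labels):
    """A policy matches when every one of its matchers is satisfied."""
    return all(labels.get(k) == v for k, v in policy.matchers.items())


def route(policy, labels):
    """Return the contact points for an alert; `policy` is assumed to match."""
    receivers = []
    for child in policy.children:
        if matches(child, labels):
            receivers.extend(route(child, labels))
            # Stop at the first matching child unless it asks to continue.
            if not child.continue_matching:
                break
    # If no child matched, this policy handles the alert itself.
    return receivers or [policy.contact_point]


# The root (default) policy has no matchers, so it matches every alert.
root = Policy({}, "default-email", children=[
    Policy({"severity": "critical"}, "My Slack Channel", continue_matching=True),
    Policy({"team": "backend"}, "backend-pagerduty"),
])

print(route(root, {"severity": "critical", "team": "backend"}))
print(route(root, {"severity": "warning", "team": "frontend"}))
```

In this sketch, the first alert reaches both the Slack contact point and the hypothetical PagerDuty one because the first policy continues matching, while the second alert matches no child and falls back to the default policy.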
What most people miss is that the For duration in an alert rule is evaluated independently for each unique combination of labels produced by the query. If your query produces (service="A", method="GET") and (service="A", method="POST"), and the condition is met for both, the For timer starts for both independently. You don’t need to explicitly include service and method in your Group By to ensure they are treated distinctly by the For clause; that’s handled by the query’s output.
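A small Python sketch can illustrate the per-series For timers. It assumes a simplified state model (pending or firing only) and is not Grafana's real state machine, which also tracks states such as no-data and error; the `evaluate` function and the timestamps are hypothetical.

```python
def evaluate(pending_since, breaching, now, for_seconds):
    """Run one evaluation tick.

    pending_since maps each label set to the time it first breached.
    Returns the set of label sets whose For duration has elapsed.
    """
    # A series that stopped breaching loses its timer entirely.
    for key in list(pending_since):
        if key not in breaching:
            del pending_since[key]

    firing = set()
    for key in breaching:
        # Each label set gets its own timer, started on its first breach.
        start = pending_since.setdefault(key, now)
        if now - start >= for_seconds:
            firing.add(key)
    return firing


GET = (("method", "GET"), ("service", "A"))
POST = (("method", "POST"), ("service", "A"))

pending = {}
# Evaluate every 60s with For = 300s. GET breaches from t=0, POST from t=120.
for t in range(0, 301, 60):
    breaching = {GET} if t < 120 else {GET, POST}
    firing = evaluate(pending, breaching, t, 300)

print(firing == {GET})            # at t=300 only GET has breached long enough
firing_later = evaluate(pending, {GET, POST}, 420, 300)
print(firing_later == {GET, POST})
```

Because the timers are keyed by the full label set from the query, the POST series fires two minutes after the GET series, regardless of how notifications are later grouped.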
Understanding how label matching works in the notification policy tree is the next crucial step to mastering Grafana Unified Alerting.