Loki Ruler lets you turn your log patterns into actionable alerts, but the real magic is that your alerts become a first-class log source themselves.

Let’s see it in action. Imagine you’re running a web service and want to know when your error rate spikes. You’d first write a Loki query to identify errors. Something like this:

{job="webapp"} |= "error"

This query selects log lines from the webapp job that contain the string "error"; on its own it returns lines, not a count. Now, suppose you want to be alerted when those lines arrive at more than 100 per second, averaged over 5 minutes. You do this by defining an "alert rule" that Loki’s ruler evaluates on a schedule.

Here’s what a basic ruler configuration might look like in a ruler.yaml file:

groups:
- name: webapp-alerts
  rules:
  - alert: HighErrorRate
    expr: |
      sum by (job) (
        rate({job="webapp"} |= "error" [5m])
      ) > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High error rate detected on {{ $labels.job }}"
      description: "The webapp job has seen more than 100 errors per second over the last 5 minutes."
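
Note that rate() reports lines per second, so a threshold of 100 here means roughly 30,000 matching lines over the 5-minute window. If what you actually want is an absolute count of lines in the window, a variant built on LogQL’s count_over_time might look like the sketch below. The rule name HighErrorCount and the threshold are illustrative assumptions, not part of the configuration above.

```yaml
groups:
- name: webapp-alerts
  rules:
  - alert: HighErrorCount   # hypothetical name for this sketch
    expr: |
      sum by (job) (
        count_over_time({job="webapp"} |= "error" [5m])
      ) > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "More than 100 error lines in 5 minutes on {{ $labels.job }}"
```

Which form fits better depends on traffic: a per-second rate scales with load, while a fixed count is easier to reason about for low-volume services.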

When Loki loads this, it doesn’t just store it; the ruler starts executing the query at a regular evaluation interval. If the expr keeps returning a result for the duration specified by for (in this case, 5 minutes), the alert fires; until then it is only pending. The firing alert is then sent to a configured Alertmanager, which handles deduplication, grouping, and routing to notification channels like Slack or PagerDuty.
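
For the ruler to load rule files and reach Alertmanager, Loki’s own configuration needs a ruler block. The following is a minimal sketch for a single-binary setup with rules on local disk. The directory paths and the Alertmanager URL are placeholders assumed for illustration, so adjust them to your deployment.

```yaml
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules        # assumed path; rule files live under <directory>/<tenant>/
  rule_path: /tmp/loki/rules-temp   # assumed path; holds temporary rule files
  alertmanager_url: http://alertmanager:9093   # placeholder address
  evaluation_interval: 1m           # how often each rule's expr is evaluated
  ring:
    kvstore:
      store: inmemory               # enough for a single instance
  enable_api: true                  # expose the ruler HTTP API
```

With authentication disabled, Loki uses the tenant ID fake, so under these assumptions the rule file would sit at /loki/rules/fake/ruler.yaml.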

The "surprising" part? When the HighErrorRate alert is active, Loki Ruler generates log lines for the alert itself. These logs will appear in Loki with labels derived from the alert, like alertname="HighErrorRate", severity="warning", and any labels from the original query that were included in the alert expression’s sum by clause. This means you can query your alerts just like you query your application logs, to understand alert frequency, duration, and the context around them.

The core problem Loki Ruler solves is bridging the gap between log analysis and proactive incident response. Traditionally, you’d run a log query, notice a spike, and then manually set up an alert. Ruler automates this. It allows you to define alerting conditions directly within your logging system, using the same query language you’re already familiar with.

Internally, the ruler component of Loki periodically evaluates the expr for each rule and sends any firing alerts to the configured Alertmanager. The for clause is crucial: it ensures that an alert only fires if the condition persists for a specified duration, preventing noisy alerts for transient issues. The labels and annotations fields enrich the alert with metadata that Alertmanager uses for routing and humans use for context.
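
On the Alertmanager side, the severity label set in the rule is what routing typically keys on. A minimal routing sketch, assuming hypothetical receiver names and placeholder credentials, could look like this:

```yaml
route:
  receiver: team-slack            # default receiver (hypothetical name)
  group_by: [alertname, job]
  routes:
  - matchers:
    - severity="critical"
    receiver: oncall-pagerduty    # hypothetical name
receivers:
- name: team-slack
  slack_configs:
  - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder webhook
    channel: "#webapp-alerts"                              # placeholder channel
- name: oncall-pagerduty
  pagerduty_configs:
  - routing_key: REPLACE_ME       # placeholder integration key
```

Here warning-level alerts fall through to the default Slack receiver, while anything labeled severity: critical pages the on-call rotation.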

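The expr field is where the power lies. You can use any LogQL metric query, that is, one that returns numeric samples rather than raw log lines. For instance, to alert on a specific error message, start from the log filter that matches it: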

{job="webapp"} |= "database connection refused"

And the ruler rule, which wraps that filter in rate() so the result can be compared against a threshold:

groups:
- name: webapp-db-errors
  rules:
  - alert: DatabaseConnectionRefused
    expr: |
      sum by (job) (
        rate({job="webapp"} |= "database connection refused" [5m])
      ) > 5
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Frequent database connection refusals on {{ $labels.job }}"
      description: "The webapp job is experiencing more than 5 database connection refused errors per second for over 1 minute."
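
Filters on raw text are only one option. If the webapp emits JSON logs with a numeric status field (an assumption for this sketch, since the log format is not specified here), the same pattern works with a parser stage. The group and alert names below are hypothetical.

```yaml
groups:
- name: webapp-http-errors        # hypothetical group name
  rules:
  - alert: HighServerErrorRate    # hypothetical alert name
    expr: |
      sum by (job) (
        rate({job="webapp"} | json | status >= 500 [5m])
      ) > 5
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Elevated 5xx responses on {{ $labels.job }}"
```

Matching on a parsed field avoids false positives from lines that merely mention the word "error" in passing.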

This establishes a direct feedback loop: your logs inform your alerts, and your alerts generate logs that you can then analyze. This creates a comprehensive observability picture where the system’s health status is itself a queryable log event.

One aspect often overlooked is how Loki Ruler handles rule updates and rule discovery. If you update your ruler.yaml file, Loki will pick up the changes without a restart: the ruler periodically re-reads its rule storage, and rules can also be pushed through the ruler API if it is enabled. How the file reaches that storage (for example, via a Git repository sync) is up to your deployment. This dynamic re-evaluation means your alerting logic can evolve alongside your application. Furthermore, multiple ruler instances can be run for high availability; they coordinate through a hash ring so that rule groups are divided among them rather than evaluated redundantly.
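
The settings that govern rule reloading and multi-instance coordination live in the same ruler block of Loki’s configuration. A sketch, assuming memberlist is used for the ring and with a placeholder join address:

```yaml
ruler:
  poll_interval: 1m          # how often the ruler checks rule storage for changes
  ring:
    kvstore:
      store: memberlist      # ruler instances share a ring and split rule groups
memberlist:
  join_members:
  - loki-memberlist:7946     # placeholder address of the gossip endpoint
```

A shorter poll_interval makes edits take effect sooner at the cost of more frequent reads from rule storage.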

The next step after mastering ruler configuration is understanding how to fine-tune alert severity levels and implement dead man’s snitches for critical services.
