Loki stops receiving logs when the agent responsible for sending them crashes, is misconfigured, or loses network connectivity to Loki.
Common Causes and Fixes for absent_over_time Alerts
This alert fires when a specific log stream, expected to be present within a given time window, is not found. It’s a critical signal that your log collection is broken, not that there’s an error in the logs themselves.
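To see what the alert sees, it can help to run the same expression by hand. The sketch below builds an `absent_over_time` expression and sends it as an instant query to Loki’s `/loki/api/v1/query` endpoint. The `app` label, the window, and the base URL are assumptions for illustration; substitute whatever your alert rule actually uses.

```shell
#!/usr/bin/env bash
# Build the LogQL expression the alert evaluates.
# $1 = value of the (assumed) `app` label, $2 = lookback window such as 15m.
build_absent_query() {
  printf 'absent_over_time({app="%s"}[%s])' "$1" "$2"
}

# Run the expression as an instant query against Loki's HTTP API.
# $1 = Loki base URL (hypothetical), $2 = app label value, $3 = window.
check_absent() {
  curl -sG "$1/loki/api/v1/query" \
    --data-urlencode "query=$(build_absent_query "$2" "$3")"
}

# Usage:
#   check_absent http://loki.example.com:3100 my-service 15m
```

A non-empty result (a single sample with value 1) means no log lines matched in the window, which is the condition that makes the alert fire. An empty result means the stream is present.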
- Promtail/Agent Crashes or Restarts: The most frequent culprit is the log collection agent (like Promtail) on your target machines crashing or restarting.
  - Diagnosis: On the affected machine, check the agent’s status:

        sudo systemctl status promtail
        # Or for Docker:
        docker ps | grep promtail

    Look for `inactive (dead)`, `failed`, or a recent restart timestamp. Check agent logs for `panic`, `fatal`, or `exit code` messages:

        sudo journalctl -u promtail -n 100 --no-pager
        # Or for Docker:
        docker logs <promtail_container_id>

  - Fix: If the agent is stopped or failed, restart it:

        sudo systemctl restart promtail
        # Or for Docker:
        docker restart <promtail_container_id>

    If it’s repeatedly crashing, the agent logs will show why (e.g., out of memory, configuration parse error). Address the root cause indicated in the agent’s logs.
  - Why it works: This ensures the log shipper process is running and actively tailing files and sending data to Loki.
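If you want to run the status check from a cron job or a monitoring script, one possible sketch maps the output of `systemctl is-active` to a suggested action. The function name and the action words are made up for this example; the unit name `promtail` follows the commands above.

```shell
#!/usr/bin/env bash
# Map the output of `systemctl is-active <unit>` to a suggested action.
# $1 = state string printed by systemctl.
classify_agent_state() {
  case "$1" in
    active)                       echo "ok" ;;
    activating|reloading)         echo "wait" ;;
    inactive|failed|deactivating) echo "restart" ;;
    *)                            echo "unknown" ;;
  esac
}

# Usage:
#   classify_agent_state "$(systemctl is-active promtail)"
```

Keeping the classification separate from the `systemctl` call makes the logic easy to exercise without a running service.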
- Incorrect `relabel_configs` in Promtail: Promtail uses `relabel_configs` to select discovered targets and label logs. If these are misconfigured, Promtail might stop seeing the log files or stop sending logs with the labels that your `absent_over_time` query relies on.
  - Diagnosis: Examine your Promtail configuration file (e.g., `/etc/promtail/config.yaml` or within a Docker volume). Pay close attention to `scrape_configs` and `relabel_configs`.
    - Are the `targets` correctly defined (e.g., using Kubernetes service discovery, file discovery, or static config)?
    - Are the `source_labels` and `regex` in `relabel_configs` matching the actual file paths or target metadata you expect?
    - Is the value of `__meta_kubernetes_pod_label_app` (or similar) copied to a label that your Loki query uses? Labels starting with `__` are dropped after relabeling, so the query can only match labels set through `target_label`.
  - Fix: Adjust the `relabel_configs` to correctly identify and label your log sources. For example, if you’re using Kubernetes and your query targets logs from pods with `app="my-service"`, ensure your `relabel_configs` correctly extract and set the `app` label.

        # Example relabel_configs in Promtail scrape_configs:
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # Keep only pods labeled app=my-service (must match your alert query)
            - source_labels: [__meta_kubernetes_pod_label_app]
              action: keep
              regex: my-service
            # Copy the pod label into the `app` stream label
            - source_labels: [__meta_kubernetes_pod_label_app]
              target_label: app
            # Build the log file path from the pod UID and container name
            - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
              separator: /
              target_label: __path__
              replacement: /var/log/pods/*$1/*.log

    After updating, restart Promtail.
  - Why it works: Correct `relabel_configs` ensure Promtail discovers the correct log files and attaches the metadata (labels) that Loki uses to route and query logs.
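A quick way to confirm that relabeling produces the label your query expects is to ask Loki which values it currently knows for that label, using the `/loki/api/v1/label/<name>/values` endpoint. The sketch below is deliberately crude: the helper names are hypothetical, the base URL is a placeholder, and the match is a plain string search rather than real JSON parsing.

```shell
#!/usr/bin/env bash
# Fetch the known values of a label from Loki.
# $1 = Loki base URL (hypothetical), $2 = label name.
label_values() {
  curl -s "$1/loki/api/v1/label/$2/values"
}

# Succeed if the JSON response on stdin contains the quoted value.
# $1 = value to look for. This is a plain string match, so it can also hit
# other quoted strings in the response (such as "status"); a JSON parser
# would be more robust.
has_label_value() {
  grep -qF "\"$1\""
}

# Usage:
#   label_values http://loki.example.com:3100 app | has_label_value my-service \
#     && echo "label present" || echo "label missing"
```

If the value is missing for the queried time range while the agent is running, the relabeling rules are a likely suspect.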
- Network Connectivity Issues: The agent might be running fine but unable to reach the Loki ingestion endpoint.
  - Diagnosis: From the machine running Promtail, try to `curl` the Loki ingest address (e.g., `http://loki.example.com:3100/loki/api/v1/push`).

        curl -v http://loki.example.com:3100/loki/api/v1/push

    Any HTTP response, even an error status for this body-less GET request, shows that the network path works. A DNS failure, timeout, or refused connection points to a network problem. Also, check firewall rules on the agent’s host and any intermediate network devices.
  - Fix: Resolve network issues. This could involve fixing DNS, opening firewall ports (typically 3100 for Loki’s HTTP API, which is where Promtail pushes; Loki’s gRPC port, 9095 by default, carries traffic between Loki components), or correcting load balancer configurations pointing to Loki.
  - Why it works: This restores the communication channel, allowing the agent to send its batched log entries to Loki.
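For a scripted version of the connectivity check, one option is to probe Loki’s `/ready` endpoint and interpret the status code. This is a sketch under a few assumptions: the base URL is a placeholder, the five-second timeout is arbitrary, and the wording of the verdicts is invented for this example. It relies on curl printing `000` for `%{http_code}` when no HTTP response arrives.

```shell
#!/usr/bin/env bash
# Interpret the HTTP status code returned by Loki's /ready endpoint.
# $1 = status code as printed by curl's %{http_code}.
interpret_ready_status() {
  case "$1" in
    200) echo "reachable and ready" ;;
    000) echo "no connection (DNS, firewall, or routing)" ;;
    5??) echo "reachable but not ready" ;;
    *)   echo "reachable, unexpected status $1" ;;
  esac
}

# Probe Loki from the agent host. $1 = Loki base URL (hypothetical).
probe_loki() {
  # -w prints only the status code; -m 5 gives up after five seconds.
  # curl's exit status is ignored because a failure already yields 000.
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$1/ready") || true
  interpret_ready_status "$code"
}

# Usage:
#   probe_loki http://loki.example.com:3100
```

Run it from the same host (or container network namespace) as the agent, since that is the path that matters.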
- Disk Full on Agent Machine: If the disk where the agent is writing its internal state (like WAL for some agents, or even temporary files) is full, the agent can halt operations.
  - Diagnosis: Check disk space on the agent’s host:

        df -h

    Look for partitions that are 100% full, especially `/var`, `/opt`, or wherever Promtail’s data directory is located.
  - Fix: Free up disk space by removing old logs, temporary files, or expanding the disk.
  - Why it works: Provides the necessary space for the agent to operate, write state, and process logs.
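The disk check is easy to automate. The sketch below filters `df -P` output (the POSIX format, which keeps each filesystem on one line) and prints mount points at or above a threshold. The function name and the default of 95% are choices made for this example, and mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Print mount points whose usage is at or above a threshold.
# Reads `df -P` output on stdin; $1 = threshold in percent (default 95).
full_filesystems() {
  awk -v limit="${1:-95}" 'NR > 1 {
    use = $5                 # Capacity column, e.g. "100%"
    sub(/%/, "", use)        # strip the percent sign
    if (use + 0 >= limit + 0) print $6 " " $5
  }'
}

# Usage:
#   df -P | full_filesystems 95
```

An empty result means no filesystem crossed the threshold.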
- Loki Ingestion Endpoint Overload/Failure: While less common for `absent_over_time` (which implies no logs, not slow logs), if Loki’s ingestion endpoints are completely failing to accept data due to extreme load or internal errors, agents retry for a while and then drop the batches they could not deliver.
  - Diagnosis: Check Loki’s own metrics and logs. Look for high request latency, high error rates (especially 5xx errors) on the `/loki/api/v1/push` endpoint, or resource exhaustion (CPU, memory) on Loki pods/servers.
  - Fix: Scale Loki’s write path horizontally (add more distributors/ingesters), reduce query load if queries compete with ingestion for the same resources, or address underlying resource constraints.
  - Why it works: Restores Loki’s ability to accept and process incoming log data, allowing agents to resume sending.
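As a rough first look at push errors without a dashboard, you can scrape Loki’s `/metrics` endpoint and sum the 5xx request counts for the push route. This sketch assumes the request counter is exposed as `loki_request_duration_seconds_count` with `route="loki_api_v1_push"` and a `status_code` label; verify the metric and label names against your Loki version before relying on it. The host name is a placeholder.

```shell
#!/usr/bin/env bash
# Sum request counts with a 5xx status for the push route.
# Reads Prometheus text-format metrics on stdin and prints one number.
# Metric and label names are assumptions; check them on your deployment.
push_5xx_count() {
  awk '/^loki_request_duration_seconds_count[{]/ &&
       /route="loki_api_v1_push"/ &&
       /status_code="5[0-9][0-9]"/ { sum += $NF }
       END { print sum + 0 }'
}

# Usage:
#   curl -s http://loki.example.com:3100/metrics | push_5xx_count
```

The counters are cumulative, so a single reading says little. Take two readings a minute apart and compare them to see whether errors are still accumulating.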
- Log File Rotation/Deletion Issues: The agent might be configured to tail a file, but that file is being unexpectedly rotated, deleted, or moved by another process before the agent can read it.
  - Diagnosis: Check the `__path__` label in the agent’s `scrape_configs` (or the equivalent file-path directive in other agents). Verify the expected location and naming of log files. Look for evidence of aggressive log rotation or deletion scripts running on the host.
  - Fix: Ensure your log rotation strategy (e.g., using `logrotate`) is compatible with your agent. Promtail follows rename-and-create rotation (logrotate’s `create` mode) more reliably than `copytruncate`, which can lose lines written between the copy and the truncate. Alternatively, adjust the agent’s configuration to point to the correct log file pattern.
  - Why it works: The agent continues to track and read from the correct log file stream, even after it’s been rotated.
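To check that the path pattern the agent tails still matches a file that is actively being written, a small helper around `find` may be enough. The directory, pattern, and 15-minute window in the usage line are placeholders, and `-mmin` is a GNU/BSD `find` extension rather than strict POSIX.

```shell
#!/usr/bin/env bash
# List files under a directory that match a name pattern and were
# modified within the last N minutes.
# $1 = directory, $2 = file name pattern, $3 = minutes.
recently_written() {
  find "$1" -type f -name "$2" -mmin "-$3"
}

# Usage (hypothetical path):
#   recently_written /var/log/my-service '*.log' 15
```

If this prints nothing while the application is running, either the application has stopped logging or rotation has moved the active file outside the pattern the agent watches.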
The next error you’ll likely encounter after fixing absent_over_time issues is an alert related to high log volume or specific error messages within the logs, indicating that the log collection is healthy but the application itself is experiencing problems.