Loki stops receiving logs when the agent responsible for sending them crashes, is misconfigured, or loses network connectivity to Loki.

Common Causes and Fixes for absent_over_time Alerts

This alert fires when a specific log stream, expected to be present within a given time window, is not found. It’s a critical signal that your log collection is broken, not that there’s an error in the logs themselves.
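To confirm the condition by hand, the same kind of expression can be run against Loki’s HTTP query API. This is a minimal sketch: the selector {app="my-service"}, the 15m window, and the address http://loki.example.com:3100 are placeholders, so substitute whatever your alert rule actually uses.

```shell
# Build the LogQL expression an absent_over_time alert would evaluate.
# The label name "app" is an illustrative assumption.
absent_query() {
  local app="$1" window="$2"
  printf 'absent_over_time({app="%s"}[%s])' "$app" "$window"
}

# Run the expression as an instant query against Loki.
# LOKI_URL is a placeholder; point it at your own Loki.
check_absent() {
  local loki_url="${LOKI_URL:-http://loki.example.com:3100}"
  curl -sG "${loki_url}/loki/api/v1/query" \
    --data-urlencode "query=$(absent_query "$1" "$2")"
}

# Usage: check_absent my-service 15m
```

A result containing a sample with value 1 means no matching log lines were seen in the window. An empty result means the stream is present.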

  1. Promtail/Agent Crashes or Restarts: The most frequent culprit is the log collection agent (like Promtail) on your target machines crashing or restarting.

    • Diagnosis: On the affected machine, check the agent’s status:
      sudo systemctl status promtail
      # Or for Docker:
      docker ps -a | grep promtail   # -a also lists exited containers
      
      Look for inactive (dead), failed, or a recent restart timestamp. Check agent logs for panic, fatal, or exit code messages:
      sudo journalctl -u promtail -n 100 --no-pager
      # Or for Docker:
      docker logs <promtail_container_id>
      
    • Fix: If the agent is stopped or failed, restart it:
      sudo systemctl restart promtail
      # Or for Docker:
      docker restart <promtail_container_id>
      
      If it’s repeatedly crashing, the agent logs will show why (e.g., out of memory, configuration parse error). Address the root cause indicated in the agent’s logs.
    • Why it works: This ensures the log shipper process is running and actively tailing files and sending data to Loki.
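When the agent log is noisy, a small filter can surface the lines worth reading first. The sketch below is only a convenience around the journalctl command above. The pattern list is an assumption, not an exhaustive set of Promtail failure messages.

```shell
# Print only the log lines that suggest a crash (case-insensitive match).
# The pattern list is an assumption; extend it for your agent.
find_crash_lines() {
  grep -Ei 'panic|fatal|exit code|out of memory' || true
}

# Usage: sudo journalctl -u promtail -n 100 --no-pager | find_crash_lines
```

The trailing "|| true" keeps the function from reporting failure when nothing matches, which is the healthy case.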
  2. Incorrect relabel_configs in Promtail: Promtail discovers targets through service discovery (or static configuration) and then uses relabel_configs to filter those targets and set their labels. If the relabeling is misconfigured, Promtail might drop the target entirely, stop seeing its log files, or send logs without the labels that your absent_over_time query relies on.

    • Diagnosis: Examine your Promtail configuration file (e.g., /etc/promtail/config.yaml or within a Docker volume). Pay close attention to scrape_configs and relabel_configs.
      • Are the targets correctly defined (e.g., using Kubernetes service discovery, file discovery, or static config)?
      • Are the source_labels and regex in relabel_configs matching the actual file paths or target metadata you expect?
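      • Is the value of __meta_kubernetes_pod_label_app (or similar) copied into a final label such as app, and does that label match what your Loki query uses? Labels starting with __ are dropped after relabeling, so they never reach Loki unless they are mapped to a target_label.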
    • Fix: Adjust the relabel_configs to correctly identify and label your log sources. For example, if you’re using Kubernetes and your query targets logs from pods with app="my-service", ensure your relabel_configs correctly extract and set the app label.
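      # Example relabel_configs in promtail
      scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        # Keep only pods whose "app" label is my-service
        - source_labels: [__meta_kubernetes_pod_label_app]
          action: keep
          regex: my-service # Ensure this matches your alert query
        # Copy the pod label into the "app" label the alert query selects on
        - source_labels: [__meta_kubernetes_pod_label_app]
          target_label: app
        # Point Promtail at the pod's log files (assumes the usual /var/log/pods layout)
        - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
          separator: /
          target_label: __path__
          replacement: /var/log/pods/*$1/*.log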
      
      After updating, restart Promtail.
    • Why it works: Correct relabel_configs ensure Promtail discovers the correct log files and attaches the metadata (labels) that Loki uses to route and query logs.
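After Promtail restarts, it can help to confirm that Loki actually sees the label value the alert selects on. The sketch below reads the response of Loki’s label values endpoint and does a plain-text match. The label name app, the value my-service, and LOKI_URL are placeholders.

```shell
# Succeeds if the JSON read from stdin contains the given value as a quoted string.
# This is a naive text match, not a full JSON parse, so it would also match
# other quoted strings in the response such as "success".
has_label_value() {
  grep -qF "\"$1\""
}

# Usage (LOKI_URL and the label name "app" are placeholders):
#   curl -s "${LOKI_URL}/loki/api/v1/label/app/values" | has_label_value my-service \
#     && echo "label value present" || echo "label value missing"
```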
  3. Network Connectivity Issues: The agent might be running fine but unable to reach the Loki ingestion endpoint.

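    • Diagnosis: From the machine running Promtail, try to curl the Loki ingest address (e.g., http://loki.example.com:3100/loki/api/v1/push). The push endpoint only accepts POST, so a plain GET is expected to be rejected (typically with 405 Method Not Allowed). Any HTTP response at all shows that DNS, routing, and the port are working; a timeout or "connection refused" points to the network.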
      curl -v http://loki.example.com:3100/loki/api/v1/push
      
      Also, check firewall rules on the agent’s host and any intermediate network devices.
    • Fix: Resolve network issues. This could involve fixing DNS, opening firewall ports (3100 is Loki’s default HTTP port, which is what Promtail pushes to; 9095 is its default gRPC port, used between Loki components), or correcting load balancer configurations pointing to Loki.
    • Why it works: This restores the communication channel, allowing the agent to send its batched log entries to Loki.
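For an end-to-end check that goes beyond reachability, a single test line can be pushed by hand. This is a sketch under assumptions: the label job="connectivity-test" is made up so the entry is easy to find later, LOKI_URL is a placeholder, and a multi-tenant Loki would additionally need a tenant header such as X-Scope-OrgID.

```shell
# Build a minimal push payload with one stream and one log line.
# Arguments: nanosecond timestamp, log line (must not contain double quotes).
push_payload() {
  printf '{"streams":[{"stream":{"job":"connectivity-test"},"values":[["%s","%s"]]}]}' "$1" "$2"
}

# POST the payload and print only the HTTP status code.
push_test_line() {
  local loki_url="${LOKI_URL:-http://loki.example.com:3100}"
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H 'Content-Type: application/json' \
    -X POST "${loki_url}/loki/api/v1/push" \
    --data "$(push_payload "$(date +%s)000000000" "$1")"
}

# Usage: push_test_line "hello from the agent host"
```

A 204 status means Loki accepted the entry. A 4xx or 5xx status means the network path works but Loki rejected the request or failed to process it.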
  4. Disk Full on Agent Machine: If the disk where the agent writes its internal state (Promtail’s positions file, a WAL for some agents, or even temporary files) is full, the agent can halt operations.

    • Diagnosis: Check disk space on the agent’s host:
      df -h
      
      Look for partitions that are 100% full, especially /var, /opt, or wherever Promtail’s data directory is located.
    • Fix: Free up disk space by removing old logs, temporary files, or expanding the disk.
    • Why it works: Provides the necessary space for the agent to operate, write state, and process logs.
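On hosts with many mounts, the df output can be filtered down to the filesystems that matter. The sketch below assumes POSIX-format output (df -P). The 95 percent threshold in the usage line is an arbitrary example.

```shell
# Read `df -P` output on stdin and print "mountpoint usage" for every
# filesystem at or above the given usage threshold (in percent).
# Mount points containing spaces are not handled.
full_filesystems() {
  awk -v limit="$1" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= limit) print $6 " " $5 }'
}

# Usage: df -P | full_filesystems 95
```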
  5. Loki Ingestion Endpoint Overload/Failure: While less common for absent_over_time (which implies no logs, not slow logs), if Loki’s ingestion endpoints are completely failing to accept data due to extreme load or internal errors, agents retry their batches and eventually drop them once retries are exhausted, so nothing reaches Loki.

    • Diagnosis: Check Loki’s own metrics and logs. Look for high request latency, high error rates (especially 5xx errors) on the /loki/api/v1/push endpoint, or resource exhaustion (CPU, memory) on Loki pods/servers.
    • Fix: Scale Loki’s write path horizontally (add more distributors/ingesters), optimize heavy queries if they compete with ingestion for the same resources, or address underlying resource constraints.
    • Why it works: Restores Loki’s ability to accept and process incoming log data, allowing agents to resume sending.
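One way to spot failing pushes is to total the 5xx responses recorded in Loki’s own /metrics output. The sketch below assumes the metric loki_request_duration_seconds_count with route="loki_api_v1_push" and a status_code label. Verify both names against your Loki version before relying on the number.

```shell
# Sum the request counts for push requests that ended in a 5xx status.
# Metric and label names are assumptions about Loki's /metrics output.
push_5xx_total() {
  awk '/^loki_request_duration_seconds_count[{]/ &&
       /route="loki_api_v1_push"/ &&
       /status_code="5[0-9][0-9]"/ { sum += $NF }
       END { print sum + 0 }'
}

# Usage (LOKI_URL is a placeholder):
#   curl -s "${LOKI_URL}/metrics" | push_5xx_total
```

The counter is cumulative, so compare two readings taken a minute apart rather than judging a single value.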
  6. Log File Rotation/Deletion Issues: The agent might be configured to tail a file, but that file is being unexpectedly rotated, deleted, or moved by another process before the agent can read it.

    • Diagnosis: Check which files the agent is configured to tail (in Promtail, the __path__ label of the relevant scrape config; other agents have their own directive). Verify the expected location and naming of log files. Look for evidence of aggressive log rotation or deletion scripts running on the host.
    • Fix: Ensure your log rotation strategy (e.g., using logrotate) is compatible with your agent. Rotation that renames the old file and creates a new one is generally safer for tailing agents than rotation that copies and truncates the file in place, which can lose lines written during the copy. Also make sure the agent’s path pattern matches the file name the application writes to after rotation.
    • Why it works: The agent continues to track and read from the correct log file stream, even after it’s been rotated.

The next error you’ll likely encounter after fixing absent_over_time issues is an alert related to high log volume or specific error messages within the logs, indicating that the log collection is healthy but the application itself is experiencing problems.
