Loki’s deduplication doesn’t simply throw duplicate log lines away; it collapses them into a single entry with a count, which saves storage.

Here’s a look at Loki’s deduplication in action, specifically how it handles exact duplicate log entries.

Imagine you have a service that’s being a bit too chatty, sending the same log message repeatedly within a short period. By default, Loki stores every one of these identical log lines. When you enable deduplication, Loki’s ingestion pipeline identifies the exact duplicates and, instead of writing them to object storage multiple times, keeps only the first occurrence plus a count of how many times it has seen that line within the deduplication window.

Let’s see this with a simulated scenario.

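First, we need a Loki instance configured with deduplication enabled. The key configuration lives in the ingester section of your Loki configuration file (loki-local.yaml or similar). Option names differ between Loki releases, so check the configuration reference for your version before relying on the keys shown here.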

ingester:
  # ... other ingester settings ...
  dedupe:
    # Enable deduplication
    enabled: true
    # The maximum age of a log entry to consider for deduplication.
    # For exact log entries, this is the primary setting.
    # If a log entry is older than this, it will not be deduplicated
    # against newer identical entries.
    max_age: 10m
    # Maximum number of entries to hold in memory for deduplication.
    # This is a safeguard against unbounded memory growth.
    max_dedupe_entries: 10000

In this configuration:

  • enabled: true turns on the deduplication feature.
  • max_age: 10m means Loki will only consider log entries within the last 10 minutes for deduplication against new incoming logs. This is crucial. If two identical log lines are more than 10 minutes apart, they will not be deduplicated.
  • max_dedupe_entries: 10000 is a memory limit. If Loki has more than 10,000 unique log lines it’s tracking for deduplication, it will start evicting the oldest ones. This prevents memory exhaustion.
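To make the interaction between the two limits concrete, here is a minimal Python sketch of a tracking table that honors both an age window and an entry cap. It is a model of the behavior described above, not Loki’s actual code: the DedupeCache class, its observe method, and the choice to measure age from the first occurrence are all assumptions made for illustration, and the cap is set to 3 so that eviction is easy to see.

```python
from collections import OrderedDict
from datetime import datetime, timedelta

MAX_AGE = timedelta(minutes=10)  # mirrors max_age: 10m
MAX_ENTRIES = 3                  # tiny stand-in for max_dedupe_entries


class DedupeCache:
    """Hypothetical model of the tracking table; not Loki's implementation."""

    def __init__(self, max_age=MAX_AGE, max_entries=MAX_ENTRIES):
        self.max_age = max_age
        self.max_entries = max_entries
        self.entries = OrderedDict()  # line -> (first_seen, count)

    def observe(self, ts, line):
        """Return True if the line was absorbed as a duplicate."""
        tracked = self.entries.get(line)
        # Assumption: age is measured from the first occurrence of the line.
        if tracked is not None and ts - tracked[0] <= self.max_age:
            self.entries[line] = (tracked[0], tracked[1] + 1)
            return True
        # New line, or the tracked copy is too old: start a fresh entry.
        self.entries.pop(line, None)
        self.entries[line] = (ts, 1)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the oldest tracked line
        return False


cache = DedupeCache()
t0 = datetime(2023, 10, 27, 10, 0, 0)
print(cache.observe(t0, "disk full"))                          # False: first occurrence
print(cache.observe(t0 + timedelta(minutes=5), "disk full"))   # True: inside the window
print(cache.observe(t0 + timedelta(minutes=11), "disk full"))  # False: older than max_age
```

Once a line has been evicted because the table is over its cap, the next identical line is treated as new, which is why a burst of many unique lines reduces how much gets deduplicated.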

Now, let’s simulate sending duplicate logs. We’ll use promtail to send logs to Loki.

Scenario: Sending 5 identical log lines within 1 minute.

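Let’s say promtail is tailing a file with the lines below. Assume its pipeline parses the leading timestamp into the entry’s timestamp and forwards only the message text, so all five entries carry identical line content: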

2023-10-27 10:00:01 Some service started successfully.
2023-10-27 10:00:02 Some service started successfully.
2023-10-27 10:00:03 Some service started successfully.
2023-10-27 10:00:04 Some service started successfully.
2023-10-27 10:00:05 Some service started successfully.

When these logs are ingested by Loki with deduplication enabled and max_age set to 10m, Loki will process them as follows:

  1. First log: 2023-10-27 10:00:01 Some service started successfully. This is the first instance of this specific log line. Loki stores it and marks it as the "unique" entry. Internally, it might be tracked with a count of 1.
  2. Second log: 2023-10-27 10:00:02 Some service started successfully. Loki compares this to its in-memory tracking. It sees an identical log line that arrived recently (within max_age). Instead of writing this new log to storage, it increments the count associated with the first log entry. The internal representation might now be "Some service started successfully." (count: 2).
  3. Third, Fourth, and Fifth logs: The same process repeats. Each identical log line arriving within the max_age window will increment the count of the initial unique entry.
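The steps above can be replayed in a few lines of Python. This is only a sketch of the described behavior under stated assumptions: split_entry and dedupe are hypothetical helper names, the timestamp is assumed to occupy the first 19 characters of each raw line, and the age window is measured from the first occurrence.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(minutes=10)  # mirrors max_age: 10m

RAW_LINES = [
    "2023-10-27 10:00:01 Some service started successfully.",
    "2023-10-27 10:00:02 Some service started successfully.",
    "2023-10-27 10:00:03 Some service started successfully.",
    "2023-10-27 10:00:04 Some service started successfully.",
    "2023-10-27 10:00:05 Some service started successfully.",
]


def split_entry(raw):
    """Split 'YYYY-MM-DD HH:MM:SS message' into (timestamp, message)."""
    ts = datetime.strptime(raw[:19], "%Y-%m-%d %H:%M:%S")
    return ts, raw[20:]


def dedupe(raw_lines, max_age=MAX_AGE):
    """Collapse identical messages seen within max_age of their first occurrence."""
    tracked = {}  # message -> index of its current entry in `stored`
    stored = []   # each item is [first_timestamp, message, count]
    for raw in raw_lines:
        ts, message = split_entry(raw)
        idx = tracked.get(message)
        if idx is not None and ts - stored[idx][0] <= max_age:
            stored[idx][2] += 1  # duplicate: bump the count only
        else:
            tracked[message] = len(stored)
            stored.append([ts, message, 1])  # first occurrence: keep it
    return stored


for first_ts, message, count in dedupe(RAW_LINES):
    print(f"{first_ts} {message} (count: {count})")
# prints: 2023-10-27 10:00:01 Some service started successfully. (count: 5)
```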

When you query Loki for these logs, say using logcli or Grafana:

{job="my-service"} |= "Some service started successfully."

Loki’s query engine, aware of the deduplication, will retrieve the single unique log entry and display its count. The actual output you see might look something like this:

2023-10-27 10:00:01 Some service started successfully. (count: 5)

This is the core of exact log entry deduplication. Loki keeps a record that the line occurred five times, but it stores the line only once: the timestamps of the individual repeats are not retained, only the first occurrence and the count. This reduces storage costs and can speed up queries, because there are fewer individual entries to scan.

The "count" is not explicitly stored as a separate field in Loki’s object storage for every single log line. Instead, Loki’s ingester maintains an in-memory table of recent, unique log lines and their counts. When a log line is fully processed and ready for long-term storage (after its max_age has passed or it’s flushed by the ingester), Loki writes one entry to object storage representing that unique log line and its accumulated count. The count is then reset in the ingester’s memory for future deduplication.
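The flush step can be pictured with a short Python sketch. The flush_expired function and the shape of the table (line mapped to first-seen time and count) are assumptions for illustration; the real ingester’s data structures are not described in this article.

```python
from datetime import datetime, timedelta


def flush_expired(table, now, max_age):
    """Remove entries older than max_age and return them for writing.

    `table` maps line -> (first_seen, count). Each returned tuple is
    (first_seen, line, count), i.e. one stored entry per unique line.
    """
    expired = [
        line for line, (first_seen, _count) in table.items()
        if now - first_seen > max_age
    ]
    flushed = []
    for line in expired:
        first_seen, count = table.pop(line)  # dropping the row resets its count
        flushed.append((first_seen, line, count))
    return flushed


t0 = datetime(2023, 10, 27, 10, 0, 0)
table = {
    "Some service started successfully.": (t0, 5),
    "Cache warmed.": (t0 + timedelta(minutes=8), 2),
}
ready = flush_expired(table, now=t0 + timedelta(minutes=11), max_age=timedelta(minutes=10))
print(ready)        # only the entry first seen at 10:00:00 is old enough
print(list(table))  # the newer line stays in memory and keeps counting
```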

The mental model to hold onto is that Loki’s ingester acts as a short-term buffer and aggregation layer before data hits persistent storage. It’s constantly looking at incoming logs and comparing them against a recent history it keeps in memory. If a log matches a recent unique log, it’s effectively "merged" with that unique log by incrementing a counter. If it’s a new unique log, it gets added to the in-memory tracking.

The primary lever you control here is max_age. If max_age is too short, you won’t get much deduplication. If it’s too long, you risk higher memory usage in the ingester and potentially longer deduplication lookups. max_dedupe_entries is a hard limit that prevents memory exhaustion, but it comes with a cost: if a very high volume of unique log lines arrives rapidly, Loki may evict older unique lines from its deduplication cache, and later repeats of those lines will be stored as new entries.
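A rough way to reason about the max_age trade-off is to count how many entries would be stored for one line that repeats at a fixed interval. The stored_entries function below is a hypothetical back-of-the-envelope model, assuming the window is measured from the first occurrence and ignoring the entry cap.

```python
from datetime import timedelta


def stored_entries(interval, duration, max_age):
    """Count stored entries for one line repeated every `interval` for `duration`."""
    stored = 0
    window_start = None
    t = timedelta(0)
    while t < duration:
        # A repeat outside the current window starts a new stored entry.
        if window_start is None or t - window_start > max_age:
            window_start = t
            stored += 1
        t += interval
    return stored


every_30s = timedelta(seconds=30)
one_hour = timedelta(hours=1)  # 120 occurrences of the line in total
for max_age in (timedelta(minutes=1), timedelta(minutes=10), timedelta(hours=1)):
    print(max_age, stored_entries(every_30s, one_hour, max_age))
# 1 minute -> 40 stored entries, 10 minutes -> 6, 1 hour -> 1
```

Under these assumptions, moving from a 1 minute window to a 10 minute window cuts the stored entries for this line from 40 to 6, at the cost of keeping each tracked line in memory ten times longer.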

Once deduplication is working, the next questions you’ll usually run into are how Loki handles logs with slightly different timestamps but identical messages, and how to query the actual number of log lines when deduplication is active.
