Loki’s Bloom filter index can dramatically speed up queries by telling Loki, before it ever touches object storage, whether a given chunk might contain the data you’re looking for.

Let’s watch this in action. Imagine we’ve got logs for two distinct applications, app1 and app2, and we want to find logs from app1 within a specific time range.

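Here’s an illustrative Loki configuration snippet, focusing on the Bloom filter setup. Bloom filter settings differ across Loki releases, so check the configuration reference for your version before reusing these keys: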

```yaml
ingester:
  chunk_block_size: 262144   # 256 KiB, in bytes
  chunk_encoding: snappy
  chunk_idle_period: 1h
  chunk_retention_period: 24h
  chunk_write_batch_size: 1000
  chunk_target_size: 262144  # 256 KiB, in bytes
  max_block_chunk_size: 256KiB

schema_config:
  configs:
    - from: 2023-01-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index/
        period: 24h
        bloom:  # illustrative keys; Bloom settings differ across Loki versions
          enabled: true
          filters_per_chunk: 2
          min_bloom_false_positive_rate: 0.01
          max_bloom_false_positive_rate: 0.1
```

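And here’s a sample query. LogQL has no time-range filter stage, so the window isn’t written into the query itself; it’s passed as the query’s start and end parameters, for example via logcli’s --from and --to flags:

```shell
logcli query '{app="app1"}' --from="2023-10-26T10:00:00Z" --to="2023-10-26T11:00:00Z"
```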

When this query hits Loki, the label index first narrows the candidates to chunks from app1 streams in that hour. Each chunk belongs to a single stream, so matching app="app1" never requires opening a chunk. Bloom filters add a finer second pass: if the query also filters on a key-value pair attached to individual lines (say, a trace ID) and a chunk’s Bloom filter says that pair is definitely absent, Loki skips reading that chunk from S3 entirely. This is the magic: it avoids potentially expensive I/O for data that’s irrelevant to the query. Only chunks for which the Bloom filter returns "might contain" are fetched and inspected.

The core problem Bloom filters solve is the "read everything" problem. Without them, Loki has to download and scan every chunk from the matching streams in the requested time range just to find the few lines that actually match. For large deployments with many tenants and high log volumes, this quickly becomes a major bottleneck. Bloom filters act as a cheap, probabilistic pre-filter: they trade a small chance of a false positive (saying a chunk might have the data when it doesn’t) for a large reduction in unnecessary object-storage reads. They never produce false negatives, so a chunk that does hold matching lines is never skipped.
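
Here’s a minimal, generic Bloom filter in Python to make that trade-off concrete. It’s a textbook sketch, not Loki’s implementation: the SHA-256 double-hashing scheme, the filter size, and the made-up trace_id keys are all assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; a Python int serves as an m-bit array (generic sketch, not Loki's code)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits, self.num_hashes, self.bits = num_bits, num_hashes, 0

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 odd (and nonzero)
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True only means "maybe present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def observed_fp_rate(bloom: BloomFilter, absent_items: list[str]) -> float:
    """Fraction of never-inserted items that the filter wrongly reports as present."""
    return sum(bloom.might_contain(item) for item in absent_items) / len(absent_items)


if __name__ == "__main__":
    bloom = BloomFilter(num_bits=9586, num_hashes=7)  # sized for ~1,000 items at ~1%
    inserted = [f"trace_id=in-{i}" for i in range(1000)]
    for item in inserted:
        bloom.add(item)
    # No false negatives: every inserted item is reported as "maybe present".
    assert all(bloom.might_contain(item) for item in inserted)
    absent = [f"trace_id=out-{i}" for i in range(10_000)]
    print(f"observed false positive rate: {observed_fp_rate(bloom, absent):.2%}")
```

Inserted items always come back as "maybe present", while a small fraction of never-inserted ones, close to 1% for this sizing, slip through as false positives.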

Here’s how Loki builds these filters:

  1. Chunking: Loki groups each stream’s log lines into chunks, cutting them by size (chunk_target_size, 256 KiB here) and flushing a chunk once its stream has been idle for chunk_idle_period (1h here).
  2. Key-Value Extraction: For each chunk, Loki collects the unique key-value pairs attached to its log lines. Stream labels such as app are already resolved by the label index, so the pairs that matter here are the per-line ones (structured metadata).
  3. Bloom Filter Creation: For each chunk, Loki generates Bloom filters for a configurable number of distinct sets of key-value pairs found within that chunk (filters_per_chunk: 2). Each filter is a bit array: every key-value pair is hashed several times, and each hash sets one bit, giving a compact representation of the pairs present in the chunk. The min_bloom_false_positive_rate and max_bloom_false_positive_rate parameters guide the filter’s size to balance size against accuracy.
  4. Storage: These Bloom filters are stored alongside the chunk metadata, often in the same object storage (e.g., S3) where the chunks themselves reside, but in a separate index directory (index/).
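
The first three steps can be sketched in Python as a simplified model under stated assumptions, not Loki’s code: it cuts one stream’s lines into chunks purely by size (ignoring chunk_idle_period), builds a single filter per chunk rather than filters_per_chunk of them, and the Chunk, index_chunk, and build_chunks names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


class BloomFilter:
    """Minimal Bloom filter; a Python int serves as an m-bit array (generic sketch)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits, self.num_hashes, self.bits = num_bits, num_hashes, 0

    def _positions(self, item: str) -> list[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


@dataclass
class Chunk:  # hypothetical container, not a Loki type
    lines: list[tuple[dict[str, str], str]]  # (key-value pairs, log text) per line
    bloom: BloomFilter


def index_chunk(lines, num_bits: int, num_hashes: int) -> Chunk:
    # Hash every key-value pair as "key=value"; the log text itself is not indexed.
    bloom = BloomFilter(num_bits, num_hashes)
    for pairs, _text in lines:
        for key, value in pairs.items():
            bloom.add(f"{key}={value}")
    return Chunk(lines=list(lines), bloom=bloom)


def build_chunks(entries, target_size_bytes: int, num_bits: int = 2048, num_hashes: int = 4) -> list[Chunk]:
    """Hypothetical helper: cut one stream's entries into chunks by size and index each one."""
    chunks, current, size = [], [], 0
    for pairs, text in entries:
        current.append((pairs, text))
        size += len(text.encode("utf-8"))
        if size >= target_size_bytes:  # cut once the chunk reaches its target size
            chunks.append(index_chunk(current, num_bits, num_hashes))
            current, size = [], 0
    if current:  # flush the trailing, partially filled chunk
        chunks.append(index_chunk(current, num_bits, num_hashes))
    return chunks


if __name__ == "__main__":
    entries = [({"trace_id": f"t{i}"}, f"request {i} handled") for i in range(10)]
    for n, chunk in enumerate(build_chunks(entries, target_size_bytes=64)):
        print(n, len(chunk.lines), chunk.bloom.might_contain("trace_id=t0"))
```

Only the key-value pairs reach the filter; the log text is stored in the chunk but never hashed.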

When a query comes in, Loki loads the Bloom filters for the chunks in the requested time range. For each key-value pair the query requires, it checks whether the filter reports that the pair might be present. If the filter says "no" for any required pair, Loki discards the chunk without fetching it. If it says "maybe" for all of them, the chunk becomes a candidate and is downloaded for a definitive check.
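
Here is a sketch of that query-time check in Python. It’s generic code, not Loki’s, and the chunk IDs, trace IDs, and chunks_to_fetch helper are hypothetical. It assumes the label index has already narrowed the candidates, so the function only sees their filters.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; a Python int serves as an m-bit array (generic sketch)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits, self.num_hashes, self.bits = num_bits, num_hashes, 0

    def _positions(self, item: str) -> list[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def chunks_to_fetch(chunk_blooms: dict[str, BloomFilter], required_pairs: list[str]) -> list[str]:
    """Hypothetical helper: keep a chunk only if every required pair might be present.

    A single "definitely absent" answer is enough to skip the chunk.
    """
    return [
        chunk_id
        for chunk_id, bloom in chunk_blooms.items()
        if all(bloom.might_contain(pair) for pair in required_pairs)
    ]


if __name__ == "__main__":
    blooms = {}
    for chunk_id, trace_ids in {"chunk-a": ["t1", "t2"], "chunk-b": ["t3"]}.items():
        bloom = BloomFilter(num_bits=1024, num_hashes=4)
        for trace_id in trace_ids:
            bloom.add(f"trace_id={trace_id}")
        blooms[chunk_id] = bloom
    # chunk-b is always kept; chunk-a is kept only on a (rare) false positive.
    print(chunks_to_fetch(blooms, ["trace_id=t3"]))
```

Note the asymmetry built into all(): one "no" prunes the chunk, while a chunk survives only when every required pair gets a "maybe".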

The number of filters per chunk (filters_per_chunk) is a key tuning parameter. If set too low, a single Bloom filter might have to represent too many different label combinations, increasing its false positive rate and reducing its effectiveness. If set too high, it increases the overhead of building and storing the filters. Loki aims to find the most common label sets within a chunk to build these filters.
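
To see why overloading a filter hurts, the standard approximation for a Bloom filter’s false positive rate, p ≈ (1 - e^(-kn/m))^k with m bits, k hash functions, and n items, can be evaluated for a filter of fixed size. This is generic Bloom filter math, not a description of how Loki chooses filters_per_chunk.

```python
import math


def expected_fp_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Standard approximation of a Bloom filter's false positive rate: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


if __name__ == "__main__":
    # A fixed-size filter (m = 9,586 bits, k = 7) sized for ~1,000 pairs at ~1%.
    for n in (500, 1_000, 2_000, 4_000):
        print(f"{n:>5} pairs -> {expected_fp_rate(9586, 7, n):.2%} expected false positives")
```

Doubling the load on a filter sized for 1,000 pairs raises the expected rate from about 1% to about 16%, which is the effect of packing too many combinations into too few filters.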

The min_bloom_false_positive_rate and max_bloom_false_positive_rate control the size and capacity of the Bloom filter. A lower false positive rate requires a larger Bloom filter (more bits). Loki dynamically adjusts the filter size within these bounds to achieve the desired false positive rate for the number of elements (label-value pairs) it needs to store.
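
The textbook sizing formulas make this trade-off concrete. They describe an optimally configured Bloom filter in general; whether Loki applies exactly these rules isn’t stated here. For n items at a target false positive rate p, the number of bits m and hash functions k are:

```latex
\begin{aligned}
m &= \left\lceil \frac{-n \ln p}{(\ln 2)^2} \right\rceil, \qquad k = \operatorname{round}\!\left(\frac{m}{n}\ln 2\right) \\
n = 1000,\ p = 0.01 &: \quad m = 9586 \text{ bits} \approx 1.2\,\text{KB}, \quad k = 7 \\
n = 1000,\ p = 0.1 &: \quad m = 4793 \text{ bits} \approx 0.6\,\text{KB}, \quad k = 3
\end{aligned}
```

Because ln 0.01 = 2 ln 0.1, moving from the example’s 0.1 ceiling to its 0.01 floor exactly doubles the bits needed for the same 1,000 pairs.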

One thing many operators overlook is that these Bloom filters index key-value pairs, not the raw text of the log body. Their effectiveness is therefore tied directly to how well your logs are labeled when they’re ingested. If you’re querying on unstructured text in the log body, or on fields you only extract at query time with a parser such as | json or | logfmt, the filters were built without those values and won’t help for those queries. They are designed to accelerate filtering on key-value pairs, which is crucial for efficient log retrieval.

The next acceleration mechanisms you’ll likely encounter are index compression and the query-planning optimizations in Loki’s query engine.
