Loki’s Write-Ahead Log (WAL) is what makes durable log ingestion possible. The most surprising thing about it is that it is not primarily about speed: it exists to prevent data loss when Loki crashes mid-ingestion.

Let’s see it in action. Imagine Loki is running and receiving logs from multiple tenants. The ingester buffers incoming entries in memory and builds them into chunks, which are later flushed to long-term storage. If the process crashes before a chunk is flushed and indexed, that in-memory data would be gone unless it can be recovered. This is where the WAL comes in.

When a log entry arrives, the ingester component doesn’t immediately write it to the chunk store. Instead, it first writes a record of this incoming log entry to the WAL. This WAL is essentially a pre-write log, stored on disk, typically in a directory like /loki/wal.

Here’s a simplified view of the process:

  1. Log Ingestion: A log line arrives at the Loki ingester.
  2. WAL Write: The ingester writes a record representing this log line to the WAL file. This record includes the log content, tenant ID, timestamp, and labels.
  3. Chunking & Indexing: The ingester then processes this log line, batching it with others into a chunk. This chunk is eventually written to the configured storage backend (such as S3, GCS, or the local filesystem).
  4. WAL Cleanup: Once the chunk containing the log data is successfully written to storage and indexed, the WAL no longer needs to hold that data. Loki periodically removes WAL segments that are no longer required for recovery.
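The write path in steps 1 to 3 can be sketched as a toy in Python. This is a minimal illustration of the write-ahead idea, not Loki’s implementation: the ToyWAL and ToyIngester classes, the JSON-lines record format, and the per-record fsync are all assumptions made for the example.

```python
import json
import os
import tempfile


class ToyWAL:
    """Append-only log of JSON records, one per line (hypothetical format)."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a", encoding="utf-8")

    def append(self, record):
        # Write the record and force it to disk before the caller continues.
        self._fh.write(json.dumps(record) + "\n")
        self._fh.flush()
        os.fsync(self._fh.fileno())

    def close(self):
        self._fh.close()


class ToyIngester:
    def __init__(self, wal):
        self.wal = wal
        self.memory = {}  # (tenant, labels) -> list of (timestamp, line)

    def push(self, tenant, labels, ts, line):
        # Step 2: durable WAL record first.
        self.wal.append({"tenant": tenant, "labels": labels, "ts": ts, "line": line})
        # Step 3: only then buffer the entry in memory for chunking.
        self.memory.setdefault((tenant, labels), []).append((ts, line))
        return True  # acknowledge the write


wal_path = os.path.join(tempfile.mkdtemp(), "segment-0")
ingester = ToyIngester(ToyWAL(wal_path))
ingester.push("tenant-a", '{app="api"}', 1700000000, "GET /health 200")
ingester.wal.close()

# Read the WAL back to confirm the record reached disk.
with open(wal_path, encoding="utf-8") as fh:
    records = [json.loads(raw) for raw in fh]
```

The ordering inside push is the point: if the process dies after append returns, the entry can be rebuilt from disk even though the in-memory buffer is lost.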

The WAL directory structure typically looks like this:

/loki/wal/
├── 00000000000000000000 # WAL segment file
├── 00000000000000000001 # Another WAL segment file
└── checkpoints/
    └── 00000000000000000000 # Checkpoint file

The numbered files at the top level are the WAL segments, where incoming records are appended. The checkpoints/ directory holds checkpoint data, which lets Loki resume from a recent point after a restart instead of replaying every segment from the beginning.

The problem the WAL solves is durability of accepted writes. Without it, if Loki crashed after accepting a log line but before that line was written to durable storage and indexed, the line would be lost forever. The WAL acts as a safety net.

The key configuration parameters for the WAL are found in loki-local.yaml (or your equivalent configuration file):

ingester:
  wal:
    enabled: true
    dir: /loki/wal
    # How often to create a checkpoint.
    checkpoint_duration: 5m
    # Whether to flush in-memory chunks to long-term storage on shutdown.
    flush_on_shutdown: false
    # Memory the WAL may use during replay before flushing data to storage.
    replay_memory_ceiling: 4GB

Let’s break down what these mean:

  • enabled: true: This is the default and ensures the WAL is active.
  • dir: /loki/wal: This is the filesystem path where the WAL segment files and checkpoints are stored. Make sure this directory exists and Loki has write permissions. If this path is on slow or unreliable storage, it can become a bottleneck.
  • checkpoint_duration: 5m: Loki creates checkpoints at this interval. A checkpoint captures the ingester’s unflushed in-memory data, so that older WAL segments can be removed and Loki does not have to re-read the entire WAL from the beginning on every restart. Removing those older segments is also what keeps the WAL directory from growing indefinitely.
  • flush_on_shutdown: false: When set to true, the ingester flushes its in-memory chunks to long-term storage during a clean shutdown, which leaves less data to replay on the next start.
  • replay_memory_ceiling: 4GB: This bounds how much memory the WAL may use during replay. When the ceiling is reached, Loki flushes data to storage before continuing, which helps a large WAL replay without exhausting memory.
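Because a missing or read-only dir is an easy mistake to make, a quick pre-flight check can help. The helper below is a hypothetical sketch (check_wal_dir is not part of Loki or any Loki tooling); it simply verifies that a path is a directory and that a file can be created in it.

```python
import os
import tempfile


def check_wal_dir(path):
    """Return a list of problems found with a candidate WAL directory."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    try:
        # Creating and removing a temporary file proves write access.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {path}: {exc}")
    return problems


ok_dir = tempfile.mkdtemp()
missing_dir = os.path.join(ok_dir, "does-not-exist")
print(check_wal_dir(ok_dir))
print(check_wal_dir(missing_dir))
```

Run it as the same user that runs Loki, since permissions are evaluated for the calling process.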

The WAL is composed of append-only segment files. When an ingester receives data, it appends it to the current WAL segment. Once a segment reaches a certain size or a checkpoint is created, Loki might switch to a new segment. The checkpointing process is vital for efficient recovery. A checkpoint, written under the checkpoints/ directory, captures the ingester’s unflushed in-memory data at a point in time, so segments written before it are no longer needed and replay only has to cover what came after.

During startup, Loki loads the latest checkpoint and then replays the WAL segments written after it. This replay process reconstructs any in-progress chunks and ensures that data that was written to the WAL but not yet to object storage is recovered.
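The startup sequence can be illustrated with a toy recovery routine. Everything here is an assumption made for the example: the checkpoint.json file, its last_segment and entries fields, and the JSON-lines segment format are hypothetical and do not reflect Loki’s on-disk encoding.

```python
import json
import os
import tempfile


def recover(wal_dir):
    """Rebuild buffered log lines from a toy checkpoint plus later segments."""
    state = []         # recovered log lines, in order
    last_segment = -1  # highest segment already covered by the checkpoint
    checkpoint = os.path.join(wal_dir, "checkpoint.json")
    if os.path.exists(checkpoint):
        with open(checkpoint, encoding="utf-8") as fh:
            data = json.load(fh)
        state = list(data["entries"])
        last_segment = data["last_segment"]

    # Segment files have purely numeric names; replay them in numeric order.
    segments = sorted((n for n in os.listdir(wal_dir) if n.isdigit()), key=int)
    for name in segments:
        if int(name) <= last_segment:
            continue  # already reflected in the checkpoint
        with open(os.path.join(wal_dir, name), encoding="utf-8") as fh:
            for raw in fh:
                raw = raw.strip()
                if raw:
                    state.append(json.loads(raw)["line"])
    return state


# Build three segments; the checkpoint covers segments 0 and 1.
wal_dir = tempfile.mkdtemp()
for number, lines in enumerate([["a"], ["b", "c"], ["d"]]):
    with open(os.path.join(wal_dir, f"{number:08d}"), "w", encoding="utf-8") as fh:
        for line in lines:
            fh.write(json.dumps({"line": line}) + "\n")
with open(os.path.join(wal_dir, "checkpoint.json"), "w", encoding="utf-8") as fh:
    json.dump({"last_segment": 1, "entries": ["a", "b", "c"]}, fh)

recovered = recover(wal_dir)
print(recovered)  # ['a', 'b', 'c', 'd']
```

Only segment 2 is read line by line; the first two segments are skipped because the checkpoint already accounts for them, which is why frequent checkpoints shorten restarts.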

The actual persistence to object storage happens asynchronously. The ingester writes to the WAL first and flushes chunks to object storage later, once they are full or have been idle for long enough. Data is removed from the WAL only after it is no longer needed for recovery.

One thing that many operators overlook is the performance implications of the WAL directory. While it’s designed for durability, if the underlying storage for /loki/wal is slow (e.g., NFS with high latency, or a spinning disk under heavy load), it can become a bottleneck for ingestion. Loki is constantly appending to these WAL files, and if the disk can’t keep up, ingestion rates will suffer. Ensure the WAL directory resides on fast, local SSDs for optimal performance.
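One rough way to compare candidate disks is to time small synced appends in the directory you plan to use. The probe below is a sketch under stated assumptions: fsync_latencies is a hypothetical helper, the 4 KiB write size is arbitrary, and it makes no claim about how often the real ingester syncs. Treat the numbers as a relative comparison between volumes, not a prediction of ingestion throughput.

```python
import os
import tempfile
import time


def fsync_latencies(directory, writes=50, size=4096):
    """Time `writes` append+fsync cycles in `directory`; return seconds per cycle."""
    payload = b"x" * size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)
            latencies.append(time.perf_counter() - start)
    finally:
        # Always clean up the probe file, even if a write fails.
        os.close(fd)
        os.remove(path)
    return latencies


# Point this at the WAL volume; the system temp directory is used here as a stand-in.
samples = sorted(fsync_latencies(tempfile.gettempdir(), writes=20))
print(f"median: {samples[len(samples) // 2] * 1000:.2f} ms")
print(f"worst:  {samples[-1] * 1000:.2f} ms")
```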

The next thing you’ll likely encounter after tuning your WAL is understanding how Loki handles object storage durability and the implications of chunk_store_config and schema_config.
