Loki’s storage costs can balloon because it stores every single log line, even those that are essentially identical.

Here’s how Loki works under the hood, and how you can shrink those bills:

Loki treats logs as streams: each unique set of labels defines a stream, and the lines in a stream are batched into compressed chunks. Only the labels are indexed, not the line content, which keeps the index small and makes label-based queries efficient. Every ingested line is still stored in a chunk, though, so high-volume, repetitive log messages can consume a surprising amount of storage.

1. Reduce Log Verbosity at the Source

The most impactful way to cut costs is to stop logging redundant information in the first place.

  • Diagnosis: Examine your application logs. Are you seeing the same error message, the same "request processed" notification, or the same status update thousands of times a minute?

  • Fix: Configure your applications to log at a less verbose level (e.g., INFO instead of DEBUG) for routine operations. For specific recurring errors, implement rate limiting within your application, or filter in your log agent before the lines hit Loki. Promtail has no built-in deduplication stage, but its pipeline_stages support a match stage combined with drop (discard lines matching an expression) and limit (rate-limit what remains).

    # Example Promtail config snippet: drop noisy lines, rate-limit the rest
    scrape_configs:
    - job_name: myapp
      static_configs:
      - targets:
          - localhost
        labels:
          job: myapp
          __path__: /var/log/myapp.log
      pipeline_stages:
      - match:
          selector: '{job="myapp"}'
          stages:
          - drop:
              expression: '.*request processed.*' # Discard routine success lines
          - limit:
              rate: 100   # Allow at most 100 lines per second
              burst: 200  # Tolerate short bursts of up to 200 lines
              drop: true  # Drop excess lines instead of blocking the pipeline
    
  • Why it works: This prevents unnecessary log lines from ever reaching Loki, directly reducing the amount of data stored.
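
To make the diagnosis step concrete, here is a minimal Python sketch that counts near-identical lines in a log sample. The normalization rules (replace long hex IDs and numbers with placeholders) and the sample lines are assumptions for illustration; adapt them to your own log format.

```python
import re
from collections import Counter

def normalize(line: str) -> str:
    """Collapse variable parts so near-identical lines compare equal."""
    line = re.sub(r"\b[0-9a-f]{8,}\b", "<id>", line)  # long lowercase hex ids
    line = re.sub(r"\d+", "<n>", line)                 # any remaining numbers
    return line.strip()

def top_repeats(lines, n=3):
    """Return the n most common normalized patterns with their counts."""
    counts = Counter(normalize(l) for l in lines if l.strip())
    return counts.most_common(n)

# Hypothetical sample; in practice, read lines from your application log.
sample = [
    "2024-01-01T00:00:01Z INFO request processed in 12ms",
    "2024-01-01T00:00:02Z INFO request processed in 15ms",
    "2024-01-01T00:00:03Z ERROR upstream timeout id=deadbeef01",
    "2024-01-01T00:00:04Z INFO request processed in 9ms",
]
for pattern, count in top_repeats(sample):
    print(count, pattern)
```

Patterns that dominate the output are the first candidates for a lower log level, a drop rule, or rate limiting.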

2. Optimize Labeling Strategy

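Loki indexes logs by their labels, and every unique combination of label values creates a separate stream with its own chunks. Too many unique combinations mean a larger index and many small, poorly compressed chunks, which increases storage and query costs.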

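  • Diagnosis: Use Loki’s logcli to analyze the cardinality of your labels: logcli series with the --analyze-labels flag reports how many unique values each label has across the matched streams. Look for labels with very high cardinality (thousands of unique values or more). Running promtail --inspect answers a different question: it shows how each pipeline stage changes a line and its labels.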

  • Fix: Reduce the number of labels, especially those whose values change frequently (like request IDs, trace IDs, or user IDs). Instead of making these labels, keep them in the log message content and find them at query time with a line filter, for example {job="myapp"} |= "request_id=abc123". If you must have them as labels, consider sampling or aggregating them so the set of values stays small.

    # Example using logcli to report per-label cardinality for the matched streams
    logcli --addr http://loki:3100 series '{job="myapp"}' --analyze-labels
    

    If a label like request_id is causing issues, adjust your Promtail configuration to not add it as a label, for example by removing it from the labels stage or dropping it with a labeldrop stage.

  • Why it works: A smaller label index means less metadata to store and faster lookups during queries, reducing both storage and query processing overhead.
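
The effect of one high-cardinality label can be estimated with simple arithmetic: the number of possible streams is bounded by the product of each label's unique value count. The sketch below uses made-up label names and counts purely for illustration; real stream counts are usually below this upper bound because not every combination occurs.

```python
from math import prod

def max_streams(cardinalities: dict[str, int]) -> int:
    """Upper bound on streams: product of unique values per label."""
    return prod(cardinalities.values())

# Hypothetical cardinalities, not measurements from a real cluster.
base = {"job": 5, "env": 3, "level": 4}
with_request_id = {**base, "request_id": 100_000}

print(max_streams(base))             # 60
print(max_streams(with_request_id))  # 6000000
```

Removing the single request_id label brings the bound back from millions of possible streams to a few dozen.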

3. Implement Log Retention Policies

Don’t keep logs forever if you don’t need them.

  • Diagnosis: Check your current retention settings in Loki’s configuration.

  • Fix: Set appropriate retention periods for your data. Retention is applied by the compactor, so it has to be enabled there; the period itself is set in limits_config, either globally or per tenant. Depending on your Loki version, the compactor may need further settings, such as a working directory or a delete request store. For example, in your loki.yaml configuration:

    compactor:
      retention_enabled: true # Without this, retention_period is not enforced
    limits_config:
      retention_period: 720h # 30 days, default retention for all tenants
    

    Or, for specific tenants, in the runtime overrides file that runtime_config.file points to:

    # Example for specific tenant retention
    overrides:
      my-tenant:
        retention_period: 168h # 7 days
    
  • Why it works: Old, unneeded logs are deleted, directly reducing the total volume of data stored.
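
To see how much a shorter retention period saves, a rough steady-state estimate is daily ingest multiplied by retention days, divided by the compression ratio. The following sketch assumes constant ingest and a fixed compression ratio; the numbers are placeholders, not benchmarks.

```python
def steady_state_gb(daily_ingest_gb: float, retention_days: int,
                    compression_ratio: float) -> float:
    """Approximate stored volume once the retention window is full."""
    return daily_ingest_gb * retention_days / compression_ratio

# Hypothetical workload: 50 GB/day of raw logs, 5x compression.
print(steady_state_gb(50, 30, 5))  # 300.0
print(steady_state_gb(50, 7, 5))   # 70.0
```

Under these assumptions, moving a tenant from 30 days to 7 days cuts its stored volume by roughly three quarters.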

4. Leverage Compression

Loki supports various compression algorithms to reduce the size of stored data.

  • Diagnosis: Check which chunk encoding your ingesters use. Loki compresses chunks before writing them to the storage backend (e.g., S3, GCS), with gzip as the default, so compression is normally already on.

  • Fix: Set chunk_encoding in the ingester block if you want a different trade-off. Options include gzip, snappy, lz4, and zstd; gzip and zstd generally produce smaller chunks, while snappy and lz4 favor speed over size. For example, in loki.yaml:

    ingester:
      chunk_encoding: gzip # Default; smaller chunks than snappy at some CPU cost
    

    Object stores such as S3 and GCS generally store the bytes they receive without compressing them, so the encoding chosen in Loki is what determines the stored size.

  • Why it works: Compressed data takes up less space on disk or in object storage, reducing storage costs.
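
Repetitive logs compress very well, which is easy to check with Python's standard gzip module. This sketch compresses synthetic log lines directly; it does not reproduce Loki's chunk format, so treat the ratio as an illustration of the principle rather than a prediction for your cluster.

```python
import gzip

# Synthetic, highly repetitive log lines (an assumption for illustration).
repetitive = "".join(
    f"2024-01-01T00:00:{i % 60:02d}Z INFO request processed in {i % 40}ms\n"
    for i in range(10_000)
).encode()

compressed = gzip.compress(repetitive)
ratio = len(repetitive) / len(compressed)
print(f"raw={len(repetitive)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.1f}x")
```

Running the same measurement on a sample of your real logs gives a more realistic ratio to plug into storage estimates.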

5. Use the Right Index Type and Configuration

Loki offers different index strategies, and misconfiguration can lead to excessive index size.

  • Diagnosis: Examine your schema_config in loki.yaml. Are you using boltdb-shipper with appropriate period settings?

  • Fix: For long-term storage, boltdb-shipper is generally recommended, since it keeps the index in the same object store as the chunks; recent Loki releases recommend the newer TSDB index (store: tsdb) for new deployments. Ensure the index period is set to 24h, which the shipper-based stores require. The period is the time span covered by each index table, not a flush interval.

    schema_config:
      configs:
        - from: 2023-01-01
          store: boltdb-shipper
          object_store: s3
          schema: v11
          index:
            prefix: index_
            period: 24h # Each index table covers 24 hours
          chunks:
            prefix: chunk_
            period: 24h # Same 24-hour table period for chunks
    
  • Why it works: boltdb-shipper ships index files to object storage, so no separate index database has to be run and paid for, and the compactor can merge each day's index files into fewer, smaller ones.

6. Consider Log Sampling

For extremely high-volume, low-value logs, sampling can be effective.

  • Diagnosis: Identify log sources that contribute a massive volume but provide minimal unique diagnostic information.

  • Fix: Configure Promtail (or your log agent) to only send a fraction of these logs. For example, recent Promtail versions provide a sampling stage.

    # Example Promtail config snippet for sampling
    pipeline_stages:
    - match:
        selector: '{job="very_noisy_app"}'
        stages:
        - sampling:
            # Keep roughly 1 out of every 1000 log lines.
            # The rate is a probability between 0 and 1 applied per line,
            # so the kept fraction is approximate, but effective for cost reduction.
            rate: 0.001
    
  • Why it works: By only storing a representative subset of logs, you drastically reduce storage volume for those sources, while still retaining enough data for general trend analysis.
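
The per-line probability behind sampling can be simulated in a few lines of Python. This is a sketch of the idea only, not Promtail's implementation; the function name, seed, and line count are assumptions chosen for the example.

```python
import random

def sample_lines(lines, rate, seed=0):
    """Keep each line independently with probability `rate`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [line for line in lines if rng.random() < rate]

# One million synthetic lines, generated lazily.
lines = (f"line {i}" for i in range(1_000_000))
kept = sample_lines(lines, rate=0.001)
print(len(kept))  # close to 1000, but not exactly
```

Because each line is kept or dropped independently, the retained count varies slightly from run to run with different seeds, which is why sampled logs suit trend analysis better than exact counting.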

The next problem you are likely to encounter is query performance degrading as your label index grows, even if your chunk storage is optimized.
