The Loki chunk store is the heart of Loki’s long-term storage. It is designed to be pluggable, so you can back it with different storage systems. Object storage such as S3 is the most common production choice, but Loki has also supported NoSQL databases such as DynamoDB and Cassandra. Recent Loki releases mark both database backends as deprecated in favor of object storage, so check the documentation for your version before adopting either. This article covers using DynamoDB and Cassandra as your chunk store.

DynamoDB as a Loki Chunk Store

DynamoDB is a fully managed NoSQL database service from AWS. When used with Loki, it provides a highly available and scalable solution for storing your log chunks.

How it Works: Loki stores log data in "chunks," which are compressed blocks of log lines. When using DynamoDB, each chunk is stored as an item in a DynamoDB table. The table schema is designed to efficiently retrieve chunks based on their labels and time range.
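
To make the time-range lookup concrete, here is a minimal Python sketch of one common approach: bucketing keys by day so that a query only touches the buckets overlapping its time range. The key format tenant:dN and the function names are assumptions made for illustration, not Loki’s actual key layout.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def day_buckets(start: datetime, end: datetime) -> list[int]:
    """Return the day numbers (days since the Unix epoch) overlapping [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    first = int(start.timestamp()) // SECONDS_PER_DAY
    last = int(end.timestamp()) // SECONDS_PER_DAY
    return list(range(first, last + 1))


def bucket_keys(tenant: str, start: datetime, end: datetime) -> list[str]:
    """Build one hypothetical partition key per day bucket for a tenant."""
    return [f"{tenant}:d{day}" for day in day_buckets(start, end)]


if __name__ == "__main__":
    # A five-hour query that crosses midnight UTC touches two day buckets.
    start = datetime(2024, 1, 1, 22, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc)
    print(bucket_keys("tenant-a", start, end))
```

Bucketing by time keeps any single partition from growing without bound and lets a short query skip most of the table.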

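Configuration: To configure Loki to use DynamoDB, select the aws-dynamo store in schema_config and point storage_config at your AWS region with a dynamodb:// URL. Table names are derived from the prefixes in the schema entry (plus a period number when a period is set). The field names below follow the Loki configuration reference; check them against the version you run.

schema_config:
  configs:
    - from: 2020-05-15
      store: aws-dynamo
      object_store: aws-dynamo
      schema: v11
      index: {prefix: loki-index}
      chunks: {prefix: loki-chunks}
storage_config:
  aws:
    dynamodb:
      # Credentials come from IAM roles or the environment by default.
      # Static keys go in the URL: dynamodb://ACCESS_KEY:SECRET_KEY@us-east-1
      dynamodb_url: dynamodb://us-east-1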

Key Considerations for DynamoDB:

  • Provisioned Throughput: Unless you use on-demand capacity mode, you need to provision read and write capacity units (RCUs/WCUs) for your DynamoDB table. Loki’s write patterns are spiky, so consider on-demand capacity, or provisioned capacity with auto scaling that you have tuned carefully.
  • Table Creation: Loki’s table manager can create DynamoDB tables for you when its IAM permissions allow it. If you create a table manually instead, it needs the key schema Loki expects: a hash key named h of type String and a range key named r of type Binary. Loki computes both key values itself when it writes, so you only define the schema.
  • Cost: DynamoDB costs are based on throughput (provisioned capacity or on-demand requests), storage, and data transfer. Monitor your usage closely.
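
As a rough aid for the capacity and cost points above, the following Python sketch estimates capacity units from item size and request rate. It relies on DynamoDB’s documented accounting: each write is rounded up to the next 1 KB, each strongly consistent read to the next 4 KB, and an eventually consistent read costs half as much. The 200 KiB chunk size and the request rates are assumed example values, not measurements from a Loki deployment. Keep in mind that a single DynamoDB item cannot exceed 400 KB.

```python
import math


def write_capacity_units(item_bytes: int, writes_per_second: float) -> int:
    """WCUs needed: each write consumes 1 WCU per 1 KB of item size, rounded up."""
    units_per_write = math.ceil(item_bytes / 1024)
    return math.ceil(units_per_write * writes_per_second)


def read_capacity_units(
    item_bytes: int, reads_per_second: float, strongly_consistent: bool = False
) -> int:
    """RCUs needed: 1 RCU per 4 KB (rounded up); eventually consistent reads cost half."""
    units_per_read = math.ceil(item_bytes / 4096)
    if not strongly_consistent:
        units_per_read = units_per_read / 2
    return math.ceil(units_per_read * reads_per_second)


if __name__ == "__main__":
    chunk_bytes = 200 * 1024  # assumed average chunk size
    print("WCUs:", write_capacity_units(chunk_bytes, writes_per_second=10))
    print("RCUs:", read_capacity_units(chunk_bytes, reads_per_second=10))
```

Because capacity is charged per kilobyte, chunk size drives cost as much as request rate does, which is worth checking before choosing between provisioned and on-demand modes.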

Example DynamoDB Table Setup (AWS CLI):

aws dynamodb create-table \
    --table-name loki-chunks \
    --attribute-definitions \
        AttributeName=h,AttributeType=S \
        AttributeName=r,AttributeType=B \
    --key-schema \
        AttributeName=h,KeyType=HASH \
        AttributeName=r,KeyType=RANGE \
    --provisioned-throughput ReadCapacityUnits=20,WriteCapacityUnits=20 \
    --region us-east-1

Note: You only define the key schema. Loki computes the key values itself when it writes, so there is nothing to pre-populate. If writes concentrate on a few partitions and you see throttling, raise the table’s capacity or switch to on-demand mode rather than changing the key.

Cassandra as a Loki Chunk Store

Cassandra is a highly scalable, distributed NoSQL database. It’s a good choice for Loki when you need to manage your own database infrastructure and require massive write throughput.

How it Works: Similar to DynamoDB, Loki stores chunks as rows in a Cassandra table. The table is designed with a composite primary key that allows for efficient querying by tenant ID and time.

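Configuration: You’ll need to specify the Cassandra contact points (IP addresses or hostnames of your Cassandra nodes) and the keyspace Loki will use, and select cassandra as the store and object_store in schema_config. The field names below follow the Loki configuration reference; check them against the version you run.

storage_config:
  cassandra:
    # Comma-separated list of contact points
    addresses: 192.168.1.100,192.168.1.101
    keyspace: loki
    # Optional: Authentication
    # auth: true
    # username: loki_user
    # password: loki_password
    # Table names come from the index and chunks prefixes in schema_config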

Key Considerations for Cassandra:

  • Keyspace and Table Creation: Loki can create the keyspace and its tables itself, provided its Cassandra user has permission to do so. In locked-down clusters you must create them beforehand. The table Loki expects has three columns, hash (text), range (blob), and value (blob), with the primary key (hash, range).
  • Replication Factor: For production, ensure your Cassandra keyspace has an appropriate replication factor (e.g., 3 for SimpleStrategy in a single data center, or NetworkTopologyStrategy for multi-datacenter deployments) to guarantee availability and durability.
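  • Compaction Strategy: Choose a compaction strategy that suits Loki’s write-heavy, time-ordered workload. SizeTieredCompactionStrategy (the default) handles heavy writes well, and TimeWindowCompactionStrategy is designed for time-series data that expires by TTL. LeveledCompactionStrategy favors read-heavy workloads and adds write amplification.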
  • Data Model: Loki’s Cassandra schema is optimized for time-series data retrieval. Understanding how Cassandra distributes data across nodes based on the primary key is crucial for performance tuning.
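
To illustrate the replication factor point above, this Python sketch computes how many replicas a QUORUM operation needs and how many replicas can be down before such operations start failing. It assumes Loki reads and writes at QUORUM consistency; confirm the consistency setting in your own configuration. The function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write to succeed."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while QUORUM operations still succeed."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

A replication factor of 3 is the smallest value that lets QUORUM operations survive the loss of one replica, which is why it is the usual production starting point.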

Example Cassandra Table Setup (cqlsh):

CREATE KEYSPACE IF NOT EXISTS loki WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
USE loki;
-- Column names match the layout Loki's Cassandra client reads and writes
CREATE TABLE IF NOT EXISTS chunks (
    hash text,
    range blob,
    value blob,
    PRIMARY KEY (hash, range)
);

Note: blob is the CQL type for arbitrary binary data, which is what the compressed chunk payload requires. CQL has no separate bytes type.

When to Choose Which

  • DynamoDB: Ideal if you’re already in the AWS ecosystem, want a fully managed solution, and prefer to avoid the operational overhead of managing database clusters. It scales well but can become expensive at very high ingest rates or if read/write capacity isn’t tuned correctly.
  • Cassandra: A strong choice for on-premises deployments or multi-cloud environments where you need complete control over your database infrastructure. It offers excellent write performance and scalability but requires significant operational expertise to manage and tune effectively.

Both backends require careful planning regarding schema, capacity (DynamoDB) or replication/compaction (Cassandra), and ongoing monitoring to ensure optimal performance and cost-efficiency for your Loki deployment.
