Loki’s schema version determines how your logs are stored and indexed. Migrating to a newer version can unlock performance improvements and new features, but it requires careful planning to avoid losing log data.
Let’s see Loki’s schema in action. Imagine you have a simple log entry coming in:
{
  "level": "info",
  "message": "User logged in",
  "user_id": "abc-123"
}
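When Loki ingests this, the labels the client attached to the entry (for example level) go into the index, while the line itself is stored as content inside a chunk. The schema version dictates how those index entries and chunks are laid out in storage; it does not decide which fields become labels. An older schema uses an earlier index layout, while a newer schema, like v11 (which is a common target for upgrades), uses a layout intended to make label lookups faster. Fields left in the line, such as message and user_id, can still be filtered at query time, for instance with LogQL's json parser. The upgrade process is essentially moving from the old storage format and indexing strategy to the new one.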
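To make the split between indexed labels and stored content concrete, here is a minimal sketch of how a client might package the entry above. The outer shape (streams, stream, values) follows the JSON body accepted by Loki's HTTP push endpoint; the helper name build_push_payload and the choice of level as the only label are assumptions made for illustration.

```python
import json
import time


def build_push_payload(entry, label_keys, now_ns=None):
    """Split a parsed log entry into stream labels and a log line.

    Hypothetical helper: only the keys listed in label_keys become labels,
    everything stays inside the line that ends up in a chunk.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    # Labels are what the index sees; keep this set small and low-cardinality.
    labels = {key: str(entry[key]) for key in label_keys if key in entry}
    # The full entry is kept as the log line, so no field is lost.
    line = json.dumps(entry, sort_keys=True)
    # Timestamps are sent as strings holding nanoseconds since the epoch.
    return {"streams": [{"stream": labels, "values": [[str(now_ns), line]]}]}


entry = {"level": "info", "message": "User logged in", "user_id": "abc-123"}
payload = build_push_payload(
    entry, label_keys=["level"], now_ns=1_600_000_000_000_000_000
)
print(json.dumps(payload, indent=2))
```

Keeping user_id out of the label set is a deliberate choice in this sketch: a label with many unique values multiplies the number of streams the index has to track.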
The core problem Loki solves is making vast amounts of unstructured log data queryable at scale. It achieves this by separating the indexing of log metadata (like labels) from the actual log content. When you query Loki, it first uses the index to find the relevant chunks of log data and then streams those chunks to you. Each schema version represents a different trade-off in how this separation is managed, impacting indexing speed, storage efficiency, query performance, and the types of metadata you can effectively index.
Here’s a breakdown of how it works internally, focusing on the upgrade path:
- Index vs. Chunks: Loki stores data in "chunks." These chunks are typically compressed and immutable. Associated with these chunks is an "index." The index maps label sets to the chunks that contain logs matching those label sets.
- Schema’s Role: The schema version dictates:
  - Index Type: What kind of index layout Loki uses (e.g., BoltDB-based, Prometheus-style TSDB, or more advanced implementations).
  - Cardinality Management: How it handles labels with many unique values (high cardinality).
  - Chunk Structure: How data is organized within the chunks.
  - Label Encoding: How label names and values are written into index entries, as opposed to the raw content kept in chunks.
- The Upgrade Challenge: When you upgrade Loki, you’re changing the rules for how new data is indexed and stored. Existing data, however, remains in its old format. Loki keeps that data readable by applying, to each period, the schema entry that was in force when the data was written. Rebuilding the old index under the new schema is a separate step, and this is where the migration process comes in.
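The index-then-chunks lookup described above can be modelled in a few lines. This is a toy sketch, not Loki's implementation: the ToyStore class, its in-memory dictionaries, and the chunk ids are all invented for illustration.

```python
from collections import defaultdict


class ToyStore:
    """Toy model: an index from label pairs to chunk ids, plus the chunks."""

    def __init__(self):
        self.chunks = {}                # chunk id -> list of log lines
        self.index = defaultdict(set)   # (label, value) -> set of chunk ids

    def write_chunk(self, chunk_id, labels, lines):
        # Chunks are written once and never modified afterwards.
        self.chunks[chunk_id] = list(lines)
        for pair in labels.items():
            self.index[pair].add(chunk_id)

    def query(self, matchers):
        # Step 1: consult only the index to find candidate chunks.
        candidates = [self.index.get(pair, set()) for pair in matchers.items()]
        chunk_ids = set.intersection(*candidates) if candidates else set(self.chunks)
        # Step 2: read the matching chunks and stream their lines.
        for chunk_id in sorted(chunk_ids):
            yield from self.chunks[chunk_id]


store = ToyStore()
store.write_chunk("c1", {"app": "auth", "level": "info"}, ["User logged in"])
store.write_chunk("c2", {"app": "auth", "level": "error"}, ["Login failed"])
store.write_chunk("c3", {"app": "billing", "level": "info"}, ["Invoice sent"])

result = list(store.query({"app": "auth", "level": "info"}))
print(result)
```

Only the index is touched until the candidate chunks are known, which is why the way the index is laid out matters so much for query latency.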
Let’s consider the configuration:
# loki.yaml
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h
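This configuration specifies that Loki should use schema v11 for data starting from 2020-10-24. Data written before that date would be covered by an earlier entry in the configs list, which might specify an older schema such as v1. The store setting selects the index store, and object_store selects where the chunks (and, with boltdb-shipper, the shipped index files) are persisted.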
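Because each entry applies from its from date until the next entry begins, picking the schema for a given day amounts to finding the latest entry that has already started. A minimal sketch follows, assuming a hypothetical older v9 entry dated 2019-01-01 that is not part of the configuration above; the function name period_for is also invented here.

```python
from datetime import date

# Two schema periods, oldest first. The first entry is hypothetical and only
# exists so that the lookup has something to choose between.
PERIODS = [
    {"from": date(2019, 1, 1), "schema": "v9"},
    {"from": date(2020, 10, 24), "schema": "v11"},
]


def period_for(day, periods):
    """Return the entry whose 'from' date is the latest one not after day."""
    applicable = [p for p in periods if p["from"] <= day]
    if not applicable:
        raise ValueError(f"no schema period covers {day}")
    return max(applicable, key=lambda p: p["from"])


for day in (date(2020, 10, 23), date(2020, 10, 24)):
    print(day, period_for(day, PERIODS)["schema"])
```

Data from the day before the cutover still resolves to the older entry, which is why existing entries should stay in the list when a new one is added.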
The upgrade process typically involves these steps:
- Prepare the New Schema Configuration: You’ll update your loki.yaml to point to the new schema version for future data.
- Run the Migration Tool: Loki provides a migrator tool. This tool reads the old index, processes it, and writes a new index compatible with the target schema version, referencing the existing chunks. It doesn’t rewrite the chunk data itself, which is crucial for efficiency.
- Restart Loki: Once the migration is complete, you restart Loki with the updated configuration. It will then use the new index to query both newly ingested data and the historically migrated data.
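The migrator tool is run against your existing index and object store. Tool names and flags differ between Loki releases, so confirm them against the help output of the tool you actually have. A typical command might look like this: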
./loki-migrator --config.file loki.yaml --target-schema-version v11 --parallelism 4 --from-schema-version v1
Here, --target-schema-version v11 is what you’re aiming for, --from-schema-version v1 is the starting point (though often the migrator can infer this), and --parallelism 4 helps speed up the process by using multiple goroutines.
The most surprising thing about Loki’s schema migration is that the data itself (the log chunks) is not rewritten. The migrator’s job is solely to rebuild the index. It reads the old index, understands which labels mapped to which chunks, and then constructs a new index that uses the new schema’s format to point to those same, unchanged chunks. This is why the process is feasible for large datasets; rewriting terabytes or petabytes of log data would be impractical. The migrator essentially creates a new "map" to your existing "territory."
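The "new map, same territory" idea can be sketched as a function that re-keys index entries while passing chunk references through untouched. Both key formats here (a comma-separated string for the old index, a sorted tuple for the new one) are invented for illustration and are not Loki's real on-disk formats; rebuild_index and encode_key_sorted_tuple are hypothetical names.

```python
def encode_key_sorted_tuple(labels):
    """Hypothetical 'new schema' key: a sorted tuple of (name, value) pairs."""
    return tuple(sorted(labels.items()))


def rebuild_index(old_index, encode_key):
    """Build a new index from an old one without touching any chunk data."""
    new_index = {}
    for old_key, chunk_refs in old_index.items():
        # Decode the hypothetical 'old schema' key, e.g. "app=auth,level=info".
        labels = dict(pair.split("=", 1) for pair in old_key.split(","))
        # Re-encode the same labels under the new key format.
        new_key = encode_key(labels)
        # Chunk references are copied as-is: same objects, same locations.
        new_index.setdefault(new_key, []).extend(chunk_refs)
    return new_index


old_index = {
    "app=auth,level=info": ["chunks/c1", "chunks/c2"],
    "app=auth,level=error": ["chunks/c3"],
}
new_index = rebuild_index(old_index, encode_key_sorted_tuple)
for key, refs in new_index.items():
    print(key, "->", refs)
```

The cost of this step scales with the size of the index rather than the size of the chunks, which is the property that makes the approach workable on large datasets.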
After successfully migrating to a newer schema version, the next challenge you’ll likely encounter is optimizing query performance, especially for high-cardinality labels, which might lead you to explore advanced indexing strategies or query patterns.