Snappy is a popular choice for compressing Loki chunks because it is fast, but Gzip, which Loki’s configuration reference lists as the default for chunk_encoding, typically offers better compression ratios at the cost of CPU.
Let’s see Snappy in action. Imagine we have a series of log lines, each with a timestamp and a message.
1678886400000000000 info Received request from 192.168.1.10
1678886401000000000 debug Processing request ID 12345
1678886402000000000 info Request ID 12345 completed successfully
1678886403000000000 warn High CPU usage detected on instance web-01
1678886404000000000 error Database connection failed: timeout
When Loki ingests these logs, it groups them into "chunks." Before writing these chunks to object storage, it compresses them. Snappy, a fast compression algorithm, is a common choice for this step. It prioritizes speed over compression ratio, which keeps ingestion and retrieval quick.
Here’s a simplified view of how Snappy would handle our example log lines. Snappy is an LZ77-style compressor: it scans the input for byte sequences it has seen recently and replaces each repeat with a short back-reference (an offset and a length) that points at the earlier occurrence. Everything else is emitted as literal bytes. For instance, every timestamp shares the prefix "167888640", so after the first line that prefix can be encoded as a copy from the previous line instead of being spelled out again. Repeated fragments in the messages, such as "request" or " ID 12345", are handled the same way.
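To make the back-reference idea concrete, here is a toy LZ77-style tokenizer in Python. It is a sketch of the general technique only: it does not produce Snappy's actual wire format, and the names `lz_tokens` and `lz_decode`, the 4-byte minimum match, and the 1024-byte window are assumptions chosen for illustration.

```python
SAMPLE_LINES = [
    "1678886400000000000 info Received request from 192.168.1.10",
    "1678886401000000000 debug Processing request ID 12345",
    "1678886402000000000 info Request ID 12345 completed successfully",
    "1678886403000000000 warn High CPU usage detected on instance web-01",
    "1678886404000000000 error Database connection failed: timeout",
]


def lz_tokens(data: bytes, min_match: int = 4, window: int = 1024) -> list:
    """Greedy LZ77-style tokenizer (illustrative, not Snappy's real format).

    Emits ("lit", bytes) for literal runs and ("copy", offset, length)
    for repeats of bytes that appeared within the last `window` bytes.
    """
    tokens = []
    literal = bytearray()
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Brute-force search of the window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            if literal:
                tokens.append(("lit", bytes(literal)))
                literal = bytearray()
            tokens.append(("copy", best_off, best_len))
            i += best_len
        else:
            literal.append(data[i])
            i += 1
    if literal:
        tokens.append(("lit", bytes(literal)))
    return tokens


def lz_decode(tokens: list) -> bytes:
    """Rebuild the original bytes from the token stream."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            _, offset, length = tok
            # Copy byte by byte so overlapping matches work correctly.
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)


data = "\n".join(SAMPLE_LINES).encode("utf-8")
tokens = lz_tokens(data)
copies = [t for t in tokens if t[0] == "copy"]
print(f"{len(data)} input bytes -> {len(tokens)} tokens, {len(copies)} of them copies")
```

Real compressors replace the brute-force search with a hash table over recent input, which is a large part of why Snappy is fast.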
Gzip, by contrast, uses DEFLATE, which pairs LZ77 matching with Huffman coding. It searches harder for matches and then entropy-codes the result, a step Snappy skips entirely, so it typically achieves noticeably higher compression ratios. This comes at a computational price: Gzip is significantly slower and uses more CPU, both for compression during ingestion and for decompression during querying.
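You can get a feel for Gzip's ratio on log-shaped data with Python's standard `gzip` module. The sketch below is only a rough experiment: the `synthetic_chunk` helper and its log format are made up for illustration, synthetic lines compress better than real logs usually do, and nothing here goes through Loki's own chunk format.

```python
import gzip


def synthetic_chunk(n_lines: int = 2000) -> bytes:
    """Build fake log data loosely modeled on the example lines (hypothetical)."""
    base_ts = 1678886400000000000
    levels = ["info", "debug", "info", "warn", "error"]
    lines = [
        f"{base_ts + i * 1_000_000_000} {levels[i % 5]} "
        f"Request ID {10000 + i} handled by web-{i % 4:02d}"
        for i in range(n_lines)
    ]
    return "\n".join(lines).encode("utf-8")


chunk = synthetic_chunk()

# compresslevel trades CPU for size: 1 is fastest, 9 tries hardest.
for level in (1, 6, 9):
    blob = gzip.compress(chunk, compresslevel=level)
    # Round-trip check: compression must be lossless.
    assert gzip.decompress(blob) == chunk
    ratio = len(chunk) / len(blob)
    print(f"level {level}: {len(chunk)} -> {len(blob)} bytes ({ratio:.1f}x)")
```

Running the same experiment on a sample of your own logs gives a far better estimate than any synthetic data.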
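The choice between Snappy and Gzip in Loki’s configuration hinges on your priorities. If your primary concerns are ingestion throughput and low query latency, or CPU on your ingesters and queriers is tight, Snappy is usually the way to go. Note that Loki’s configuration reference lists gzip as the default for chunk_encoding, so Snappy generally has to be set explicitly; check the reference for the version you run.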
# Example Loki config excerpt
ingester:
  chunk_encoding: snappy
If storage costs are a major driver, or your network bandwidth to object storage is limited, and you can tolerate higher CPU usage and potentially slightly slower queries, Gzip might be a better fit. To enable Gzip, you would change the configuration like this:
# Example Loki config excerpt
ingester:
  chunk_encoding: gzip
For the same set of logs, a chunk encoded with Snappy will generally be larger in object storage than one encoded with Gzip. This difference in size directly affects your storage bill and the amount of data transferred from object storage at query time.
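A back-of-the-envelope calculation shows how the compression ratio feeds into the bill. Every number below (ingest volume, ratios, retention, price per GB) is a hypothetical placeholder, not a measurement or a quoted price, and the helper names are made up for this sketch.

```python
def stored_gb(raw_gb_per_day: float, compression_ratio: float, retention_days: int) -> float:
    """Steady-state volume held in object storage, in GB."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_gb_per_day * retention_days / compression_ratio


def monthly_cost(raw_gb_per_day: float, compression_ratio: float,
                 retention_days: int, price_per_gb_month: float) -> float:
    """Approximate monthly storage cost for the retained chunks."""
    return stored_gb(raw_gb_per_day, compression_ratio, retention_days) * price_per_gb_month


# Hypothetical inputs: 100 GB/day of raw logs, 30 days retention,
# 0.023 currency units per GB-month, and assumed ratios of 4x and 8x.
for name, ratio in (("lower-ratio codec", 4.0), ("higher-ratio codec", 8.0)):
    cost = monthly_cost(100, ratio, 30, 0.023)
    print(f"{name}: {stored_gb(100, ratio, 30):.0f} GB stored, about {cost:.2f} per month")
```

Plug in ratios measured on your own chunks; the gap between codecs varies a lot with how repetitive the logs are.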
When you query Loki, the chunks are fetched and decompressed. If you used Snappy, decompression is very fast, contributing to low query latency. If you used Gzip, the query engine spends more time decompressing the data before it can be processed, which can increase query latency, especially when queries are already CPU-bound.
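To see what decompression costs on your own hardware, you can time it directly. This sketch measures only Gzip, using Python's standard library; comparing against Snappy would need a third-party binding, which is not shown here. The `synthetic_chunk` and `time_decompress` helpers are hypothetical, and the timings say nothing about Loki's Go implementation beyond a rough order of magnitude.

```python
import gzip
import time


def synthetic_chunk(n_lines: int = 5000) -> bytes:
    """Build fake log data loosely modeled on the example lines (hypothetical)."""
    base_ts = 1678886400000000000
    lines = [
        f"{base_ts + i * 1_000_000_000} info Request ID {10000 + i} completed successfully"
        for i in range(n_lines)
    ]
    return "\n".join(lines).encode("utf-8")


def time_decompress(blob: bytes, repeats: int = 20) -> tuple:
    """Return (average seconds per decompression, decompressed size in bytes)."""
    size = 0
    start = time.perf_counter()
    for _ in range(repeats):
        size = len(gzip.decompress(blob))
    elapsed = time.perf_counter() - start
    return elapsed / repeats, size


chunk = synthetic_chunk()
blob = gzip.compress(chunk)
per_call, size = time_decompress(blob)
print(f"decompressed {size} bytes in {per_call * 1e3:.3f} ms on average")
```

Multiply the per-chunk figure by the number of chunks a typical query touches to estimate how much of the query time goes to decompression.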
The one thing most people don’t realize is that the chunk_encoding setting affects ingestion performance, query performance, and storage costs all at once. It’s not just about how much space the data takes up; it’s also about the CPU cycles required to make it that small and then unmake it for reading. A common misconception is that a faster codec is simply a better one. Snappy is fast precisely because it does less work per byte, and that is also why its compression ratio is lower.
The next logical step after tuning your compression is exploring how Loki’s index_downsampling impacts query performance and storage.