Kafka Tiered Storage is surprisingly effective at making your Kafka clusters appear to have infinite retention, even when the vast majority of your data lives on cheap object storage like S3.
Let’s watch it in action. Imagine you have a Kafka topic, user_events, with a retention policy that’s a bit aggressive for your hot storage: 3 days. But you need to keep event data for a year for compliance.
First, you’ll need a Kafka cluster that supports Tiered Storage. This usually means a recent version of Kafka and specific configurations enabled. You’ll also need an S3 bucket configured with appropriate permissions for your Kafka brokers.
Here’s a simplified look at what a broker might be doing. A producer sends a message to partition 0 of topic user_events.
{
"topic": "user_events",
"partition": 0,
"offset": 1234567,
"timestamp": 1678886400000,
"value": {"user_id": "abcde", "event": "login", "timestamp": "2023-03-15T10:00:00Z"}
}
This message is written to the active segment file on the broker’s local disk. Over time, as the segment fills up or a time limit is reached, it becomes "closed."
Now, Tiered Storage kicks in. A background process on the broker (or a separate agent, depending on the implementation) identifies these closed segments that are older than a certain threshold (e.g., 24 hours). It then uploads these segments to S3.
Here’s what an uploaded segment might look like in S3. The naming convention often includes topic, partition, and the segment’s base offset.
s3://your-kafka-bucket/user_events/0/000000000000000100000.log
s3://your-kafka-bucket/user_events/0/000000000000000100000.index
s3://your-kafka-bucket/user_events/0/000000000000000100000.timeindex
Once uploaded, the broker can delete the segment from its local disk, freeing up valuable hot storage. However, it still maintains metadata about these segments, including their location in S3.
When a consumer group requests data from an offset that’s no longer on local disk, but has been offloaded to S3, the broker intercepts this. It looks up the metadata, finds the corresponding segment file in S3, downloads it (or a relevant portion of it), and serves the data to the consumer. This download happens on-demand.
The key problem this solves is the exponential cost growth of keeping years of data on fast, expensive local SSDs or HDDs. By moving older, less frequently accessed data to S3, you decouple storage cost from retention duration.
The internal mechanism involves a "remote storage manager" component within Kafka. This manager is responsible for:
- Uploading: Periodically scanning local segments, identifying eligible ones, and uploading them to the configured object storage endpoint.
- Metadata Management: Keeping track of which segments are offloaded, their S3 locations, and their offset ranges.
- Downloading/Serving: When a consumer requests data from an offloaded segment, this manager retrieves the necessary data from S3 and makes it available to the consumer.
The exact levers you control are primarily related to the log.segment.bytes, log.retention.hours (for local retention), and new configurations specific to tiered storage, such as log.remote.storage.enable, log.remote.storage.s3.bucket, and log.remote.storage.s3.region. You also configure the "prefix" for your S3 bucket and potentially the time thresholds for when segments become eligible for offloading.
A critical, yet often overlooked, aspect of Tiered Storage is the impact on read latency for older data. While Kafka efficiently manages the download, fetching data from S3 will inherently be slower than reading from local disk. This means that for workloads with frequent access to very old data, the performance characteristics can change significantly. The system isn’t just a transparent pass-through; it’s a tiered system where the cost-performance trade-off is explicit.
If your offloaded segments aren’t being uploaded, you’ll likely encounter errors related to consumers failing to fetch data they expect to be available.