Kafka Streams uses an internal cache to optimize state store operations, and tuning it can significantly reduce the number of flushes to Kafka, improving performance.

Let’s see this in action. Imagine a simple word count application. Without tuning, Kafka Streams might write each intermediate count update back to Kafka. This is like writing down every single tally mark immediately. With caching, it holds onto those tallies for a bit, batching them up before writing.

Here’s a simplified view of a Kafka Streams topology with a state store:

StreamsBuilder builder = new StreamsBuilder();
KStream<String, Long> source = builder.stream("input-topic");

KTable<String, Long> counts = source
    .groupByKey()
    .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store")
        .withValueSerde(Serdes.Long())
        .withKeySerde(Serdes.String()));

counts.toStream().to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));

KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfig);
streams.start();

The counts-store is where the intermediate counts are stored. By default, Kafka Streams has a cache, but its size and how it behaves can be adjusted.

The core problem this solves is I/O bound processing. Every time a state store record is updated, if it’s not in the cache and the cache is full, Kafka Streams will flush some data to its backing changelog topic in Kafka. These flushes are network I/O and disk I/O on the Kafka brokers. If your application is processing a high volume of events, these flushes can become a bottleneck.

The main lever you have is the cache.max.bytesbuffering configuration. This setting controls the maximum amount of memory (in bytes) that Kafka Streams will use for buffering records in its internal cache before flushing them to the state store’s changelog topic.

Here’s how you’d set it in your StreamsConfig:

Properties streamsConfig = new Properties();
streamsConfig.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
streamsConfig.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
streamsConfig.put(StreamsConfig.STATE_DIR_CONFIG, "/path/to/state/dir"); // Crucial for state stores

// Tune the cache size
streamsConfig.put(StreamsConfig.CACHE_MAX_BUFFERING_CONFIG, "10485760"); // 10MB

A larger value for cache.max.bytesbuffering means Kafka Streams can hold more records in memory before it must flush. This reduces the frequency of writes to the changelog topics, which are often the bottleneck. However, setting it too high can lead to increased memory usage in your Kafka Streams application and potentially longer recovery times if the application restarts, as the cache needs to be replayed.

The default value for cache.max.bytesbuffering is 10MB. If you’re seeing high latency or your Kafka broker disks are saturated by the changelog topics, increasing this value is the first thing to try. For example, setting it to 20971520 (20MB) or 52428800 (50MB) might be beneficial. The optimal value depends heavily on your event volume, the size of your keys and values, and the available memory on your Streams nodes.

Another related, but less direct, tuning parameter is commit.interval.ms. This controls how often Kafka Streams commits its processing progress (and flushes its internal caches) to Kafka. A smaller commit interval means more frequent commits and flushes. While increasing commit.interval.ms can also reduce flush frequency, it increases the potential for data loss if an application fails between commits. It’s generally better to tune cache.max.bytesbuffering for I/O optimization and use commit.interval.ms for fault tolerance considerations. The default is 30 seconds.

When you increase cache.max.bytesbuffering, you are allowing the system to accumulate more intermediate state updates before forcing them to durable storage. This means that during a normal shutdown or a controlled restart, the state store will have to re-apply fewer individual records from its changelog, as more of the recent changes are already present in the cache and will be written out as larger, batched flushes. However, if the application crashes unexpectedly, the data that was only in the cache and not yet flushed to the changelog topic will be lost. This is why the cache is a performance optimization and not a durability mechanism.

The next thing you’ll likely run into is optimizing the underlying Kafka brokers themselves for changelog topics, or understanding how Kafka Streams handles rebalancing and state store restoration.

Want structured learning?

Take the full Kafka-streams course →