Kafka Streams scaling concurrency is fundamentally about how many threads your application uses to process records from Kafka topics.

Let’s see it in action. Imagine a simple Kafka Streams app that counts word occurrences in a stream.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Arrays;
import java.util.Properties;

public class WordCountApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // This is the key setting for concurrency!
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> textLines = builder.stream("text-lines");

        KStream<String, Long> wordCounts = textLines
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
            .groupBy((key, word) -> word)
            .count()
            .toStream();

        wordCounts.to("word-counts", org.apache.kafka.streams.kstream.Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Add shutdown hook for graceful termination
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

In this example, StreamsConfig.NUM_STREAM_THREADS_CONFIG is set to 2. This means Kafka Streams will launch two threads to process records from the input topic text-lines. Each thread will independently pull data, apply the flatMapValues, groupBy, and count operations, and push results to the word-counts topic. The magic here is that Kafka Streams automatically handles partitioning. If text-lines has 10 partitions, and you have 2 threads, each thread will be assigned a subset of those partitions to process. If you increase NUM_STREAM_THREADS_CONFIG to 4, Kafka Streams will try to assign 2-3 partitions per thread, effectively parallelizing the processing further.

The core problem Kafka Streams concurrency solves is maximizing throughput by utilizing available CPU cores and Kafka’s partitioning model. A Kafka topic is divided into partitions. Each partition is ordered and processed by only one consumer instance within a consumer group. Kafka Streams leverages this by creating threads that act as consumers. By increasing the number of threads, you’re essentially telling Kafka Streams to spin up more internal consumers, each dedicated to a subset of the topic’s partitions. This allows for parallel execution of your stream processing logic across multiple CPU cores.

The StreamsConfig.NUM_STREAM_THREADS_CONFIG is your primary lever. Setting this to N means your application will have N threads processing data. Kafka Streams then manages the assignment of partitions to these threads. A good starting point is often the number of CPU cores available on the machine running the Kafka Streams application. However, the optimal number can be higher if your processing logic is I/O bound (e.g., making external API calls) or if you have multiple Kafka topics to process concurrently. The maximum useful number of threads is generally limited by the number of partitions in your input topics. If you have 10 partitions and set NUM_STREAM_THREADS_CONFIG to 20, only 10 of those threads will actually be doing work at any given time, with each thread handling one partition. The remaining 10 threads will be idle, waiting for new partitions to be assigned (which won’t happen if the total number of partitions is already consumed).

When you increase NUM_STREAM_THREADS_CONFIG, Kafka Streams rebalances partition assignments across the available threads. If you had 4 partitions and 2 threads, each thread might get 2 partitions. If you increase threads to 4, each thread will get 1 partition. This rebalancing ensures that work is distributed. If a thread fails, its assigned partitions are reassigned to other active threads, maintaining processing continuity.

The number of threads you configure is directly tied to how many partitions of your input topics your application instance can process in parallel. If you have an input topic with 6 partitions and set NUM_STREAM_THREADS_CONFIG to 3, each thread will process 2 partitions. If you set it to 6, each thread will process 1 partition. If you set it to 10, 6 threads will process one partition each, and 4 threads will be idle. The system distributes partitions among the threads, ensuring that no single partition is processed by more than one thread within the same application instance.

The next step in scaling beyond single-instance thread count is running multiple instances of your Kafka Streams application, each as a separate process, potentially on different machines.

Want structured learning?

Take the full Kafka-streams course →