Kafka Streams on Kubernetes: Deploy and Scale Apps (2026)

Kafka Streams on Kubernetes is a powerful combination for building real-time data processing applications. The most surprising thing about it is how seamlessly Kafka Streams, a library, can be deployed and scaled within a Kubernetes environment, often appearing as if it’s a first-class citizen of the orchestrator.

Let’s see it in action. Imagine a simple Kafka Streams application that counts word frequencies from an input Kafka topic and writes the results to an output topic.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

import java.util.Properties;

public class WordCountApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker-service:9092"); // Kubernetes Service name
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic")
               .flatMapValues(value -> List.of(value.toLowerCase().split("\\W+")))
               .groupBy((key, value) -> value)
               .count()
               .toStream()
               .to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));

        Topology topology = builder.build();
        KafkaStreams streams = new KafkaStreams(topology, props);
        streams.start();

        // Add shutdown hook for graceful termination
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

This application is then packaged into a Docker image and deployed using Kubernetes Deployments. A typical Kubernetes Deployment for this would look something like:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordcount-app-deployment
  labels:
    app: wordcount-app
spec:
  replicas: 3 # Start with 3 instances
  selector:
    matchLabels:
      app: wordcount-app
  template:
    metadata:
      labels:
        app: wordcount-app
    spec:
      containers:
      - name: wordcount-app-container
        image: your-docker-repo/wordcount-app:latest
        ports:
        - containerPort: 8080 # If your app exposes metrics or health checks
        env:
        - name: KAFKA_BROKERS
          value: "kafka-broker-service:9092" # Inject Kafka broker address

And a Kubernetes Service to expose it:

apiVersion: v1
kind: Service
metadata:
  name: wordcount-app-service
spec:
  selector:
    app: wordcount-app
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080

The problem this solves is the distributed nature of real-time data processing. Kafka Streams provides the stream processing logic, and Kubernetes handles the orchestration, scaling, and resilience of these processing instances. When you set replicas: 3 in the Deployment, Kubernetes ensures that three instances of your WordCountApp are running. Kafka Streams, by its very design, is partition-aware. When you have multiple instances of the same APPLICATION_ID_CONFIG, Kafka Streams automatically distributes the Kafka topic partitions among these instances. Each instance will process a subset of the partitions, ensuring that the overall processing workload is divided.

The internal mechanism at play here is Kafka Streams’ consumer group management combined with Kubernetes’ pod management. Each Kafka Streams application instance acts as a client in a Kafka consumer group (defined by APPLICATION_ID_CONFIG). Kubernetes ensures that these client instances are running and healthy. When you scale the Deployment (e.g., change replicas to 5), Kubernetes starts more pods. Kafka, seeing new clients join the consumer group, automatically rebalances the partition assignments. Some existing instances will relinquish partitions, and the new instances will take over processing them. This dynamic rebalancing is what enables automatic scaling of your Kafka Streams application.

The exact levers you control are primarily within your KafkaStreams configuration and your Kubernetes Deployment. APPLICATION_ID_CONFIG is crucial for consumer group coordination. BOOTSTRAP_SERVERS_CONFIG points to your Kafka cluster, which can be a Kubernetes Service name if Kafka is also deployed on Kubernetes. In the Kubernetes Deployment, replicas dictates the number of processing instances. You can also leverage Kubernetes features like Horizontal Pod Autoscaler (HPA) to automatically adjust the number of replicas based on metrics like CPU or memory utilization, or custom metrics related to Kafka lag.

One of the most powerful, yet often overlooked, aspects of Kafka Streams on Kubernetes is its state management and fault tolerance. Kafka Streams applications can maintain local state stores (e.g., for aggregations like count()). When a Kafka Streams instance fails, Kubernetes will restart the pod. Upon restart, the Kafka Streams application re-initializes, and crucially, it can restore its state from Kafka’s changelog topics. These changelog topics are Kafka topics that Kafka Streams uses to durably store state changes. This means that even after a pod crash and restart, your application can resume processing from where it left off, with its state intact, thanks to the combination of Kafka’s durability and Kubernetes’ orchestration.

The next concept to explore is how to effectively monitor your Kafka Streams applications running in Kubernetes, particularly focusing on Kafka consumer lag.