Kafka Streams vs. Flink: Which Stream Processor Wins?

Kafka Streams and Flink are both powerful stream processing frameworks, but their fundamental design philosophies lead to vastly different operational characteristics and performance profiles.

Let’s see Kafka Streams in action. Imagine you have a Kafka topic named orders containing JSON messages like {"order_id": 123, "amount": 100.50, "timestamp": 1678886400000}. You want to calculate the total amount for each order ID.

Here’s a simplified Kafka Streams application in Java:

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class OrderAggregator {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-aggregator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, "org.apache.kafka.common.serialization.Serdes$StringSerde");
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, "org.apache.kafka.common.serialization.Serdes$StringSerde");

        StreamsBuilder builder = new StreamsBuilder();

        // Read from the 'orders' topic
        KStream<String, String> ordersStream = builder.stream("orders");

        // Assume order_id is the key for aggregation
        KTable<String, Double> aggregatedAmounts = ordersStream
            .mapValues(value -> {
                // Basic JSON parsing (in a real app, use a proper parser)
                String[] parts = value.split(",");
                String amountStr = parts[1].split(":")[1]; // e.g., "100.50"
                return Double.parseDouble(amountStr.replace("}", ""));
            })
            .groupByKey()
            .reduce(Double::sum); // Sum the amounts for each key

        // Write the result to a 'total-amounts' topic
        aggregatedAmounts.toStream().to("total-amounts");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Graceful shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

When this application runs, it subscribes to the orders topic. For each incoming order, it extracts the order ID and amount. It then aggregates these amounts by order ID, maintaining a running sum. The result, a table mapping each order ID to its total amount, is written to the total-amounts topic. This happens continuously as new orders arrive.

The core problem Kafka Streams solves is enabling stateful stream processing directly within your Kafka ecosystem. It leverages Kafka itself for fault tolerance and scalability. Your processing logic runs as standard Java applications (or other JVM languages) that interact with Kafka topics as input and output streams. This tight integration means you don’t need a separate cluster for your stream processing engine; your Kafka brokers handle the heavy lifting of data distribution and fault tolerance.

Internally, Kafka Streams uses Kafka’s consumer groups to distribute partitions of input topics across instances of your application. Each instance maintains its own local state store (RocksDB by default) for aggregations, joins, and windowing. When a Kafka partition is reassigned to a different instance (e.g., due to scaling or failure), the new instance can restore its state from the corresponding Kafka changelog topic, ensuring exactly-once processing semantics. The key levers you control are the APPLICATION_ID_CONFIG (which defines the consumer group), Kafka topic configurations (partitions, replication factor), and the processing topology defined in your StreamsBuilder.

Kafka Streams embeds its state management directly into the Kafka consumer. When you perform a stateful operation like a reduce or aggregate, Kafka Streams writes the intermediate state changes to a changelog topic. This changelog topic is essential for fault tolerance. If an application instance crashes, another instance can take over processing its partitions and restore the exact same state by replaying the changelog topic. This is how it achieves durable state without needing an external state store for basic fault tolerance. However, it’s crucial to understand that the changelog topic is also a Kafka topic, meaning its retention policies and partition counts directly influence the durability and scalability of your state.

The most surprising outcome of Kafka Streams’ design is how its state stores are managed. While it uses local RocksDB instances for performance, the durability of that state isn’t solely on the local disk. Kafka Streams achieves fault tolerance by writing all state changes to an internal Kafka changelog topic. This means that even if a processing instance’s local disk is wiped, its state can be fully reconstructed by reading from its designated changelog topic on another instance. The number of partitions in this changelog topic directly impacts how many instances can concurrently process a given stream, and its replication factor dictates the resilience of your state against broker failures.

The next step you’ll likely explore is how to perform more complex operations like joins between different streams or how to manage event time processing with sophisticated windowing strategies.