Kafka Streams’ stream-table duality is the idea that a stream of records can be thought of as changes to an underlying table, and a table can be materialized as a stream of changes.

Let’s see it in action. Imagine we have a Kafka topic orders where each message represents a new order with order_id and amount.

{"order_id": "A123", "amount": 100}
{"order_id": "B456", "amount": 250}
{"order_id": "A123", "amount": 120} // Update for order A123

In Kafka Streams, we can read this orders topic as a stream. But we can also represent the current state of all orders as a table, where order_id is the key and the latest amount is the value.

Here’s a Kafka Streams application that models this:

// Define the Serdes for keys and values
Serde<String> keySerde = Serdes.String();
Serde<Long> valueSerde = Serdes.Long(); // Assuming amount is a long

// Build the Streams application
StreamsBuilder builder = new StreamsBuilder();

KStream<String, Long> ordersStream = builder.stream(
    "orders",
    Consumed.with(keySerde, valueSerde)
);

// Materialize the stream as a KTable
KTable<String, Long> ordersTable = ordersStream.toTable();

// Now, ordersTable represents the latest amount for each order_id
// We can perform operations on this table, like aggregating
KTable<String, Long> totalAmountPerOrder = ordersTable.groupByKey().reduce(
    (newAmount, oldAmount) -> newAmount + oldAmount, // Aggregation logic
    Materialized.as("total-amount-store")
);

// This totalAmountPerOrder KTable itself can be viewed as a stream of updates
KStream<String, Long> totalAmountUpdates = totalAmountPerOrder.toStream();

// Let's print the results to a new topic for demonstration
totalAmountUpdates.to("order-totals", Produced.with(keySerde, valueSerde));

// Build and start the Kafka Streams application
KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfig);
streams.start();

This code first defines how to serialize and deserialize our keys (order_id) and values (amount). Then, it reads the orders topic into a KStream. The crucial step is ordersStream.toTable(). This operation takes the stream of incoming order updates and builds a KTable where each order_id maps to its latest amount. If an order_id appears multiple times, the KTable only holds the most recent value for that key.

Think of the KTable as a state store that’s continuously updated by the incoming stream. When a new record arrives for an existing key, the KTable is updated in place. When a record arrives for a new key, it’s added to the KTable.

We can then perform stateful operations on this KTable. In the example, groupByKey().reduce() aggregates the amounts for each order_id. This reduce operation is defined by a reducer function ((newAmount, oldAmount) -> newAmount + oldAmount) and is stored in a changelog topic (implicitly, via Materialized.as). The result, totalAmountPerOrder, is another KTable representing the cumulative total amount for each order.

The duality means that this totalAmountPerOrder KTable can also be viewed as a stream of changes. If an order’s total amount changes (because a new order update came in), a new record representing this change will be emitted to the order-totals topic. This is achieved by totalAmountPerOrder.toStream().

The core problem this solves is managing state in a distributed, fault-tolerant way. By treating data as a stream of changes to an underlying table, Kafka Streams can efficiently process and aggregate data, even as it arrives continuously. The KTable provides an abstraction over the current state, while the KStream represents the history of changes that led to that state.

When you perform a groupByKey().aggregate() or groupByKey().reduce() on a KTable, Kafka Streams internally uses a state store (like RocksDB) to maintain the aggregation results. This state store is backed by a Kafka changelog topic. Every time the aggregated result for a key changes, a new record is written to this changelog topic. This ensures that the state is durable and can be restored if a Streams instance fails. The KTable itself is essentially a view over this changelog topic, always showing the latest computed state.

The toTable() operation on a KStream doesn’t just create a snapshot; it creates a dynamic view that’s updated in real-time. The key insight is that Kafka topics are inherently ordered logs, and a KTable is a materialized view of the latest record for each key in that log.

Want structured learning?

Take the full Kafka-streams course →