Kafka Streams joins are an incredibly powerful tool for stateful stream processing, but the distinction between stream-stream and stream-table joins can be a major point of confusion.
Let’s look at a stream-stream join in action. Imagine we have two Kafka topics: user-clicks and user-page-views. Both topics have a user_id as the key and some event data as the value. We want to join these two streams to see which users clicked on which pages.
StreamsBuilder builder = new StreamsBuilder();
KStream<String, ClickEvent> clicks = builder.stream("user-clicks", Consumed.with(Serdes.String(), jsonSerde));
KStream<String, PageViewEvent> pageViews = builder.stream("user-page-views", Consumed.with(Serdes.String(), jsonSerde));
KStream<String, JoinedEvent> joinedStream = clicks.join(
pageViews,
(click, pageView) -> new JoinedEvent(click.getUserId(), click.getTimestamp(), pageView.getUrl(), pageView.getTimestamp()),
JoinWindows.ofTimeDifferenceAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(1)), // Windowing
StreamJoined.with(Serdes.String(), jsonSerde, jsonSerde) // Serdes
);
joinedStream.to("user-activity-log", Produced.with(Serdes.String(), jsonSerde));
In this stream-stream join, clicks and pageViews are treated as infinite, unbounded streams. When a new event arrives on either stream, Kafka Streams looks for a matching event on the other stream within the defined JoinWindows. The JoinWindows specifies a time boundary within which matching events are considered. If a click event arrives at T1 and a page view event for the same user arrives at T2, and both T1 and T2 fall within the 5-minute window (plus a 1-minute grace period), they are joined. If a page view arrives at T3 but no corresponding click arrived within the window, it’s dropped.
Now, let’s consider a stream-table join. Here, we have our user-clicks stream and a user-profile table. This "table" is represented by a Kafka topic that acts as a changelog, where each key (e.g., user_id) has a current value (the profile), and updates overwrite previous values.
StreamsBuilder builder = new StreamsBuilder();
KStream<String, ClickEvent> clicks = builder.stream("user-clicks", Consumed.with(Serdes.String(), jsonSerde));
KTable<String, UserProfile> userProfiles = builder.table("user-profiles", Consumed.with(Serdes.String(), jsonSerde));
KStream<String, ClickWithProfile> clicksWithProfile = clicks.leftJoin(
userProfiles,
(click) -> new ClickWithProfile(click.getUserId(), click.getTimestamp(), "Unknown"), // Value if profile not found
(click, profile) -> new ClickWithProfile(click.getUserId(), click.getTimestamp(), profile.getCountry())
);
clicksWithProfile.to("user-clicks-enriched", Produced.with(Serdes.String(), jsonSerde));
In a stream-table join, the stream (clicks) is processed event by event. When a click event arrives, Kafka Streams looks up the current state of the corresponding key in the userProfiles table. The table is essentially a materialized view of the latest values for each key. If a user profile exists for that user_id, the click event is joined with the current profile. If the profile is updated later, that update doesn’t retroactively affect past join operations. The join happens based on the state of the table at the time the stream event is processed. This is why it’s often called a "lookup" join.
The most surprising true thing about Kafka Streams joins is that they don’t actually "join" data in the traditional database sense of waiting for all related records to arrive. Instead, they are time-bound or state-dependent operations that produce new events based on what’s available now or within a specific window.
The core problem Kafka Streams joins solve is enriching or correlating data from different sources in a continuous, real-time fashion. Instead of batching data to a data warehouse and performing joins there, you can do it as events flow through Kafka.
Internally, Kafka Streams uses RocksDB (by default) to maintain the state required for joins. For stream-stream joins, it maintains windows of incoming events. For stream-table joins, it maintains the latest version of each record in the table. The JoinWindows in stream-stream joins define how long events remain eligible for joining. For stream-table joins, the "window" is effectively the current state of the table.
A critical, often overlooked, aspect of stream-table joins is how updates to the table are handled. When a KTable record is updated, it’s treated as a new, complete record for that key. The join operation then processes the stream event against this new version of the table record. This means if you have a click event that arrives after a user profile update, it will be joined with the updated profile, not the old one. However, if the click event arrived before the profile update, it will be joined with the old profile, and that join result is final for that click event.
The next concept you’ll likely encounter is handling different join semantics, such as outer joins (left, right, and full), and understanding how they interact with windowing and table updates.