A KTable is not a record-by-record copy of a Kafka topic; it interprets the topic as a changelog stream and keeps only the latest known value for each key.

Let’s see this in action. Imagine we have a Kafka topic user-topics whose records are keyed by user_id, with messages like this:

{"user_id": "user1", "topic": "sports"}
{"user_id": "user2", "topic": "news"}
{"user_id": "user1", "topic": "finance"}

When we create a KTable from user-topics, it doesn’t store all these historical updates as separate entities. Instead, it maintains the current state for each user_id. So, after processing these messages, our KTable effectively looks like this:

user1 -> finance
user2 -> news

The previous update for user1 ("sports") is gone, replaced by the latest one ("finance"). This is the core behavior: a KTable is a stream of updates, and the Kafka Streams library processes these updates to maintain the latest state internally.
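
To make the "latest value wins" behavior concrete, here is a minimal plain-Java sketch that folds the three updates above into a map. It is only a model of the semantics, not the Kafka Streams API: the Update class and latestByKey method are hypothetical names invented for this illustration. The null-value branch models a tombstone, which is how a KTable deletes a key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of KTable upsert semantics. This is NOT the Kafka Streams API.
class KTableUpsertSketch {

    // Hypothetical changelog record: a key plus its new value (null models a tombstone).
    static final class Update {
        final String key;
        final String value;

        Update(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Folds the updates, in order, into the latest value per key.
    static Map<String, String> latestByKey(List<Update> updates) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Update u : updates) {
            if (u.value == null) {
                table.remove(u.key);        // tombstone: the key is deleted
            } else {
                table.put(u.key, u.value);  // upsert: overwrite any earlier value
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Update> updates = Arrays.asList(
            new Update("user1", "sports"),
            new Update("user2", "news"),
            new Update("user1", "finance"));
        System.out.println(latestByKey(updates)); // prints {user1=finance, user2=news}
    }
}
```

The real library does the same fold incrementally, one record at a time, against a local state store rather than an in-memory map.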

Now, what about GlobalKTable? A GlobalKTable has the same latest-value-per-key semantics, but it reads all partitions of a topic and builds a local, queryable store of that data, replicated in full on every Kafka Streams instance. Unlike a KTable, whose state is partitioned and held by specific instances, a GlobalKTable is available everywhere.

Here’s the fundamental difference:

  • KTable: Partitioned. Each Kafka Streams instance only processes and holds the state for the partitions assigned to it. You can join a KTable with another KTable or with a KStream (but not with a GlobalKTable), and a key-based join happens on a per-partition basis. This is efficient for large datasets because you don’t need to move all data to every instance.
  • GlobalKTable: Non-partitioned (conceptually). Every instance of your Kafka Streams application will have a complete copy of the data. This allows for efficient lookups (like a traditional database join) without requiring repartitioning.
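
The difference in state placement can be sketched in plain Java. This is a simplified model, not Kafka Streams code: it assumes one partition per instance, and the partitionFor helper uses String.hashCode() as a stand-in for Kafka's real partitioner. All class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of where table state lives. This is NOT the Kafka Streams API.
class StatePlacementSketch {

    // Stand-in partitioner (assumption): the real default partitioner hashes the
    // serialized key differently; hashCode() just keeps the sketch self-contained.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // KTable-like: each instance holds only the keys of its own partition.
    // Assumes exactly one partition per instance.
    static List<Map<String, String>> partitionedState(Map<String, String> table, int instances) {
        List<Map<String, String>> perInstance = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            perInstance.add(new HashMap<>());
        }
        for (Map.Entry<String, String> e : table.entrySet()) {
            perInstance.get(partitionFor(e.getKey(), instances)).put(e.getKey(), e.getValue());
        }
        return perInstance;
    }

    // GlobalKTable-like: every instance holds a complete copy.
    static List<Map<String, String>> globalState(Map<String, String> table, int instances) {
        List<Map<String, String>> perInstance = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            perInstance.add(new HashMap<>(table));
        }
        return perInstance;
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("user1", "finance");
        table.put("user2", "news");
        table.put("user3", "sports");
        System.out.println("partitioned: " + partitionedState(table, 2));
        System.out.println("global:      " + globalState(table, 2));
    }
}
```

In the partitioned case the per-instance maps are disjoint and add up to the full table; in the global case each map is the full table.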

When would you choose one over the other?

Use KTable when:

  • Your source topic is large, and you don’t need every record on every instance.
  • You are performing joins where both sides are co-partitioned, meaning they share the same key, the same number of partitions, and the same partitioning strategy. For example, joining user-topics with user-profiles where both are keyed by user_id and partitioned identically.
  • You want to aggregate data (e.g., count the number of messages per user).

Use GlobalKTable when:

  • You need to perform a lookup on a relatively small dataset that needs to be accessible from any instance of your application. A classic example is a reference data table (e.g., product catalog, country codes).
  • You want to join a KStream with a dataset where the join key is not the same as the partitioning key of the stream, or where the reference data is too small to justify partitioning. (Only a KStream can be joined with a GlobalKTable; a KTable cannot.)
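
The "join key differs from the stream key" case is easier to see in a small model. The plain-Java sketch below is not Kafka Streams code; it assumes a hypothetical stream keyed by order ID whose value carries a product ID, and a fully replicated catalog represented as a local map. It behaves like an inner join: records with no catalog match are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a stream-to-global-table lookup join. NOT the Kafka Streams API.
class GlobalLookupJoinSketch {

    // Hypothetical event: the stream key is orderId, the lookup key is productId.
    static final class Purchase {
        final String orderId;
        final String productId;

        Purchase(String orderId, String productId) {
            this.orderId = orderId;
            this.productId = productId;
        }
    }

    // For each purchase, derive the lookup key from the value and consult the local copy.
    static List<String> enrich(List<Purchase> purchases, Map<String, String> catalog) {
        List<String> enriched = new ArrayList<>();
        for (Purchase p : purchases) {
            String lookupKey = p.productId;          // key selector: not the stream key
            String productName = catalog.get(lookupKey);
            if (productName != null) {               // inner join: skip records with no match
                enriched.add(p.orderId + " -> " + productName);
            }
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, String> catalog = new HashMap<>();
        catalog.put("p1", "Laptop");
        catalog.put("p2", "Phone");
        List<Purchase> purchases = Arrays.asList(
            new Purchase("o1", "p1"),
            new Purchase("o2", "p9"),   // no such product, so this record is dropped
            new Purchase("o3", "p2"));
        System.out.println(enrich(purchases, catalog)); // prints [o1 -> Laptop, o3 -> Phone]
    }
}
```

Because the catalog is complete on every instance, no repartitioning of the stream by product ID is needed before the lookup.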

Let’s illustrate a common GlobalKTable use case: enriching a stream of events with static data.

Suppose you have a KStream of purchase-events keyed by product_id, and you want to add the product_name from a GlobalKTable derived from a product-catalog topic.

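import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;

// Assume streamsConfig is properly set up, and that Product, Purchase,
// EnrichedPurchase, productSerde, and purchaseSerde are defined elsewhere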
StreamsBuilder builder = new StreamsBuilder();

// 1. Create a GlobalKTable from the product catalog topic
// The topic is keyed by product_id, and we want to query by product_id
GlobalKTable<String, Product> productCatalogTable = builder.globalTable("product-catalog",
    Consumed.with(Serdes.String(), productSerde), // Assuming productSerde serializes Product objects
    Materialized.as("product-catalog-store")); // This name is for the internal state store

// 2. Create a KStream of purchase events
KStream<String, Purchase> purchaseStream = builder.stream("purchase-events",
    Consumed.with(Serdes.String(), purchaseSerde)); // Assuming purchaseSerde serializes Purchase objects

// 3. Perform a join
// A KStream-GlobalKTable join takes a key selector that maps each stream record
// to the GlobalKTable key to look up; here that is the stream key itself (product_id)
KStream<String, EnrichedPurchase> enrichedPurchases = purchaseStream.join(
    productCatalogTable,
    (productId, purchase) -> productId, // Key selector: which catalog key to look up
    (purchase, product) -> new EnrichedPurchase(purchase, product) // Value joiner
);

// ... further processing of enrichedPurchases ...

KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfig);
streams.start();

In this example:

  • "product-catalog" is a Kafka topic.
  • globalTable("product-catalog", ...) tells Kafka Streams to read all partitions of this topic and build a local, fully replicated key-value store (named "product-catalog-store") on every instance of the application.
  • purchaseStream.join(productCatalogTable, ...) uses the product_id from each Purchase record in the purchaseStream to look up the corresponding Product record in the productCatalogTable’s local store. This lookup is fast because the entire catalog is available on the current instance.

The most surprising thing about GlobalKTable is that while it’s conceptually "global" and fully replicated, it’s still backed by an ordinary Kafka topic. On every application instance, Kafka Streams consumes all partitions of the GlobalKTable’s source topic and maintains its own local state store. When you define Materialized.as("product-catalog-store"), you’re essentially telling Kafka Streams to manage a RocksDB (or other configured store) instance on each application instance that holds a complete copy of the data from the "product-catalog" topic.

The key to choosing between KTable and GlobalKTable is understanding the data volume and the required access pattern. If you need to look up individual records from a static or slowly changing dataset across your entire stream processing application, GlobalKTable is your tool. If your data is large and you’re performing joins that can be partitioned effectively, stick with KTable.
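
A back-of-the-envelope calculation shows why data volume matters. The sketch below is a rough estimate under stated assumptions: partitions are spread evenly across instances, there are no standby replicas, and store overhead is ignored. The method names and the 8 GiB figure are hypothetical.

```java
// Rough per-instance state size estimate. Assumes evenly spread partitions,
// no standby replicas, and no store overhead.
class StateSizeSketch {

    // GlobalKTable: every instance stores the whole dataset.
    static long globalBytesPerInstance(long datasetBytes) {
        return datasetBytes;
    }

    // KTable: each instance stores roughly its share of the dataset (rounded up).
    static long partitionedBytesPerInstance(long datasetBytes, int instances) {
        return (datasetBytes + instances - 1) / instances;
    }

    public static void main(String[] args) {
        long datasetBytes = 8L * 1024 * 1024 * 1024; // hypothetical 8 GiB table
        int instances = 4;
        System.out.println("GlobalKTable per instance: " + globalBytesPerInstance(datasetBytes));
        System.out.println("KTable per instance:       " + partitionedBytesPerInstance(datasetBytes, instances));
    }
}
```

With a KTable, adding instances shrinks each instance’s share; with a GlobalKTable, the per-instance footprint stays the same and the total footprint grows with the number of instances.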

The next challenge you’ll likely face is managing the state stores for GlobalKTables, especially regarding memory and disk usage on each application instance.
