Kafka Streams is a powerful Java library for building event-driven applications and microservices. The branch and split operations are fundamental for directing records to different processing paths based on their content.

Let’s see branch in action. Imagine you have a stream of user events, and you want to route "login" events to one topic and "logout" events to another, while discarding everything else.

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> events = builder.stream("user-events");

// Define the predicate for login events
Predicate<String, String> isLogin = (key, value) -> value.contains("login");

// Branch the stream
Map<String, KStream<String, String>> branchedStreams =
    events.branch(isLogin, (key, value) -> value.contains("logout"));

// Send login events to "login-topic"
branchedStreams.get(0).to("login-topic");

// Send logout events to "logout-topic"
branchedStreams.get(1).to("logout-topic");

// Everything else is discarded because we didn't define a default branch

In this example, events.branch(isLogin, (key, value) -> value.contains("logout")) creates two output streams. The first stream (index 0) contains records where isLogin is true. The second stream (index 1) contains records where isLogin is false AND the second predicate (key, value) -> value.contains("logout") is true. Records that don’t match any predicate are dropped.

The split operation is similar but allows for more explicit naming of the output streams. This can significantly improve readability and maintainability, especially when you have many branching conditions.

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> orders = builder.stream("order-stream");

// Split the stream based on order status
Map<String, KStream<String, String>> splitOrders =
    orders.split(Serdes.String(), Serdes.String())
          .branch( (key, value) -> value.equals("PENDING"), "pending-orders")
          .branch( (key, value) -> value.equals("SHIPPED"), "shipped-orders")
          .defaultBranch( "other-orders"); // Optional: handle records not matching any branch

// Process each split stream
splitOrders.get("pending-orders").to("pending-orders-topic");
splitOrders.get("shipped-orders").to("shipped-orders-topic");
splitOrders.get("other-orders").foreach((key, value) -> {
    System.out.println("Unprocessed order: " + key + " - " + value);
});

Here, orders.split(...) initiates the splitting process. We then chain branch calls, each with a predicate and a unique name for the resulting stream. The defaultBranch is a convenient way to catch any records that don’t satisfy the preceding predicates.

Internally, Kafka Streams uses a Topology object to represent the processing graph. When you use branch or split, Kafka Streams constructs nodes in this graph. For branch, it creates a BranchedKStream which internally maps the predicates to indices. For split, it generates a KStream[] where each element corresponds to a named branch. These operations effectively create multiple "virtual" streams from a single input stream, allowing for parallel processing and routing.

The key difference is how you access the resulting streams: branch returns a KStream<K, V>[] where you access streams by their index (0, 1, 2…), while split returns a Map<String, KStream<K, V>> where you access streams by the names you provided.

A common misconception is that branch will process all records through every predicate until one matches. This is not true. A record is sent to the first predicate that evaluates to true. If no predicate matches, it’s discarded unless a defaultBranch is specified with split.

The branch and split operations are stateless. They don’t maintain any internal state between records. This makes them highly efficient for simple routing logic. If you need to perform stateful operations based on different conditions, you would typically branch first and then apply stateful operators like groupByKey, aggregate, or join on the resulting streams.

When using split with defaultBranch, you’re not actually creating a new topic; you’re creating a logical branch within your stream processing application. Any records sent to this default branch will be processed according to the code you attach to splitOrders.get("default-branch-name"). If you want to persist these "default" records, you’d explicitly use .to("some-fallback-topic") on that branch.

The true power emerges when you combine these branching strategies with other Kafka Streams DSL operations. For instance, you might branch a stream of user actions, then repartition each branch by userId to perform user-specific aggregations, and finally join the results of these aggregations.

The next logical step after routing records is to perform aggregations on those routed streams.

Want structured learning?

Take the full Kafka-streams course →