MirrorMaker 2 is Kafka’s native tool for replicating data between clusters, but its core mechanism is surprisingly different from what most people expect.
Let’s watch it in action. Imagine we have two Kafka clusters: source-cluster and target-cluster. Our goal is to replicate a topic named my-topic from source-cluster to target-cluster.
First, we configure MirrorMaker 2 by creating a properties file, which we'll call mm2.properties:
# Cluster aliases. The names are arbitrary, but descriptive aliases make topic names and logs easier to read.
clusters = source, target
# Source cluster bootstrap servers
source.bootstrap.servers = source-broker1:9092,source-broker2:9092
# Target cluster bootstrap servers
target.bootstrap.servers = target-broker1:9092,target-broker2:9092
# Enable the source->target replication flow and choose which topics it replicates.
# 'topics' takes a comma-separated list or regular expressions. With the default
# replication policy, my-topic appears on the target cluster as source.my-topic.
source->target.enabled = true
source->target.topics = my-topic
# Consumer groups whose committed offsets should be checkpointed to the target.
# This is crucial for consumers that must resume on the target after a failover.
source->target.groups = my-consumer-group
# Replication factor for the replicated (remote) topics MirrorMaker 2 creates on the target.
replication.factor = 3
# Replication factor for MirrorMaker 2's own internal topics.
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3
# Emit consumer group checkpoints every 60 seconds (both values are the defaults).
emit.checkpoints.enabled = true
emit.checkpoints.interval.seconds = 60
# Kafka 2.7+: also commit the translated offsets to the consumer groups on the target.
source->target.sync.group.offsets.enabled = true
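MirrorMaker 2 scopes its settings with prefixes. Unprefixed keys apply everywhere, keys such as source.bootstrap.servers configure one cluster's clients, and keys prefixed with source->target. apply to a single replication flow. The following is a simplified Python model of how one flow's connector settings could be resolved, assuming flow-scoped keys override global ones. The function names are hypothetical, and this is not MirrorMaker 2's actual parser, which handles more prefixes (for example per-cluster consumer and producer overrides).

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def flow_config(props: dict[str, str], source: str, target: str) -> dict[str, str]:
    """Hypothetical resolution: global keys first, then source->target overrides."""
    flow_prefix = f"{source}->{target}."
    aliases = [a.strip() for a in props.get("clusters", "").split(",") if a.strip()]
    cluster_prefixes = tuple(f"{alias}." for alias in aliases)
    resolved = {}
    for key, value in props.items():
        if "->" in key:
            continue  # flow-scoped keys are applied in the second pass
        if cluster_prefixes and key.startswith(cluster_prefixes):
            continue  # per-cluster client settings such as source.bootstrap.servers
        resolved[key] = value
    for key, value in props.items():
        if key.startswith(flow_prefix):
            resolved[key[len(flow_prefix):]] = value  # strip the flow prefix
    return resolved


MM2 = """
clusters = source, target
source.bootstrap.servers = source-broker1:9092
target.bootstrap.servers = target-broker1:9092
emit.checkpoints.interval.seconds = 60
source->target.enabled = true
source->target.topics = my-topic
source->target.emit.checkpoints.interval.seconds = 30
"""
cfg = flow_config(parse_properties(MM2), "source", "target")
print(cfg["topics"], cfg["emit.checkpoints.interval.seconds"])  # my-topic 30
```

The two-pass design keeps the override rule explicit: whatever a flow sets last wins for that flow only, and other flows keep the global value.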
Now, we start MirrorMaker 2 with this configuration:
connect-mirror-maker.sh mm2.properties --clusters target
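What's happening under the hood? MirrorMaker 2 doesn't just read from the source and write to the target. It is built on Kafka Connect. For each enabled flow, such as source->target, it runs three connectors on Connect workers that produce to the target cluster. MirrorSourceConnector copies the records, MirrorCheckpointConnector handles consumer group offsets, and MirrorHeartbeatConnector emits heartbeats used to monitor the flow. The workers keep their own state (connector configs, source offsets and task status) in internal topics on the target cluster, named mm2-configs.source.internal, mm2-offsets.source.internal and mm2-status.source.internal. What gets replicated is still decided by mm2.properties. The target cluster is where the replication machinery runs and keeps its bookkeeping.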
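The internal topics of a single flow can be summarized in a small table. This sketch lists their default names and the cluster that hosts each one. It assumes default naming, and it models the offset-syncs location (the offset-syncs.topic.location setting added in Kafka 3.0, which defaults to the source) as a boolean parameter. The helper and dataclass names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternalTopic:
    name: str
    cluster: str  # alias of the cluster that hosts the topic
    purpose: str


def internal_topics(source: str, target: str,
                    offset_syncs_on_source: bool = True) -> list[InternalTopic]:
    """Sketch of where one source->target flow keeps its bookkeeping (default names)."""
    syncs_cluster = source if offset_syncs_on_source else target
    return [
        # Kafka Connect's own state for this flow lives on the target cluster.
        InternalTopic(f"mm2-configs.{source}.internal", target, "connector configs"),
        InternalTopic(f"mm2-offsets.{source}.internal", target, "source connector offsets"),
        InternalTopic(f"mm2-status.{source}.internal", target, "connector and task status"),
        # Pairs of matching upstream/downstream offsets written while replicating.
        InternalTopic(f"mm2-offset-syncs.{target}.internal", syncs_cluster,
                      "upstream/downstream offset pairs"),
        # Translated consumer group offsets emitted by the checkpoint connector.
        InternalTopic(f"{source}.checkpoints.internal", target,
                      "consumer group checkpoints"),
    ]


for t in internal_topics("source", "target"):
    print(f"{t.cluster:>6}  {t.name:<36} {t.purpose}")
```

Everything except the offset syncs sits on the target by default. This is the concrete sense in which MirrorMaker 2 is target-oriented.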
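The topics setting tells MirrorMaker 2 which topics to replicate, in this case my-topic. MirrorSourceConnector consumes its records from the source cluster and produces them to the target. With the default replication policy, the copy is named source.my-topic rather than my-topic, which keeps replicated topics distinct from local ones. IdentityReplicationPolicy (Kafka 3.0+) keeps the original name instead. Consumer group synchronization works in two steps. First, MirrorSourceConnector records matching pairs of upstream and downstream offsets, at intervals governed by offset.lag.max, in an offset-syncs topic (mm2-offset-syncs.target.internal). By default this topic lives on the source cluster. Second, MirrorCheckpointConnector reads the committed offsets of the selected consumer groups on the source and translates them through those offset syncs. With emit.checkpoints.enabled=true, it writes the results every emit.checkpoints.interval.seconds to source.checkpoints.internal on the target cluster, where consumers can use them to find their position after a failover. MirrorMaker 2 also commits the translated offsets to the consumer groups on the target only when sync.group.offsets.enabled=true (Kafka 2.7+), and only for groups with no active members there.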
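Offset translation is the subtle part. Upstream and downstream offsets rarely line up one to one, so MirrorMaker 2 cannot just copy committed offsets. The following is a minimal sketch for one partition, loosely modeled on the conservative rule recent Kafka releases use. It finds the latest sync at or before the committed offset. On an exact hit it uses the synced downstream offset. Otherwise it steps just past that sync, which may re-deliver records but never skips any. Older releases translated differently, and the real store keeps a bounded history of syncs. The class and method names here are hypothetical.

```python
import bisect
from typing import Optional


class OffsetSyncStore:
    """Simplified model of offset syncs for a single topic partition."""

    def __init__(self) -> None:
        self._upstream: list[int] = []    # upstream offsets, ascending
        self._downstream: list[int] = []  # matching downstream offsets

    def record_sync(self, upstream: int, downstream: int) -> None:
        # Syncs arrive in offset order as the source connector replicates.
        self._upstream.append(upstream)
        self._downstream.append(downstream)

    def translate(self, committed_upstream: int) -> Optional[int]:
        """Translate a committed upstream offset into a safe downstream offset."""
        i = bisect.bisect_right(self._upstream, committed_upstream) - 1
        if i < 0:
            return None  # no sync at or before this offset, so it cannot be translated yet
        sync_up, sync_down = self._upstream[i], self._downstream[i]
        if committed_upstream == sync_up:
            return sync_down  # exact hit: the next record to read maps directly
        # The synced record was already consumed upstream, so resume just after it.
        return sync_down + 1


store = OffsetSyncStore()
store.record_sync(upstream=0, downstream=0)
store.record_sync(upstream=150, downstream=140)  # offsets have drifted apart
print(store.translate(150))  # 140
print(store.translate(200))  # 141
print(store.translate(100))  # 1
```

The last call shows the trade-off. A group committed at 100 resumes from downstream offset 1, so it re-reads records rather than risk losing any. More frequent syncs (a smaller offset.lag.max) narrow that gap.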
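This architecture is why MirrorMaker 2 is usually deployed close to the target cluster: it consumes remotely, produces locally, and keeps its Connect state on the target. The replication topology itself is defined in mm2.properties. When the connectors run on an existing Connect cluster instead, it is defined in the connector configurations submitted through the Connect REST API. To replicate a new topic in dedicated mode, either edit the topics setting and restart MirrorMaker 2, or use a regular expression from the start. MirrorSourceConnector re-lists the source topics every refresh.topics.interval.seconds (600 seconds by default) and starts replicating new matches automatically. The --clusters argument is easy to misread. The cluster aliases are declared by the clusters property, and --clusters only restricts which target clusters this particular MirrorMaker 2 node replicates into. This lets you run one node per data center from a shared configuration file.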
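The refresh behaviour can be modeled as a set difference. On each refresh, topics that now match the pattern but are not yet replicated start replicating. This is a hypothetical helper that ignores everything else the connector does (exclusions, topic creation, task assignment). It only shows why a regular expression picks up new topics without a restart, and why a new topic may wait up to one refresh interval before replication begins.

```python
import re


def newly_matched(already_replicated: set[str], source_topics: set[str],
                  pattern: str) -> set[str]:
    """Topics a refresh would start replicating (hypothetical helper)."""
    regex = re.compile(pattern)
    # MirrorMaker 2 topic filters match the whole topic name, hence fullmatch.
    matching = {t for t in source_topics if regex.fullmatch(t)}
    return matching - already_replicated


replicated = {"orders-eu"}
# Someone creates orders-us on the source between two refreshes.
source_topics = {"orders-eu", "orders-us", "payments"}
print(newly_matched(replicated, source_topics, r"orders-.*"))  # {'orders-us'}
```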
The most surprising part for many is how MirrorMaker 2 handles topic creation and configuration. It doesn't just copy data; it actively manages topics on the target cluster. When it discovers a new matching topic on the source, it creates the remote topic with the same number of partitions, and it later propagates any partition count increases. With sync.topic.configs.enabled=true (the default), it also copies most topic-level configurations, such as retention and cleanup policy, and re-checks them every sync.topic.configs.interval.seconds. The replication factor is not inherited. Remote topics get the replication.factor configured for MirrorMaker 2. The result is a replicated topic whose structure closely mirrors the source, even though it is not an exact clone.
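The rules above can be condensed into a "desired state" function for the remote topic. This is a simplified sketch. The excluded configuration keys are an illustrative assumption (MirrorMaker 2 maintains its own exclusion list), and the dataclass and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TopicSpec:
    partitions: int
    replication_factor: int
    configs: dict[str, str] = field(default_factory=dict)


def plan_remote_topic(source: TopicSpec, existing: Optional[TopicSpec],
                      replication_factor: int, excluded: set[str]) -> TopicSpec:
    """Desired state of the remote topic on the target (simplified sketch)."""
    # Topic-level configs are copied unless excluded (the exclusion set is an assumption).
    configs = {k: v for k, v in source.configs.items() if k not in excluded}
    if existing is None:
        # New remote topic: same partition count, MirrorMaker 2's own replication factor.
        return TopicSpec(source.partitions, replication_factor, configs)
    # Kafka can add partitions but never remove them, so the count only grows.
    partitions = max(existing.partitions, source.partitions)
    return TopicSpec(partitions, existing.replication_factor, configs)


src = TopicSpec(12, 5, {"retention.ms": "604800000", "cleanup.policy": "delete",
                        "min.insync.replicas": "3"})
created = plan_remote_topic(src, None, replication_factor=3,
                            excluded={"min.insync.replicas"})
print(created.partitions, created.replication_factor, sorted(created.configs))
# 12 3 ['cleanup.policy', 'retention.ms']
```

Note that the source's replication factor of 5 never reaches the target. The remote topic uses the configured value of 3.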
The real power comes when you realize that a single flow can replicate many topics, or effectively an entire cluster, because the topics setting accepts regular expressions. For instance, source->target.topics = .* replicates every topic on the source except those matched by topics.exclude, which by default skips internal topics. This does not merge anything. Each source topic still maps to its own remote topic (orders becomes source.orders, payments becomes source.payments), and MirrorMaker 2 is not designed to funnel several source topics into a single target topic.
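Pattern selection and the default naming policy can be combined to show the one-to-one mapping. This sketch assumes the default separator "." and the default topics.exclude patterns as we understand them from recent releases, so check them against your Kafka version. The helper names are hypothetical.

```python
import re

# Default value of replication.policy.separator.
SEPARATOR = "."
# Default topics.exclude patterns as we understand them; verify for your version.
DEFAULT_EXCLUDE = [r".*[\-\.]internal", r".*\.replica", r"__.*"]


def format_remote_topic(source_alias: str, topic: str) -> str:
    # Same shape as DefaultReplicationPolicy names: <alias><separator><topic>.
    return f"{source_alias}{SEPARATOR}{topic}"


def plan_replication(source_alias: str, topics: set[str],
                     include: list[str], exclude: list[str]) -> dict[str, str]:
    """Map each selected source topic to its own remote topic name."""
    inc = re.compile("|".join(include))
    exc = re.compile("|".join(exclude))
    return {
        t: format_remote_topic(source_alias, t)
        for t in sorted(topics)
        # Filters must match the whole topic name, hence fullmatch.
        if inc.fullmatch(t) and not exc.fullmatch(t)
    }


source_topics = {"orders", "payments", "__consumer_offsets",
                 "mm2-offset-syncs.target.internal"}
plan = plan_replication("source", source_topics, [".*"], DEFAULT_EXCLUDE)
print(plan)  # {'orders': 'source.orders', 'payments': 'source.payments'}
```

Because the alias prefix is added to each name individually, distinct source topics always land in distinct remote topics. Aggregation, if you need it, happens on the consumer side, for example by subscribing to a pattern that covers several remote topics.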
The next challenge you’ll likely face is managing the lifecycle of replicated topics, especially when dealing with schema evolution or complex topic configurations.