NATS clustering isn’t about redundancy; it’s about distributed state.
Let’s see a NATS cluster in action. Imagine two NATS servers, nats-server-1 and nats-server-2, running on different machines, but acting as a single logical NATS system.
Here’s a basic configuration for nats-server-1 to join a cluster:
# nats-server-1.conf
server_name: nats-server-1
listen: 0.0.0.0:4222
cluster {
  listen: 0.0.0.0:6222
  routes [
    nats://nats-server-2:6222
  ]
}
And for nats-server-2:
# nats-server-2.conf
server_name: nats-server-2
listen: 0.0.0.0:4222
cluster {
  listen: 0.0.0.0:6222
  routes [
    nats://nats-server-1:6222
  ]
}
When nats-server-1 starts, it tries to connect to nats-server-2 on its cluster port (6222). If nats-server-2 is already running and listening on that port, the two establish a persistent connection called a "route." If nats-server-2 isn't up yet, nothing breaks: the route is established once either side reaches the other. From nats-server-1's perspective, nats-server-2 is now a known peer in the cluster. If a client connected to nats-server-2 publishes a message on foo.bar, and a client connected to nats-server-1 has subscribed to foo.>, nats-server-2 forwards that message to nats-server-1 over the route.
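The foo.> interest in that example relies on NATS subject wildcards: subjects are dot-separated tokens, `*` matches exactly one token, and `>` matches one or more trailing tokens. The sketch below is a simplified matcher written for illustration; `subject_matches` is a hypothetical name, not a function from any NATS client library.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subscription pattern matches a subject.

    Simplified sketch: '*' matches exactly one token; '>' matches one or
    more trailing tokens and is only valid as the last token.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be last and must consume at least one remaining token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject ran out of tokens before the pattern did
        if p != "*" and p != s_tokens[i]:
            return False  # literal token mismatch
    # Every pattern token matched; the token counts must also agree.
    return len(p_tokens) == len(s_tokens)


print(subject_matches("foo.>", "foo.bar.baz"))  # True
print(subject_matches("foo.>", "foo"))          # False
print(subject_matches("foo.*", "foo.bar"))      # True
print(subject_matches("foo.*", "foo.bar.baz"))  # False
```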
The core problem NATS clustering solves is distributing state, specifically which subscriptions exist on which server. Clients can connect to any server in the cluster and publish or subscribe to any subject. The servers then work out how to deliver those messages to the appropriate subscribers, even if those subscribers are connected to different servers. This works because each server knows about all other servers in the cluster, keeps an active route to each of them, and shares its clients' subscription interest over those routes.
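Because every server keeps a route to every other server, the number of routes grows quadratically with cluster size. A back-of-the-envelope helper, counting one logical route per pair of servers (recent server versions may open more than one TCP connection per route, which this ignores); `mesh_routes` is a hypothetical name:

```python
def mesh_routes(n_servers: int) -> tuple[int, int]:
    """Return (routes held by each server, total routes) for a full mesh."""
    if n_servers < 1:
        raise ValueError("a cluster needs at least one server")
    per_server = n_servers - 1                     # one route to every other server
    total = n_servers * (n_servers - 1) // 2       # each pair shares one route
    return per_server, total


for n in (2, 3, 5):
    print(n, mesh_routes(n))
# 2 (1, 1)
# 3 (2, 3)
# 5 (4, 10)
```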
The cluster block is where the magic happens.
listen: The address and port on which the NATS server accepts connections from other NATS servers for clustering. It's distinct from the client listen port (4222).
routes: A list of other NATS servers that this server should connect to and stay connected to. Once established, a route carries traffic in both directions, so it's enough for one side to list the other; when server A lists server B and server B lists server A, either one can initiate the link.
The server_name is crucial for identification: it's how you tell servers apart in logs and monitoring output, and JetStream relies on it to identify cluster members. Each server in a cluster must have a unique server_name.
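Two of these rules are easy to check mechanically before starting a cluster: every server_name must be unique, and on a given host the cluster listen port must differ from the client port. A minimal sketch, assuming each server's settings have already been loaded into a plain dict; both the dict layout and the `check_cluster` helper are hypothetical, not part of any NATS tool.

```python
def check_cluster(servers: list[dict]) -> list[str]:
    """Return a list of problems found in a set of server definitions.

    Each dict is assumed to look like:
      {"server_name": str, "client_port": int, "cluster_port": int}
    """
    problems = []
    seen = set()
    for s in servers:
        name = s["server_name"]
        if name in seen:
            problems.append(f"duplicate server_name: {name}")
        seen.add(name)
        # Both ports are bound on the same host, so they cannot be equal.
        if s["client_port"] == s["cluster_port"]:
            problems.append(f"{name}: client and cluster ports are both {s['client_port']}")
    return problems


servers = [
    {"server_name": "nats-server-1", "client_port": 4222, "cluster_port": 6222},
    {"server_name": "nats-server-2", "client_port": 4222, "cluster_port": 6222},
    {"server_name": "nats-server-1", "client_port": 4222, "cluster_port": 4222},
]
print(check_cluster(servers))
# ['duplicate server_name: nats-server-1', 'nats-server-1: client and cluster ports are both 4222']
```

Identical ports on different machines are fine, which is why the port check is per server rather than across the whole list.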
Let’s add a third server, nats-server-3, and have it connect to both nats-server-1 and nats-server-2.
# nats-server-3.conf
server_name: nats-server-3
listen: 0.0.0.0:4222
cluster {
  listen: 0.0.0.0:6222
  routes [
    nats://nats-server-1:6222,
    nats://nats-server-2:6222
  ]
}
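The three files differ only in server_name and in which peers appear under routes, so they can be generated from a list of hostnames. A sketch that emits one config per server in the same format as the files above, assuming every host resolves by name and uses ports 4222 and 6222; `render_config` is a hypothetical helper.

```python
def render_config(name: str, peers: list[str],
                  client_port: int = 4222, cluster_port: int = 6222) -> str:
    """Render a NATS server config whose routes point at every other peer."""
    # Skip the server's own name so it doesn't list a route to itself.
    routes = ",\n".join(
        f"    nats://{peer}:{cluster_port}" for peer in peers if peer != name
    )
    return (
        f"# {name}.conf\n"
        f"server_name: {name}\n"
        f"listen: 0.0.0.0:{client_port}\n"
        "cluster {\n"
        f"  listen: 0.0.0.0:{cluster_port}\n"
        "  routes [\n"
        f"{routes}\n"
        "  ]\n"
        "}\n"
    )


hosts = ["nats-server-1", "nats-server-2", "nats-server-3"]
for host in hosts:
    print(render_config(host, hosts))
```

Listing every peer on every server is more than strictly necessary, but it keeps the configs symmetric and means any server can be started first.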
Now nats-server-3 will try to establish routes to both nats-server-1 and nats-server-2. Since nats-server-1 and nats-server-2 already route to each other, the three servers form a fully connected mesh. Listing every peer explicitly isn't required, though. If nats-server-1 only knows about nats-server-2, and nats-server-2 knows about nats-server-3, nats-server-2 tells each of them about the other, and nats-server-1 then opens a direct route to nats-server-3. This is the "gossip" aspect of NATS clustering: servers exchange information about known peers over their routes, which lets them discover the entire cluster topology dynamically. Gossip completes the mesh; it does not relay traffic. Core NATS forwards a message across at most one route, so every server must end up directly connected to every other.
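Discovery can be modeled as a fixed-point loop: each server starts with the peers in its explicit routes list, every route exchanges known-peer lists, and each newly learned peer gets a direct route, until nobody learns anything new. This is a simplified model of the gossip behavior, not the server's actual protocol code; `discover_mesh` is a hypothetical name.

```python
def discover_mesh(explicit_routes: dict[str, set[str]]) -> set[frozenset[str]]:
    """Simulate peer discovery until the route set stops changing.

    explicit_routes maps each server to the peers listed in its `routes`
    block. Returns the final routes, each an unordered pair of servers.
    """
    servers = set(explicit_routes)
    for peers in explicit_routes.values():
        servers |= peers
    # A route dialled by either side is usable in both directions.
    routes = {frozenset((a, b)) for a, peers in explicit_routes.items()
              for b in peers if a != b}
    while True:
        neighbours = {s: set() for s in servers}
        for pair in routes:
            a, b = tuple(pair)
            neighbours[a].add(b)
            neighbours[b].add(a)
        # Gossip: every server learns its neighbours' neighbours and dials them.
        learned = {frozenset((a, c))
                   for a in servers
                   for b in neighbours[a]
                   for c in neighbours[b]
                   if c != a}
        if learned <= routes:
            return routes  # nothing new: the topology is stable
        routes |= learned


routes = discover_mesh({
    "nats-server-1": {"nats-server-2"},
    "nats-server-2": {"nats-server-3"},
    "nats-server-3": set(),
})
print(sorted(sorted(p) for p in routes))
# [['nats-server-1', 'nats-server-2'], ['nats-server-1', 'nats-server-3'], ['nats-server-2', 'nats-server-3']]
```

Even a chain of explicit routes converges to a full mesh, which is what the one-hop forwarding rule needs.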
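The routes configuration defines only the initial connections. Once connected, servers use gossip to discover other cluster members and maintain an up-to-date view of the cluster. If a server goes down, the remaining servers detect the lost routes, update their topology, and stop forwarding messages to it. There is no alternative path to fall back on: clients that were connected to the failed server reconnect to another cluster member and re-register their subscriptions there, and core NATS does not redeliver messages published while they were disconnected.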
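On the client side, recovery is driven by the client library: a client holds a pool of server URLs (from its own configuration plus the client URLs that servers advertise), connects to another reachable server when its connection drops, and replays its subscriptions there. The sketch below models only that loop; `ToyClient` and the URLs are hypothetical, and real clients add pool randomization, back-off, and buffering.

```python
from typing import Optional


class ToyClient:
    """Toy model of a client's reconnect-and-resubscribe loop (not a real NATS client)."""

    def __init__(self, server_pool: list[str]) -> None:
        self.server_pool = list(server_pool)  # URLs to try, in order
        self.current: Optional[str] = None    # server we are connected to
        self.subscriptions: list[str] = []    # subjects this client wants

    def subscribe(self, subject: str) -> None:
        self.subscriptions.append(subject)

    def connect(self, up: set[str]) -> list[tuple[str, str]]:
        """Connect to the first server in the pool that is up.

        Returns the (server, subject) registrations sent on the new
        connection, i.e. the replayed subscriptions.
        """
        for url in self.server_pool:
            if url in up:
                self.current = url
                return [(url, subject) for subject in self.subscriptions]
        self.current = None
        raise ConnectionError("no server in the pool is reachable")


pool = ["nats://nats-server-1:4222", "nats://nats-server-2:4222", "nats://nats-server-3:4222"]
client = ToyClient(pool)
client.subscribe("foo.>")
up = set(pool)
client.connect(up)
up.discard("nats://nats-server-1:4222")  # nats-server-1 goes down
print(client.connect(up))
# [('nats://nats-server-2:4222', 'foo.>')]
```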
The most surprising thing about NATS clustering is its active-active nature for message delivery. If you have a publisher connected to nats-server-1 and subscribers on both nats-server-1 and nats-server-2 for the same subject, nats-server-1 delivers the message to its local subscribers and forwards a single copy over the cluster route to nats-server-2, which then delivers it to its own subscribers. There's no single point of message brokering; delivery is distributed.
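A snapshot model of that publish: given which subjects each server's local clients subscribe to, the origin server delivers to its own matching subscribers and sends one copy over each route whose peer has at least one matching subscriber. Exact subject matching keeps the sketch short; `publish` and the data layout are hypothetical.

```python
def publish(origin: str, subject: str,
            local_subs: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Return a log of what happens when a client on `origin` publishes.

    local_subs maps server -> [(client_id, subject), ...] for clients
    connected directly to that server.
    """
    log = []
    for server, subs in local_subs.items():
        matching = [client for client, subj in subs if subj == subject]
        if not matching:
            continue  # no interest on this server: nothing is sent there
        if server != origin:
            # One copy crosses the route, however many subscribers are remote.
            log.append(f"route {origin} -> {server}")
        for client in matching:
            log.append(f"deliver to {client} on {server}")
    return log


subs = {
    "nats-server-1": [("sub-a", "orders")],
    "nats-server-2": [("sub-b", "orders"), ("sub-c", "orders")],
    "nats-server-3": [("sub-d", "billing")],
}
for line in publish("nats-server-1", "orders", subs):
    print(line)
# deliver to sub-a on nats-server-1
# route nats-server-1 -> nats-server-2
# deliver to sub-b on nats-server-2
# deliver to sub-c on nats-server-2
```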
The internal mechanism for routing messages across cluster members is interest propagation. When a client subscribes, its server records the subscription locally and announces that interest to every route peer; when the last matching subscription goes away, the interest is withdrawn. Each server therefore knows, for every peer, which subjects that peer has subscribers for. When a server receives a message from a client, it delivers the message to matching local subscribers and sends one copy over each route whose peer has matching interest. A message that arrives over a route is delivered only to that server's local subscribers and is never forwarded to another route. This is not a centralized registry; every server makes the decision from its own copy of the interest map.
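Interest propagation and the one-hop rule can be modeled in a few lines. This is a simplified model with exact subjects only, no queue groups, and a full mesh assumed; `Server` and its methods are hypothetical names, not the nats-server source.

```python
class Server:
    """Toy model of interest propagation and one-hop forwarding."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: dict[str, "Server"] = {}        # peer name -> peer
        self.local: dict[str, set[str]] = {}        # subject -> local subscriber ids
        self.remote: dict[str, set[str]] = {}       # peer name -> subjects it wants
        self.delivered: list[tuple[str, str]] = []  # (client id, subject) delivered here

    def add_route(self, other: "Server") -> None:
        # One route is usable in both directions; exchange current interest.
        self.peers[other.name] = other
        other.peers[self.name] = self
        self.remote[other.name] = set(other.local)
        other.remote[self.name] = set(self.local)

    def subscribe(self, client: str, subject: str) -> None:
        is_new = subject not in self.local
        self.local.setdefault(subject, set()).add(client)
        if is_new:  # first local subscriber: announce interest to every peer
            for peer in self.peers.values():
                peer.remote[self.name].add(subject)

    def unsubscribe(self, client: str, subject: str) -> None:
        self.local.get(subject, set()).discard(client)
        if subject in self.local and not self.local[subject]:
            del self.local[subject]  # last local subscriber gone: withdraw interest
            for peer in self.peers.values():
                peer.remote[self.name].discard(subject)

    def receive(self, subject: str, from_route: bool = False) -> list[str]:
        """Handle a message; return the names of peers it was forwarded to."""
        for client in self.local.get(subject, ()):
            self.delivered.append((client, subject))
        if from_route:
            return []  # one-hop rule: a routed message is never re-forwarded
        forwarded = []
        for name, peer in self.peers.items():
            if subject in self.remote[name]:
                peer.receive(subject, from_route=True)
                forwarded.append(name)
        return forwarded


s1, s2, s3 = Server("nats-server-1"), Server("nats-server-2"), Server("nats-server-3")
for a, b in [(s1, s2), (s1, s3), (s2, s3)]:
    a.add_route(b)
s2.subscribe("sub-b", "orders")
print(s1.receive("orders"))  # ['nats-server-2']
print(s2.delivered)          # [('sub-b', 'orders')]
s2.unsubscribe("sub-b", "orders")
print(s1.receive("orders"))  # []
```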
A common pitfall is neglecting the cluster port in firewall rules. If nats-server-1 cannot reach nats-server-2 on port 6222 (or whatever cluster.listen is configured to), the route will fail, and they won’t form a cluster. The routes array is essentially a list of desired persistent connections. If a server is listed in the routes of another server, it’s expected to be listening on its cluster.listen port.
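A quick way to rule out the firewall problem is to test, from each server's host, whether a TCP connection to every peer's cluster port succeeds. A sketch using only the standard library; `route_port_open` is a hypothetical helper, and a successful connect only shows the port is reachable, not that a healthy NATS server is behind it.

```python
import socket


def route_port_open(host: str, port: int = 6222, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, and so on
        return False
```

For example, running `route_port_open("nats-server-2")` on nats-server-1's host should return True before you expect the two servers to form a cluster.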
The next concept you’ll want to explore is NATS JetStream, which builds upon this clustered foundation to provide durable message persistence and stream processing.