NATS clustering is more than just a way to get high availability; it’s a distributed system where each server is a potential single point of failure for its local clients but not for the cluster as a whole, and your job is to make sure that "potential" never becomes "actual."

Let’s see how this plays out with a simple two-server cluster, nats-server-1 and nats-server-2.

Here’s a nats-server config file for nats-server-1:

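port: 4222                 # Client connections
cluster {
  name: "demo"             # Example name; must be the same on every server
  listen: 0.0.0.0:6222     # Route (server-to-server) port
  routes: ["nats-route://192.168.1.101:6222"]  # nats-server-2
}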

And for nats-server-2:

port: 4222
cluster {
  name: "demo"
  listen: 0.0.0.0:6222
  routes: ["nats-route://192.168.1.100:6222"]  # nats-server-1
}

When these servers start, each one dials the URLs in its routes list. A "route" in NATS is a persistent, bidirectional TCP connection between two NATS servers, made to the peer’s cluster listen port (6222 here), not to the client port (4222). Routes are how messages are forwarded between servers in a cluster.
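To make the split between client and route ports concrete, here is a minimal Python sketch that renders the cluster block for each member of a small static cluster. The helper name `render_cluster_conf`, the cluster name `demo`, and the port defaults (4222 for clients, 6222 for routes) are assumptions for illustration. nats-server does not ship this function; it only reads the text such a helper produces.

```python
# Hypothetical helper: renders the cluster block for one member of a
# static cluster. nats-server does not provide this; it only reads the text.


def render_cluster_conf(self_ip, member_ips, cluster_name="demo",
                        client_port=4222, cluster_port=6222):
    """Return nats-server config text for the server at self_ip.

    Routes point at each peer's cluster (route) port, never at the
    client port, and the server never lists itself.
    """
    routes = ", ".join(
        f'"nats-route://{ip}:{cluster_port}"'
        for ip in member_ips
        if ip != self_ip
    )
    return "\n".join([
        f"port: {client_port}",
        "cluster {",
        f'  name: "{cluster_name}"',
        f"  listen: 0.0.0.0:{cluster_port}",
        f"  routes: [{routes}]",
        "}",
    ])


if __name__ == "__main__":
    members = ["192.168.1.100", "192.168.1.101"]
    for ip in members:
        print(f"# config for {ip}")
        print(render_cluster_conf(ip, members))
```

Generating every member’s config from one list keeps the cluster name and ports consistent, which is where hand-edited configs usually drift.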

Whichever server dials first opens the route: if nats-server-1 is already running when nats-server-2 starts, nats-server-2 initiates the connection. Once established, both servers send and receive messages over this single TCP connection. Routes aren’t just for message forwarding; servers also use them to tell each other about other cluster members and to detect when a peer stops responding. If a route breaks, the servers attempt to re-establish it.

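Now, let’s introduce NATS gateways. Gateways connect separate NATS clusters into a super-cluster, often across network boundaries (like different VPCs or data centers), so that clients in one cluster can communicate with clients in another. A gateway is not a separate kind of server: it is a gateway block in a regular nats-server configuration, and each server that has one opens its own outbound connections to the remote clusters.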

Imagine we have two clusters: cluster-A (servers a1, a2) and cluster-B (servers b1, b2). We want clients connected to cluster-A to be able to publish messages that cluster-B clients can subscribe to, and vice versa.

Here’s a simplified config for a gateway server gw-a in cluster-A that connects to cluster-B:

port: 4222 # Standard NATS port for clients
cluster {
  name: "cluster-A"     # Must match gateway.name below
  listen: 0.0.0.0:6222  # Routes to other servers in cluster-A
  routes: ["nats-route://192.168.1.101:6222"]  # Another server in cluster-A
}
gateway {
  name: "cluster-A"
  listen: 0.0.0.0:7222  # Port for gateway-to-gateway communication
  gateways: [
    # Remote cluster-B, reached through gw-b
    {name: "cluster-B", url: "nats://192.168.2.100:7222"}
  ]
}

And on the cluster-B side, for gw-b:

port: 4222
cluster {
  name: "cluster-B"
  listen: 0.0.0.0:6222
  routes: ["nats-route://192.168.2.101:6222"]  # Another server in cluster-B
}
gateway {
  name: "cluster-B"
  listen: 0.0.0.0:7222
  gateways: [
    {name: "cluster-A", url: "nats://192.168.1.100:7222"}  # gw-a
  ]
}

When gw-a and gw-b start, each dials the gateway URL listed for the other cluster, connecting on its gateway listen port. Gateway connections resemble routes but are designed for inter-cluster traffic: each server sends over its own outbound connection, and a message is forwarded to a remote cluster only when that cluster has (or may have) subscribers interested in its subject. No per-subject configuration is required for this; within the same account, a message published in cluster-A reaches matching subscribers in cluster-B automatically.
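A minimal Python sketch of that forwarding decision, under simplifying assumptions: it models only the steady "interest-only" behaviour, in which a message goes to a remote cluster only if some subscription there matches its subject, and it ignores nats-server’s optimistic mode and per-account bookkeeping. The functions `subject_matches` and `clusters_to_forward` and the `cluster-C` entry are hypothetical. The matcher follows NATS wildcard rules: `*` matches exactly one token and `>` matches one or more trailing tokens.

```python
# Toy model of interest-based forwarding across gateways (not nats-server code).


def subject_matches(pattern, subject):
    """NATS-style matching: '*' = exactly one token, '>' = one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # '>' must be the last token and must match at least one remaining token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject is shorter than the pattern
        if tok != "*" and tok != s_tokens[i]:
            return False  # literal token mismatch
    return len(p_tokens) == len(s_tokens)


def clusters_to_forward(subject, remote_interest):
    """Return the remote clusters that have at least one matching subscription."""
    return sorted(
        cluster
        for cluster, patterns in remote_interest.items()
        if any(subject_matches(p, subject) for p in patterns)
    )


if __name__ == "__main__":
    interest = {"cluster-B": ["public.>"], "cluster-C": ["orders.*"]}
    print(clusters_to_forward("public.hello", interest))    # ['cluster-B']
    print(clusters_to_forward("orders.new", interest))      # ['cluster-C']
    print(clusters_to_forward("internal.audit", interest))  # []
```

The point of the model is that nothing is sent across a gateway speculatively once a server knows the remote side has no matching interest, which is what keeps inter-cluster links from carrying every message.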

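Subject-level control is a common source of confusion here. The gateway block has no exports or imports settings; within an account, a gateway forwards any subject for which the remote cluster has interest. Exports and imports are an account feature, declared in the accounts block, which should be the same on every server in the super-cluster. For example, one account can export public.> as a stream and another account can import it:

accounts {
  PUBLISHER: {
    users: [{user: pub, password: changeme}]   # Example credentials only
    exports: [{stream: "public.>"}]
  }
  CONSUMER: {
    users: [{user: sub, password: changeme}]
    imports: [{stream: {account: PUBLISHER, subject: "public.>"}}]
  }
}

A client in the PUBLISHER account publishing to public.hello on cluster-A will have that message delivered to CONSUMER subscribers on public.hello, including those connected to cluster-B; the gateway carries it because there is interest on the far side. The name field in the gateway configuration is still crucial: it is how servers identify which cluster they are talking to, and it must match the cluster name.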

The truly surprising thing about NATS clustering is that servers don’t rely on a central discovery service. They form connections to each other from their routes configuration, and once a server is connected to any member of the cluster, the members gossip about one another so that every server ends up with a direct route to every other, forming a full mesh. This peer-to-peer model simplifies setup and eliminates a single point of failure for cluster membership itself.
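The convergence can be illustrated with a toy Python simulation. It is not how nats-server is implemented: it assumes every route connects successfully, all cluster names match, and gossip repeats until no server learns a new peer. The names `converge_mesh`, `is_full_mesh`, and `s1` to `s3` are hypothetical. Notice that `s2` has no routes of its own, yet ends up in the mesh.

```python
from itertools import combinations


def converge_mesh(seed_routes):
    """Return the undirected route pairs after gossip settles.

    seed_routes maps each server to the servers it dials explicitly
    (its routes list). Dialling is one-sided, but the resulting route is
    a single bidirectional connection, so pairs are stored unordered.
    """
    links = {
        frozenset((a, b))
        for a, peers in seed_routes.items()
        for b in peers
        if a != b
    }
    changed = True
    while changed:  # keep gossiping until nobody learns a new peer
        changed = False
        for ab in list(links):
            for bc in list(links):
                shared = ab & bc
                if len(shared) == 1:
                    # a and c share neighbour b, so b's gossip lets them connect
                    ac = (ab | bc) - shared
                    if ac not in links:
                        links.add(ac)
                        changed = True
    return links


def is_full_mesh(servers, links):
    """True if every pair of servers ends up with a direct route."""
    return all(frozenset(pair) in links for pair in combinations(servers, 2))


if __name__ == "__main__":
    # One-sided seeds: s1 and s3 only know about s2; s2 knows nobody.
    seeds = {"s1": ["s2"], "s2": [], "s3": ["s2"]}
    mesh = converge_mesh(seeds)
    print(sorted(sorted(pair) for pair in mesh))  # [['s1', 's2'], ['s1', 's3'], ['s2', 's3']]
    print(is_full_mesh(seeds, mesh))              # True
```

A server that no one routes to and that routes to no one (add `"s4": []` to the seeds) stays isolated, which is the simulation’s version of a server that never joined the cluster.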

When you configure routes, you’re not just telling servers where to find each other; you’re defining the backbone of your NATS mesh. Servers use routes to gossip about cluster state, including which servers are online, and to forward each message to the servers whose clients have matching subscriptions. If server-a lists nats-route://server-b:6222 in its routes (listing each other also works), the two connect and, provided their cluster names match, consider themselves part of the same cluster.

When you configure gateways, you’re connecting clusters into a super-cluster. The gateways list tells a server how to reach at least one server in each remote cluster, and the name field on each entry tells it which cluster that URL belongs to, so it can route messages accordingly. Traffic can flow in both directions, but it crosses only where the far side has matching subscription interest; the gateway itself does not filter subjects through any exports or imports.
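A small Python sketch can generate a gateway block from a map of remote cluster names to URLs, which keeps the name/URL pairing explicit. `render_gateway_conf` is a hypothetical helper, and the port 7222 and the addresses in the usage line are assumptions; the keys it emits (`name`, `listen`, `gateways`, and `url` inside each entry) are the ones nats-server’s gateway block uses.

```python
# Hypothetical helper: renders a nats-server gateway block.
# Assumes one URL per remote cluster and gateway port 7222.


def render_gateway_conf(cluster_name, remotes, gateway_port=7222):
    """remotes maps remote cluster name -> gateway URL of one server there."""
    lines = [
        "gateway {",
        f'  name: "{cluster_name}"  # must match cluster.name on this server',
        f"  listen: 0.0.0.0:{gateway_port}",
        "  gateways: [",
    ]
    for name, url in sorted(remotes.items()):
        if name == cluster_name:
            continue  # this sketch only lists remote clusters
        lines.append(f'    {{name: "{name}", url: "{url}"}}')
    lines += ["  ]", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_gateway_conf("cluster-A", {"cluster-B": "nats://192.168.2.100:7222"}))
```

Because the entries are keyed by cluster name, adding a third cluster is one more map entry, and the same map can be fed to every cluster’s servers with only `cluster_name` changing.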

Because gateways don’t filter by subject, the place to decide what may be shared is the account and permission layer. An export makes a subject available to other accounts, and an import in another account receives it from a specific named account; per-user publish and subscribe permissions can restrict individual clients further. Since accounts span the whole super-cluster, these rules apply the same way in every cluster, giving fine-grained control that prevents unwanted message propagation.

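A common misconception is that routes must be configured symmetrically. Because a route is a single bidirectional connection, a route from server A to server B is enough for the two to peer, and servers discover the rest of the cluster through gossip. In practice it is still common to give every server the same list of seed routes, so a restarting server can rejoin even if one seed is down. The mistakes that actually keep a cluster from forming are pointing routes at the client port (4222) instead of the cluster port, and giving servers mismatched cluster names.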

The next logical step in mastering NATS clustering is understanding how to manage and observe these connections and message flows, particularly when dealing with larger, more complex deployments.
