The NATS cluster is unavailable because one or more NATS servers have failed to start or are unable to join the existing cluster. NATS clusters form a full mesh with no dedicated coordinator node, so each cause below should be checked on every node.

Cause 1: Incorrect cluster Port Configuration

Diagnosis: Check the NATS server configuration file (e.g., nats-server.conf) for the cluster port, which is set inside the cluster block via listen (or port). Ensure it’s correctly specified and not conflicting with other services.

grep -A 5 'cluster' /etc/nats/nats-server.conf

Fix: If the cluster listen address is missing or incorrect, add/correct it. For example, ensure it’s set to a free port, conventionally 6222:

cluster {
  listen: 0.0.0.0:6222
}

Why it works: The cluster port is where each NATS server accepts route connections from its peers to form a cluster. If this port is wrong or blocked, the servers can’t connect to each other.
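
To confirm what a node is actually configured to listen on, the cluster block can be pulled out of the config file. The following is a minimal sketch, not an official NATS tool: the function name cluster_listen is made up for illustration, and it assumes "cluster {" opens the block on its own line and the block's closing brace sits at the start of a line, as in the example above.

```shell
# Print the listen/port settings found inside the cluster { ... } block.
# Hypothetical helper; assumes the layout shown in the example config.
cluster_listen() {
  local conf="$1"
  awk '
    # Entering the top-level cluster block
    /^[ \t]*cluster[ \t]*[:=]?[ \t]*[{]/ { in_block = 1; next }
    # An unindented closing brace ends the block
    in_block && /^[}]/ { in_block = 0 }
    # Inside the block, print listen/port lines without leading whitespace
    in_block && /^[ \t]*(listen|port)[ \t]*[:=]?/ { sub(/^[ \t]+/, ""); print }
  ' "$conf"
}
```

For example, cluster_listen /etc/nats/nats-server.conf would print "listen: 0.0.0.0:6222" for the config above. To see whether another process already holds that port, ss -ltnp | grep ':6222' lists the listening TCP socket and its owner.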

Cause 2: Firewall Blocking Cluster Port

Diagnosis: Use ufw or firewalld to check whether the cluster port (6222 by convention) is allowed on all nodes.

sudo ufw status verbose
# or
sudo firewall-cmd --list-all

Fix: Allow TCP traffic to the cluster port on all NATS nodes.

sudo ufw allow 6222/tcp
# or
sudo firewall-cmd --add-port=6222/tcp --permanent && sudo firewall-cmd --reload

Why it works: Network firewalls can prevent NATS servers from establishing the necessary connections for cluster formation and health checks.
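
Firewall rules only tell half the story, so it helps to test the port from a peer node as well. This is a rough sketch under a few assumptions: check_port is a hypothetical helper name, bash is available with /dev/tcp support, and coreutils' timeout is installed.

```shell
# Report whether a TCP port on a peer accepts connections.
# Hypothetical helper; the port defaults to 6222 when omitted.
check_port() {
  local host="$1" port="${2:-6222}"
  # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
  # timeout caps the wait in case packets are silently dropped.
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port unreachable"
    return 1
  fi
}
```

Running check_port node2.example.com 6222 from node1 (hostnames taken from the routes example in this article) should report "reachable" once the firewall rule is in place and the peer's server is running.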

Cause 3: Incorrect routes Configuration

Diagnosis: Examine the routes setting, which lives inside the cluster block, in the NATS server configuration file on each node. Verify that each server points to at least one other server’s advertised cluster address.

grep -A 4 routes /etc/nats/nats-server.conf

Fix: Ensure the routes array inside the cluster block contains valid nats://host:port entries for other cluster members. For a 3-node cluster, node3 might have:

cluster {
  listen: 0.0.0.0:6222
  routes = [
    "nats://node1.example.com:6222",
    "nats://node2.example.com:6222"
  ]
}

Why it works: The routes configuration explicitly tells a NATS server which other servers it should attempt to connect to for clustering. Misconfigurations here lead to isolation.
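
When comparing route lists across nodes, it can be easier to extract just the URLs. A small sketch, assuming the routes are written as quoted URLs as in the example above; list_routes is an illustrative name, and the pattern also accepts the nats-route:// scheme that some configs use.

```shell
# Print every route URL found in a NATS config file, one per line.
# Hypothetical helper; matches nats:// and nats-route:// URLs.
list_routes() {
  grep -oE 'nats(-route)?://[^"[:space:],]+' "$1"
}
```

Running list_routes /etc/nats/nats-server.conf on each node gives a quick side-by-side view of who points at whom. If the monitoring endpoint is enabled (an assumption here; it is commonly configured with http_port: 8222), curl -s http://localhost:8222/routez shows the routes the server has actually established.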

Cause 4: DNS Resolution Issues

Diagnosis: On each NATS server, try to ping or nslookup the hostnames of other NATS cluster members using the names specified in the routes configuration.

ping node1.example.com
nslookup node2.example.com

Fix: Correct DNS records or update /etc/hosts files on all nodes to ensure hostnames resolve to the correct IP addresses.

# Example /etc/hosts entry
192.168.1.10 node1.example.com

Why it works: If a server can’t resolve the hostname of another server, it cannot establish a connection to it, breaking the cluster link.
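
To check several peers at once, the lookups can be wrapped in a loop. The sketch below uses getent hosts, which goes through the system resolver and therefore honours both /etc/hosts and DNS; resolve_hosts is a hypothetical helper name.

```shell
# Resolve each hostname through the system resolver (/etc/hosts, then DNS,
# depending on nsswitch.conf) and report the first address found.
resolve_hosts() {
  local rc=0 host addr
  for host in "$@"; do
    if addr=$(getent hosts "$host"); then
      # getent prints "ADDRESS  NAME..."; keep only the address
      echo "$host -> ${addr%% *}"
    else
      echo "$host -> NOT RESOLVED"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, resolve_hosts node1.example.com node2.example.com prints one line per peer and returns non-zero if any name fails to resolve.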

Cause 5: TLS Configuration Mismatch for Cluster Communication

Diagnosis: If TLS is enabled for cluster communication (a tls block, optionally with verify: true, within the cluster block), check that all servers have compatible TLS certificates and key configurations. Look for errors in NATS server logs related to TLS handshake failures.

# Check logs for errors like "tls: bad certificate" or "EOF" during connection
sudo journalctl -u nats-server -f

Fix: Ensure that the tls configuration in nats-server.conf is consistent across all nodes. Each node may use its own certificate, but every certificate must chain to a CA that the other nodes trust. This includes specifying ca_file, cert_file, and key_file paths if using mutual TLS.

cluster {
  listen: 0.0.0.0:6222
  tls {
    ca_file: "/etc/nats/certs/ca.pem"
    cert_file: "/etc/nats/certs/server.pem"
    key_file: "/etc/nats/certs/server-key.pem"
  }
}

Why it works: TLS handshake failures prevent secure communication channels from being established between cluster members, halting cluster formation.
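
Before restarting servers, the certificate chain can be checked offline with openssl. This is a minimal sketch, assuming openssl is installed and the certificate paths match the example config above; verify_cluster_cert is an illustrative name, not part of NATS.

```shell
# Check that a server certificate chains to the given CA, then show its
# subject and expiry date. Hypothetical helper wrapping standard openssl calls.
verify_cluster_cert() {
  local ca="$1" cert="$2"
  # Fails (non-zero) if the chain cannot be verified or a file is missing
  openssl verify -CAfile "$ca" "$cert" || return 1
  # An expired certificate is a common cause of handshake failures
  openssl x509 -in "$cert" -noout -subject -enddate
}
```

Running verify_cluster_cert /etc/nats/certs/ca.pem /etc/nats/certs/server.pem on each node shows quickly whether a node's certificate is untrusted or past its expiry date.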

Cause 6: Insufficient System Resources

Diagnosis: Monitor CPU, memory, and network I/O on the NATS server nodes. High resource utilization can prevent the NATS server process from starting or responding to cluster join requests.

top -n 1 -c
# or
htop

Fix: Allocate more resources to the NATS server instances (e.g., increase VM RAM, CPU cores) or optimize other processes consuming resources on the same nodes.

Why it works: The NATS server, especially in a cluster, requires adequate resources to maintain its internal state, process connections, and communicate with peers. Starvation leads to instability.

The next error you’ll likely encounter is a JetStream availability error, such as a request timeout or "JetStream system temporarily unavailable", if you attempt to publish messages to a NATS JetStream stream on a cluster that is still not fully formed or has quorum issues.
