NATS is a message bus, not a distributed coordination service, and trying to use it as one is like trying to hammer a nail with a banana.
Let’s look at NATS in action, not as a coordinator, but as what it is: a distributed messaging system. Imagine we have a microservice architecture. We’ve got a user-service that needs to notify an email-service whenever a new user signs up.
First, we set up our NATS server. A minimal setup might look like this:
nats-server -p 4222 -m 8222
This starts a NATS server that accepts client connections on port 4222 and serves its HTTP monitoring endpoint on port 8222.
Now, in our user-service (let’s say it’s written in Go), we’ll publish a message when a user signs up:
package main

import (
	"log"
	"strconv"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to the local NATS server
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()
	log.Println("Connected to NATS")

	// Simulate five user signups, one per second
	for i := 0; i < 5; i++ {
		userID := i + 1
		// strconv.Itoa produces "1", "2", ...; string(rune(userID)) would
		// produce an unprintable control character instead of the digit.
		message := []byte("user_signup:" + strconv.Itoa(userID))

		// Publish the message on the users.signup subject
		if err := nc.Publish("users.signup", message); err != nil {
			log.Printf("Error publishing message: %v", err)
		} else {
			log.Printf("Published: %s", message)
		}
		time.Sleep(1 * time.Second)
	}
}
And in our email-service, we’ll subscribe to that message and send an email:
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to the local NATS server
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()
	log.Println("Connected to NATS")

	// Subscribe to user signup events; the callback runs once per message
	sub, err := nc.Subscribe("users.signup", func(msg *nats.Msg) {
		log.Printf("Received message: %s", string(msg.Data))
		// In a real scenario, this would send an email
		log.Println("Sending signup confirmation email...")
	})
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Unsubscribe()

	// Block forever so the subscription stays active
	select {}
}
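When user-service runs, you’ll see it publishing messages like Published: user_signup:1, and email-service will immediately receive and log Received message: user_signup:1. Start email-service first: core NATS delivers each message at most once, to the subscribers connected at that moment, so anything published while nobody is listening is dropped. This is NATS doing what it does best: fast, low-latency message delivery.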
Now, let’s contrast this with ZooKeeper and etcd, which are designed for distributed coordination. Their core job is to maintain a consistent view of shared state across a cluster of machines. Think about leader election, distributed locks, or service discovery.
ZooKeeper uses a consensus algorithm called ZAB (ZooKeeper Atomic Broadcast). When you write data to ZooKeeper, that write must be acknowledged by a majority of its ensemble members before it’s considered committed. This ensures that all nodes see the same data in the same order.
Here’s a simplified view of ZooKeeper’s coordination: if you have a cluster of 3 ZooKeeper nodes, a write operation needs at least 2 nodes to agree. If one node fails, the cluster can continue operating. This is critical for maintaining the integrity of your coordination data.
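The majority arithmetic behind that 3-node example generalizes to any ensemble size. The snippet below is a small sketch of it; the function names quorum and tolerated are made up for illustration and are not part of any ZooKeeper API.

```go
package main

import "fmt"

// quorum returns the smallest number of members that forms a strict
// majority of an ensemble with n members.
func quorum(n int) int {
	return n/2 + 1
}

// tolerated returns how many members can fail while a majority survives.
func tolerated(n int) int {
	return n - quorum(n)
}

func main() {
	for _, n := range []int{3, 4, 5, 7} {
		fmt.Printf("ensemble=%d quorum=%d tolerated_failures=%d\n",
			n, quorum(n), tolerated(n))
	}
}
```

Note that 4 nodes tolerate only one failure, the same as 3, which is why ensembles are usually sized with an odd number of members.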
etcd uses the Raft consensus algorithm, which is similar in principle to ZAB. Raft was designed to be more understandable and easier to implement than Paxos. In an etcd cluster, a leader is elected, and all write operations go through the leader. The leader then replicates the write to its followers, and the write is committed once a majority of nodes have acknowledged it.
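To make "committed once a majority of nodes have acknowledged it" concrete, here is a simplified sketch of how a leader could derive the commit point from what each node has stored. It is an illustration, not etcd's code: the commitIndex function and the sample numbers are hypothetical, and real Raft additionally requires the entry to belong to the leader's current term, which is omitted here.

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index that is stored on a majority
// of nodes. matchIndex holds, for every node including the leader, the
// highest log index known to be replicated on that node.
func commitIndex(matchIndex []int) int {
	if len(matchIndex) == 0 {
		return 0
	}
	// Copy so the caller's slice is not reordered, then sort descending.
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))

	majority := len(sorted)/2 + 1
	// The majority-th highest value is held by at least `majority` nodes.
	return sorted[majority-1]
}

func main() {
	// Hypothetical 3-node cluster: the leader has 7 entries, followers 7 and 4.
	fmt.Println(commitIndex([]int{7, 7, 4})) // 7
	// One follower lags at 5, the other at 4: only index 5 has a majority.
	fmt.Println(commitIndex([]int{7, 5, 4})) // 5
	// Hypothetical 5-node cluster: three nodes hold at least index 6.
	fmt.Println(commitIndex([]int{9, 9, 6, 3, 2})) // 6
}
```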
Consider service discovery. Each user-service instance might register its endpoint with etcd under a key like /services/user-service/instance-1, attached to a lease that the instance keeps renewing. Other services looking for the user-service can then watch the /services/user-service/ prefix. If an instance fails, it stops renewing its lease; once the lease expires, etcd removes the entry and notifies the watchers, allowing other services to stop sending traffic to it. This is a coordination task: managing shared state about available service instances.
NATS, on the other hand, doesn’t provide this kind of strong consistency for state management. Its primary goal is high-throughput, low-latency message delivery. It doesn’t have a consensus mechanism to ensure all clients see a shared state in the same order. If you tried to use NATS for leader election, for example, by having services publish "I am the leader" messages, you’d have no guarantee that all services would receive those messages in the same order, or even receive them at all if a NATS server went down mid-election.
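A tiny simulation shows why this matters. It does not talk to NATS at all; the delivery histories below are hypothetical, chosen to represent the reordering and loss described above, and leaderSeen is a made-up helper applying a naive "last claim wins" rule.

```go
package main

import "fmt"

// leaderSeen applies a naive "last claim wins" rule to the leadership
// claims an observer happened to receive, in the order it received them.
func leaderSeen(claims []string) string {
	leader := ""
	for _, c := range claims {
		leader = c
	}
	return leader
}

func main() {
	// Candidates A and B both publish "I am the leader".
	// Each observer sees a different (hypothetical) delivery history.
	deliveries := []struct {
		observer string
		claims   []string
	}{
		{"service-1", []string{"A", "B"}},
		{"service-2", []string{"B", "A"}},
		{"service-3", []string{"A"}}, // the claim from B was never delivered
	}

	for _, d := range deliveries {
		fmt.Printf("%s believes the leader is %s\n", d.observer, leaderSeen(d.claims))
	}
}
```

service-1 ends up following B while service-2 and service-3 follow A: a split brain. A consensus protocol exists precisely to rule this outcome out by making every node agree on one ordered history.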
The crucial difference lies in their design goals: NATS is for communication, while ZooKeeper and etcd are for coordination.
One of the fundamental reasons NATS is so fast is that core NATS does not persist messages or track whether they were delivered. When a client connects, the NATS server doesn’t run a consensus round to confirm the client’s presence or state. It simply accepts the connection and routes messages to the current subscribers. This contrasts sharply with etcd or ZooKeeper, where every write operation is a distributed consensus event, which inherently adds latency. NATS can offer durability through JetStream, its persistence layer, but JetStream is aimed primarily at reliable message delivery rather than at serving as a general-purpose coordination store.
If you’re building a system where services need to reliably discover each other, manage distributed locks, or elect leaders, you should reach for etcd or ZooKeeper. If you need to broadcast events, fan out tasks, or enable publish-subscribe communication between loosely coupled services, NATS is your tool.
The next step after understanding NATS’s messaging capabilities is to explore its JetStream feature for persistent message queues.