NATS can achieve sub-millisecond latency not by magic, but by aggressively minimizing overhead at every single step of message delivery.

Let’s see it in action. We’ll set up a simple NATS publisher and subscriber, then benchmark message delivery between them.

First, make sure a NATS server is running. Running nats-server in a terminal with no arguments starts one with the default configuration, listening on port 4222.

Now, for the publisher and subscriber. We’ll use the nats-bench tool, which ships in the examples directory of the NATS Go client. If you don’t have it, go install github.com/nats-io/nats.go/examples/nats-bench@latest will get it.

In one terminal, start the subscriber:

nats-bench -s nats://localhost:4222 -np 0 -ns 1 -n 1000 -ms 128 benchmark.subject

This starts one subscriber (-ns 1) and no publishers (-np 0), connecting to nats://localhost:4222 and expecting 1000 messages (-n) of 128 bytes each (-ms) on benchmark.subject.

In another terminal, start the publisher, targeting the same subject and server, and sending the same message size:

nats-bench -s nats://localhost:4222 -np 1 -ns 0 -n 1000 -ms 128 benchmark.subject

This starts one publisher (-np 1) and no subscribers (-ns 0), sending 1000 messages of 128 bytes to benchmark.subject on nats://localhost:4222. Flag names can differ between versions of the tool, so run nats-bench -h to confirm them for your build.

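The nats-bench tool reports throughput statistics: messages per second and data rate for the publisher and the subscriber. It does not print latency percentiles, so to get a "P99" or "99th percentile" figure you need to time individual round trips, for example with request-reply from a client library. With a local setup like this, round trips are commonly well under 1 millisecond, often in the tens or low hundreds of microseconds.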
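A percentile such as P99 can be computed from raw round-trip samples. The following is a minimal sketch using only the Python standard library; the helper names percentile and time_round_trips are made up for this illustration, and the workload in the demo is a stand-in that you would replace with a real request-reply call from a NATS client.

```python
import math
import time


def percentile(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[max(rank, 1) - 1]


def time_round_trips(round_trip, count):
    """Call round_trip() count times and return each duration in microseconds."""
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        round_trip()
        durations.append((time.perf_counter() - start) * 1e6)
    return durations


if __name__ == "__main__":
    # Stand-in workload (an assumption); swap in a real request-reply call.
    samples = time_round_trips(lambda: sum(range(100)), 1000)
    print(f"P50={percentile(samples, 50):.1f}us  P99={percentile(samples, 99):.1f}us")
```

P99 is the value that 99% of the samples do not exceed, so a handful of slow outliers shows up there long before it moves the average.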

The core problem NATS solves is efficient, high-throughput, low-latency messaging. Traditional message queues often involve complex state management, disk persistence for every message, or heavy serialization/deserialization, all of which add latency. NATS is designed to avoid these bottlenecks.

Internally, the NATS server keeps the core routing path short and event-driven. The server is written in Go and services each client connection with its own goroutines rather than a single thread. When a publisher sends a message, the server receives it, performs minimal processing (essentially, looking up subscribers for that subject), and immediately forwards it to the connected subscribers. There is no persistent queueing on the server side for regular publishes; messages are "fire and forget" by default, delivered only to currently active subscribers.
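The routing behavior described above can be modeled in a few lines. This is a toy sketch, not the server's actual implementation: TinyRouter is a hypothetical class that only does exact-subject lookup, whereas real NATS also supports wildcard subjects and queue groups.

```python
from collections import defaultdict


class TinyRouter:
    """Toy model of core routing: exact-subject lookup, no stored messages."""

    def __init__(self):
        self._subs = defaultdict(list)  # subject -> list of callbacks

    def subscribe(self, subject, callback):
        self._subs[subject].append(callback)

    def publish(self, subject, payload):
        """Deliver to subscribers registered right now; return how many got it."""
        callbacks = self._subs.get(subject, ())
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


if __name__ == "__main__":
    router = TinyRouter()
    received = []
    # No subscriber yet: the message is dropped, nothing is queued for later.
    print(router.publish("orders.new", b"early"))
    router.subscribe("orders.new", received.append)
    print(router.publish("orders.new", b"late"))
    print(received)
```

The subscriber that arrives after the first publish never sees the "early" message, which is the at-most-once behavior the text describes.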

The protocol itself is text-based and extremely simple, minimizing parsing overhead. For example, a PUB message looks like: PUB <subject> [reply-to] <#bytes>\r\n[payload]\r\n. The server parses this, finds matching subscribers, and sends each of them a MSG message that also carries the subscription ID (sid) the client chose when it subscribed: MSG <subject> <sid> [reply-to] <#bytes>\r\n[payload]\r\n. This simplicity is key.
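To make the framing concrete, here is a small sketch that builds a PUB frame and parses a MSG frame byte for byte. The function names encode_pub and decode_msg are hypothetical helpers for this example, and the parser assumes it is handed exactly one complete frame.

```python
def encode_pub(subject, payload, reply_to=None):
    # Frame layout: PUB <subject> [reply-to] <#bytes> CRLF <payload> CRLF
    head = f"PUB {subject} {reply_to} " if reply_to else f"PUB {subject} "
    return head.encode() + str(len(payload)).encode() + b"\r\n" + payload + b"\r\n"


def decode_msg(frame):
    # Frame layout: MSG <subject> <sid> [reply-to] <#bytes> CRLF <payload> CRLF
    header, _, rest = frame.partition(b"\r\n")
    fields = header.decode().split()
    if len(fields) not in (4, 5) or fields[0].upper() != "MSG":
        raise ValueError(f"not a MSG frame: {header!r}")
    subject, sid = fields[1], fields[2]
    reply_to = fields[3] if len(fields) == 5 else None
    size = int(fields[-1])
    payload = rest[:size]
    if len(payload) != size or rest[size:size + 2] != b"\r\n":
        raise ValueError("payload shorter than declared or missing trailing CRLF")
    return subject, sid, reply_to, payload


if __name__ == "__main__":
    print(encode_pub("greet", b"hello"))
    print(decode_msg(b"MSG greet 9 inbox.1 5\r\nhello\r\n"))
```

Because the byte count is stated in the header, the receiver reads the payload by length and never has to scan it for delimiters.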

The "streaming" or durable aspects of NATS (formerly NATS Streaming, now superseded by JetStream) introduce more complex behaviors like persistence and guaranteed delivery, but the core NATS messaging layer remains incredibly fast for in-memory, at-most-once delivery. The benchmarks you see are for this core layer.

What most people miss is how the absence of complex consensus protocols or disk writes for every message enables this speed. For typical use cases where transient data or event notifications are sufficient, NATS doesn’t need to wait for disk flushes or agreement from a quorum of nodes. The server simply pushes the data to its connected clients as quickly as the network and CPU allow. This direct, in-memory forwarding is the secret sauce.
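The cost of waiting on disk can be seen with a rough comparison between handing a message on in memory and writing it to a file with a flush per message. This is an illustrative sketch only: compare_memory_vs_fsync is a hypothetical helper, it does not reproduce how any broker persists data, and the numbers depend heavily on the disk and filesystem.

```python
import os
import tempfile
import time


def per_message_micros(handle_message, messages):
    """Average time in microseconds that handle_message spends per message."""
    start = time.perf_counter()
    for msg in messages:
        handle_message(msg)
    return (time.perf_counter() - start) * 1e6 / len(messages)


def compare_memory_vs_fsync(count=200, size=128):
    messages = [os.urandom(size) for _ in range(count)]

    # In-memory path: the message is only appended to a list.
    forwarded = []
    in_memory = per_message_micros(forwarded.append, messages)

    # Durable path: each message is written and synced to disk before the next.
    with tempfile.TemporaryFile() as log:
        def persist(msg):
            log.write(msg)
            log.flush()
            os.fsync(log.fileno())

        durable = per_message_micros(persist, messages)

    return in_memory, durable


if __name__ == "__main__":
    in_memory, durable = compare_memory_vs_fsync()
    print(f"in-memory: {in_memory:.2f} us/msg, write+fsync: {durable:.2f} us/msg")
```

On most hardware the per-message sync is the slower path by a wide margin, which is the overhead core NATS avoids by not persisting regular publishes.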

The next step in exploring NATS performance is understanding how network topology and client connection management impact latency.
