The NATS Prometheus exporter bridges NATS, a high-performance messaging system, and Prometheus, a time-series monitoring and alerting system. It exposes NATS server metrics in a format that Prometheus can scrape, so operators can track the health, performance, and resource utilization of their NATS deployments.
Let’s see it in action. Imagine you have a NATS server running and you want to monitor its message throughput and connection count.
First, you’d start your NATS server with the Prometheus exporter enabled. This is typically done via a configuration file or command-line flags. For instance, in your nats-server.conf:
server: {
  listen: ":4222"
  http: {
    addr: ":8222"
    debug: true
    pprof: true
  }
  # Enable Prometheus metrics endpoint
  metrics: {
    port: 9100
  }
}
With this configuration, NATS will start an HTTP server on port 9100 that exposes Prometheus-formatted metrics. You can verify this by visiting http://localhost:9100/metrics in your browser. You’ll see output like this:
# HELP nats_server_version NATS server version
# TYPE nats_server_version gauge
nats_server_version{version="2.9.19"} 1
# HELP nats_server_uptime_seconds Uptime of the NATS server in seconds
# TYPE nats_server_uptime_seconds gauge
nats_server_uptime_seconds 12345.67
# HELP nats_server_clients The number of connected clients
# TYPE nats_server_clients gauge
nats_server_clients 15
# HELP nats_server_total_connections The total number of connections accepted
# TYPE nats_server_total_connections counter
nats_server_total_connections 10000
# HELP nats_server_total_bytes_sent The total number of bytes sent by the server
# TYPE nats_server_total_bytes_sent counter
nats_server_total_bytes_sent 500000000
# HELP nats_server_total_bytes_received The total number of bytes received by the server
# TYPE nats_server_total_bytes_received counter
nats_server_total_bytes_received 750000000
# HELP nats_server_total_messages_sent The total number of messages sent by the server
# TYPE nats_server_total_messages_sent counter
nats_server_total_messages_sent 1000000
# HELP nats_server_total_messages_received The total number of messages received by the server
# TYPE nats_server_total_messages_received counter
nats_server_total_messages_received 1500000
# HELP nats_server_total_payload_bytes_sent The total number of payload bytes sent by the server
# TYPE nats_server_total_payload_bytes_sent counter
nats_server_total_payload_bytes_sent 400000000
# HELP nats_server_total_payload_bytes_received The total number of payload bytes received by the server
# TYPE nats_server_total_payload_bytes_received counter
nats_server_total_payload_bytes_received 600000000
# HELP nats_server_slow_consumers The number of slow consumers detected
# TYPE nats_server_slow_consumers counter
nats_server_slow_consumers 5
# HELP nats_server_memory_bytes Current memory usage in bytes
# TYPE nats_server_memory_bytes gauge
nats_server_memory_bytes 256000000
# HELP nats_server_cpu_cores The number of CPU cores available to the NATS server process
# TYPE nats_server_cpu_cores gauge
nats_server_cpu_cores 8
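To make the format above concrete, here is a minimal Python sketch that parses this kind of output into values and declared types. It is an illustration, not part of NATS or Prometheus: the function name parse_metrics and the SAMPLE text are assumptions, and the parser deliberately ignores optional timestamps and other corners of the exposition format.

```python
from typing import Dict, Tuple


def parse_metrics(text: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Parse Prometheus text exposition output (simplified).

    Returns (samples, types). samples maps each series name, including any
    {labels}, to its value. types maps each metric name to its declared type.
    """
    samples: Dict[str, float] = {}
    types: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split(None, 3)
            # "# TYPE <name> <type>" declares gauge, counter, and so on.
            # "# HELP ..." lines are documentation only and are skipped.
            if len(parts) == 4 and parts[1] == "TYPE":
                types[parts[2]] = parts[3]
            continue
        # A sample line is "<series> <value>". Splitting on the last space
        # keeps label values that contain spaces attached to the series.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples, types


# Hypothetical excerpt in the same shape as the output shown above.
SAMPLE = """\
# HELP nats_server_clients The number of connected clients
# TYPE nats_server_clients gauge
nats_server_clients 15
# TYPE nats_server_total_connections counter
nats_server_total_connections 10000
nats_server_version{version="2.9.19"} 1
"""

samples, types = parse_metrics(SAMPLE)
print(samples["nats_server_clients"], types["nats_server_total_connections"])
```

Run as written, this prints `15.0 counter`. In practice Prometheus does this parsing for you; the sketch is only useful for quick checks against the endpoint or for understanding what each line means.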
Now, you need to configure Prometheus to scrape these metrics. In your prometheus.yml configuration file, you would add a scrape job:
scrape_configs:
  - job_name: 'nats'
    static_configs:
      - targets: ['localhost:9100']
Once Prometheus is running with this configuration, it will periodically fetch the metrics from localhost:9100. You can then use these metrics in Grafana or other visualization tools to create dashboards. For example, you might create a graph showing nats_server_clients over time, or calculate message rate using rate(nats_server_total_messages_sent[5m]).
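The idea behind a rate() query over a counter can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function names are hypothetical, and Prometheus's real rate() also extrapolates to the edges of the query window, which this sketch does not attempt.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: List[Sample]) -> float:
    """Total increase across the samples, treating any drop as a reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # After a server restart the counter begins again from zero,
            # so the whole current value counts as new increase.
            total += cur
    return total


def per_second_rate(samples: List[Sample]) -> float:
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return counter_increase(samples) / elapsed


# Three hypothetical scrapes of nats_server_total_messages_sent, 150 s apart.
scrapes = [(0.0, 1_000_000.0), (150.0, 1_003_000.0), (300.0, 1_006_000.0)]
print(per_second_rate(scrapes))
```

With these made-up scrapes the counter grows by 6,000 messages over 300 seconds, so the sketch prints `20.0` messages per second. The reset handling is the reason to query counters through rate() rather than subtracting raw values yourself.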
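In the setup described here, the metrics endpoint is served by the NATS server process itself, so there is no separate process to run. (The standalone prometheus-nats-exporter project takes the other approach: it runs alongside the server and translates the server’s HTTP monitoring endpoints, such as /varz, into Prometheus metrics.) The endpoint is served using Go’s standard-library HTTP server and the Prometheus client library. The metrics fall into two types: gauges (values that can go up or down, like client count or memory usage) and counters (values that only ever increase, like total connections or bytes sent).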
A key aspect to understand is how NATS internally tracks these metrics. For instance, nats_server_clients is a direct count of active connections managed by the server’s connection handler. nats_server_total_messages_sent increments every time a message is successfully dispatched to a client. The exporter simply reads these internal counters and gauges and formats them for Prometheus.
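The read-and-format pattern described above can be sketched as follows. This is a Python illustration only: ServerStats and its methods are hypothetical stand-ins, not the NATS server's actual (Go) internals, and only three of the metrics are modeled.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class ServerStats:
    """Hypothetical stand-in for a server's internal bookkeeping."""

    clients: int = 0               # gauge: rises and falls with connections
    total_connections: int = 0     # counter: only ever increases
    total_messages_sent: int = 0   # counter: only ever increases
    _lock: threading.Lock = field(
        default_factory=threading.Lock, init=False, repr=False, compare=False
    )

    def on_connect(self) -> None:
        with self._lock:
            self.clients += 1
            self.total_connections += 1

    def on_disconnect(self) -> None:
        with self._lock:
            self.clients -= 1

    def on_message_delivered(self) -> None:
        with self._lock:
            self.total_messages_sent += 1

    def render(self) -> str:
        """Format a consistent snapshot in Prometheus text exposition format."""
        with self._lock:
            rows = [
                ("nats_server_clients", "gauge", self.clients),
                ("nats_server_total_connections", "counter", self.total_connections),
                ("nats_server_total_messages_sent", "counter", self.total_messages_sent),
            ]
        lines = []
        for name, kind, value in rows:
            lines.append(f"# TYPE {name} {kind}")
            lines.append(f"{name} {value}")
        return "\n".join(lines) + "\n"


stats = ServerStats()
for _ in range(3):
    stats.on_connect()
stats.on_disconnect()
stats.on_message_delivered()
stats.on_message_delivered()
print(stats.render(), end="")
```

After three connects and one disconnect, the gauge reads 2 while the connection counter reads 3, which is exactly the gauge/counter difference: the gauge reflects the current state and the counter reflects history.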
What often surprises people is that the nats_server_total_bytes_sent and nats_server_total_payload_bytes_sent metrics are distinct. The former includes all bytes transmitted over the wire, which encompasses protocol overhead like framing, acknowledgments, and control messages. The latter tracks only the actual data payload of the messages. The difference between the two is the protocol overhead, which matters when comparing network bandwidth use against actual message data volume, especially when messages are small and the per-message overhead makes up a larger share of each transmission.
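As a worked example, the overhead share can be computed directly from the two counters. The sketch below is Python with a hypothetical helper name, and it uses the illustrative values from the sample output earlier (500000000 total bytes sent, 400000000 payload bytes sent), not measurements from a real server.

```python
from typing import Tuple


def protocol_overhead(total_bytes: float, payload_bytes: float) -> Tuple[float, float]:
    """Return (overhead_bytes, overhead_fraction_of_total)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if payload_bytes > total_bytes:
        raise ValueError("payload cannot exceed total bytes on the wire")
    overhead = total_bytes - payload_bytes
    return overhead, overhead / total_bytes


# Illustrative values taken from the sample metrics output.
overhead, fraction = protocol_overhead(500_000_000, 400_000_000)
print(f"{overhead:.0f} bytes of overhead, {fraction:.1%} of wire traffic")
```

With these sample numbers the sketch prints `100000000 bytes of overhead, 20.0% of wire traffic`. On a live system you would typically compute the same ratio over rate() values for a recent window rather than over lifetime totals.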
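With these metrics scraped, you can set up alerts. Because nats_server_slow_consumers is a counter, a plain nats_server_slow_consumers > 0 condition would keep firing forever once a single slow consumer has been detected. Alerting on increase(nats_server_slow_consumers[5m]) > 0 instead notifies you when a client has recently fallen behind in processing messages, indicating a potential bottleneck.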
The next step in monitoring your NATS deployment would be to explore cluster-level metrics and potentially integrate with distributed tracing systems for end-to-end message flow analysis.