Monitoring NATS: From HTTP Scrapes to System Events

This is Part 2 of the Monitoring NATS series. Part 1 covers HTTP monitoring endpoints in detail.

Part 1 covered the HTTP monitoring endpoints built into every NATS server—/varz, /connz, /jsz, and the rest. These endpoints provide accurate, point-in-time snapshots of server state, and they integrate well with tools like Prometheus and Grafana.

But HTTP monitoring is inherently pull-based. A scraper periodically polls an endpoint and records whatever metrics are available at that moment. If your scrape interval is 15 seconds, any transient event that happens and resolves within that window may never be observed.
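
To make that blind spot concrete, here is a small Python sketch that checks whether an evenly spaced scraper would ever land inside a short-lived event. The function name and the assumption that scrapes fire at exact multiples of the interval are illustrative only; real scrapers have jitter and variable latency.

```python
import math


def observed_by_scrape(event_start, event_end, scrape_interval, first_scrape=0):
    """Return True if a scrape lands inside the closed window [event_start, event_end].

    Assumes scrapes fire at first_scrape, first_scrape + interval, and so on.
    All times are in seconds.
    """
    if scrape_interval <= 0:
        raise ValueError("scrape_interval must be positive")
    if event_end < event_start:
        raise ValueError("event_end must not precede event_start")
    # Index of the first scrape at or after the start of the event.
    k = max(0, math.ceil((event_start - first_scrape) / scrape_interval))
    next_scrape = first_scrape + k * scrape_interval
    return next_scrape <= event_end
```

With a 15-second interval, a connection flap lasting from t=16s to t=20s falls between the scrapes at 15s and 30s, so observed_by_scrape(16, 20, 15) returns False.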

NATS offers a different approach: the system account. Instead of relying only on external HTTP scraping, NATS servers can publish system events and respond to monitoring queries over the same secure NATS protocol your applications already use.

This article explains how the system account works, what it provides beyond HTTP monitoring, and when to use each approach.

HTTP Monitoring vs. System Account: Key Differences

Aspect	HTTP Monitoring	System Account ($SYS)
Access method	HTTP/HTTPS on a separate port	NATS protocol over existing connections
Data model	Pull-based (point-in-time snapshots)	Push-based advisories + pull-based request/reply
Authentication	None built-in (requires network isolation or reverse proxy)	Full NATS auth via NKeys or JWTs
Real-time events	Not available	Advisories for connects, disconnects, auth errors, etc.
Firewall requirements	Requires exposing an additional port	Works over existing NATS client port
Best for	Prometheus scraping, load balancer health checks, nats-top	Event-driven alerting, audit trails, edge deployments

Neither approach replaces the other—they’re complementary. HTTP monitoring is simpler to set up and integrates directly with standard observability tooling. The system account provides capabilities that HTTP cannot: real-time event streams, authenticated access, and operation in environments where exposing additional ports is impractical.

What the System Account Provides

The NATS system account enables a set of system services and advisories that let operators observe and interact with servers using NATS subjects instead of HTTP endpoints.

With the system account enabled, you gain:

Event-driven advisories for client connections, disconnections, and authentication errors
NATS-native access to the same monitoring data available via HTTP (VARZ, CONNZ, JSZ, etc.)
Account-level isolation between application traffic and operational visibility
Authenticated access using the same credentials infrastructure as your application

This model works especially well in environments where opening or scraping additional ports is difficult—such as edge deployments, leaf nodes, or locked-down networks.

The Role of the $SYS Account

The $SYS account is a special account configured on each NATS server to carry system-level traffic. It is not automatically usable—you must explicitly enable it and decide who can access it.

Conceptually, it acts as a control-plane channel:

Application accounts publish and subscribe to application subjects
The system account publishes server advisories and responds to monitoring requests
Only users explicitly authorized for $SYS can see or interact with this data

This separation improves visibility isolation, but it’s important to understand the boundary:

The system account isolates subjects and permissions, not CPU, memory, or I/O. A severely overloaded server can still impact all traffic, including system traffic.

Enabling the System Account

Enabling system monitoring requires a small configuration change.

Local / static config example

system_account: SYS

JWT-based (operator mode) example

system_account: <SYS_ACCOUNT_PUBLIC_NKEY>

In both cases, you then create users within that account who are authorized to subscribe to system subjects or issue monitoring requests. These credentials are separate from application users and should be treated as administrative access.

Event-Driven System Advisories

This is where the system account diverges most significantly from HTTP monitoring. Instead of polling for data, you subscribe to advisory subjects and receive events the moment they occur.

Connection Lifecycle Events

Two commonly used advisory subjects are:

$SYS.ACCOUNT.<account>.CONNECT
$SYS.ACCOUNT.<account>.DISCONNECT

These fire when a client connects or disconnects and include metadata such as client ID, server ID, and timing information. Disconnect advisories include a reason field describing why the connection ended.

This is powerful for scenarios that HTTP monitoring cannot address:

Detecting brief connection flaps that resolve between scrapes
Building real-time dashboards that update instantly
Triggering alerts the moment a critical service disconnects
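
As a rough illustration of consuming these advisories, the sketch below parses a disconnect advisory into a small record. The JSON field names (server.id, client.id, client.name, reason) and the sample payload are assumptions about the advisory layout, not a schema taken from this article, so check them against the messages your servers actually emit. In a real subscriber, the subject and payload would come from your NATS client's message callback.

```python
import json
from dataclasses import dataclass


@dataclass
class Disconnect:
    account: str
    server_id: str
    client_id: int
    client_name: str
    reason: str


def parse_disconnect(subject: str, payload: bytes) -> Disconnect:
    """Parse a $SYS.ACCOUNT.<account>.DISCONNECT advisory (field names assumed)."""
    tokens = subject.split(".")
    if (
        len(tokens) != 4
        or tokens[:2] != ["$SYS", "ACCOUNT"]
        or tokens[3] != "DISCONNECT"
    ):
        raise ValueError(f"not a disconnect advisory subject: {subject!r}")
    body = json.loads(payload)
    client = body.get("client", {})
    server = body.get("server", {})
    return Disconnect(
        account=tokens[2],  # the account comes from the subject itself
        server_id=server.get("id", ""),
        client_id=client.get("id", 0),
        client_name=client.get("name", ""),
        reason=body.get("reason", "unknown"),
    )


# Hypothetical payload, trimmed to the fields used above.
SAMPLE = (
    b'{"server": {"id": "NABC"}, '
    b'"client": {"id": 42, "name": "orders-svc"}, '
    b'"reason": "Client Closed"}'
)
```

Calling parse_disconnect("$SYS.ACCOUNT.APP.DISCONNECT", SAMPLE) yields a record whose account is "APP" and whose reason is "Client Closed"; missing fields fall back to defaults rather than raising.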

Authentication Errors

Authentication failures are reported via:

$SYS.SERVER.<server>.CLIENT.AUTH.ERR

Authentication failures have their own advisory subject. Monitor them by subscribing to AUTH.ERR rather than trying to infer them from disconnect events.

Security and Audit Visibility

Authentication error advisories provide real-time insight into:

Failed credential attempts
Misconfigured clients
Potential brute-force behavior

Each message includes structured data such as client IP, attempted user/account, and rejection reason. Because this data is already in JSON form and delivered over NATS, it can be streamed directly into a SIEM or alerting pipeline without parsing log files.
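
One way to turn these advisories into a brute-force signal is a sliding-window counter keyed by client IP. The sketch below is a minimal version of that idea; the class name, the threshold, and the window length are hypothetical choices, and the caller is assumed to have already extracted the client IP and a timestamp (in seconds) from each advisory.

```python
from collections import defaultdict, deque


class AuthFailureTracker:
    """Flag a client host once it reaches `threshold` failures within `window_seconds`."""

    def __init__(self, threshold=5, window_seconds=60.0):
        self.threshold = threshold
        self.window = window_seconds
        self._events = defaultdict(deque)  # host -> timestamps of recent failures

    def record(self, client_host, timestamp):
        """Record one failure; return True if the host is now over the threshold.

        Assumes timestamps for a given host arrive in non-decreasing order.
        """
        recent = self._events[client_host]
        recent.append(timestamp)
        # Drop failures that have aged out of the window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        return len(recent) >= self.threshold
```

A subscriber on the AUTH.ERR subject would call record() once per advisory and raise an alert the first time it returns True for a given host.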

Periodic Server Statistics

NATS servers also publish periodic statistics summaries:

$SYS.SERVER.<server>.STATSZ

These messages include CPU usage, memory usage, connection counts, message rates, and slow-consumer statistics—similar to what you’d get from the /varz HTTP endpoint, but delivered as a push event.

While often informally called “heartbeats,” they are best thought of as periodic snapshots, not strict liveness guarantees. They are useful for trend analysis and alerting, but they are not instantaneous signals. For liveness checks, continue using /healthz via HTTP.
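
Because these are periodic snapshots, alerting usually works on the change between consecutive messages rather than on a single value. The sketch below tracks the slow-consumer count per server and reports the increase since the previous snapshot. The field names server.id and statsz.slow_consumers are assumptions about the payload layout, and the class name is hypothetical.

```python
import json


class SlowConsumerWatch:
    """Report how many new slow consumers each server saw since its last snapshot."""

    def __init__(self):
        self._last = {}  # server id -> last reported slow-consumer count

    def update(self, payload):
        body = json.loads(payload)
        server_id = body["server"]["id"]
        current = body["statsz"]["slow_consumers"]
        previous = self._last.get(server_id)
        self._last[server_id] = current
        # The first snapshot has no baseline; a lower value suggests a restart.
        if previous is None or current < previous:
            return 0
        return current - previous
```

A non-zero return value means new slow consumers appeared during the last reporting interval, which is a reasonable trigger for a trend-based alert.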

Querying Monitoring Data Over NATS

In addition to push-style advisories, the system account exposes the same monitoring endpoints covered in Part 1 via request/reply. Instead of making an HTTP request, you send a NATS request and receive the JSON response as a reply.

Supported endpoints include:

VARZ, CONNZ, ROUTEZ, LEAFZ, GATEWAYZ, SUBSZ, JSZ, ACCOUNTZ, HEALTHZ

Example

Instead of querying http://localhost:8222/connz, you can send a request to:

$SYS.REQ.SERVER.<server-id>.CONNZ

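The server replies with a JSON message carrying the same monitoring data you would receive from the HTTP interface, wrapped in a response envelope alongside metadata about the responding server.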

This allows you to build internal tooling that uses only the NATS protocol, without HTTP clients or additional firewall rules. It’s particularly valuable for:

Monitoring leaf nodes that don’t expose HTTP ports
Centralizing monitoring through a single NATS connection
Building custom dashboards using your existing NATS client libraries
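
For tooling along these lines, two small helpers cover most of the plumbing: one builds the request subject, and one extracts the monitoring data from the reply. This is a sketch under assumptions: the reply is taken to be a JSON envelope with "data" and optional "error" keys, and the helper falls back to the whole body if no "data" key is present. The function names are hypothetical, and sending the request itself is left to your NATS client library.

```python
import json

# Endpoint names as listed above.
ENDPOINTS = {
    "VARZ", "CONNZ", "ROUTEZ", "LEAFZ", "GATEWAYZ",
    "SUBSZ", "JSZ", "ACCOUNTZ", "HEALTHZ",
}


def monitoring_subject(server_id, endpoint):
    """Build the $SYS.REQ.SERVER.<server-id>.<ENDPOINT> request subject."""
    endpoint = endpoint.upper()
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    # A server ID must be a single subject token without wildcards.
    if not server_id or any(ch in server_id for ch in ".*> \t"):
        raise ValueError(f"invalid server id: {server_id!r}")
    return f"$SYS.REQ.SERVER.{server_id}.{endpoint}"


def unwrap_reply(payload):
    """Return the monitoring data from a reply, raising if it reports an error."""
    body = json.loads(payload)
    if body.get("error"):
        raise RuntimeError(f"monitoring request failed: {body['error']!r}")
    # Fall back to the whole body if there is no envelope.
    return body.get("data", body)
```

For example, monitoring_subject("NABC", "connz") returns "$SYS.REQ.SERVER.NABC.CONNZ", and unwrap_reply() hands back the dictionary your dashboard code would otherwise have fetched over HTTP.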

Discovering Servers with PING

If you don’t know all server IDs in advance, you can send a request to:

$SYS.REQ.SERVER.PING

Every server that receives the request responds with its ID and basic health information. Because this request returns multiple replies, your client must be prepared to collect responses until a timeout occurs.

This pattern enables dynamic discovery without maintaining static host lists—essential for auto-scaling clusters and dynamic infrastructure.
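
The "collect until timeout" loop is the part that differs from an ordinary request/reply call. The sketch below shows that loop in isolation, using a standard-library queue as a stand-in for the reply inbox that a NATS client would fill from its subscription callback. The function name and the optional max_replies cutoff are illustrative assumptions.

```python
import queue
import time


def gather_replies(inbox, timeout, max_replies=None):
    """Collect replies from `inbox` until `timeout` seconds pass or `max_replies` arrive."""
    deadline = time.monotonic() + timeout
    replies = []
    while max_replies is None or len(replies) < max_replies:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Wait only as long as the overall deadline allows.
            replies.append(inbox.get(timeout=remaining))
        except queue.Empty:
            break
    return replies
```

If you already know how many servers to expect, passing max_replies lets the call return as soon as all of them have answered instead of always waiting for the full timeout.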

Remote Configuration Reloads

On NATS 2.10 and newer, the system account can also trigger certain administrative actions.

For example:

$SYS.REQ.SERVER.<server-id>.RELOAD

This instructs the server to reload its configuration file, allowing permission or account updates without restarting the server or disconnecting clients.

JetStream Advisories

JetStream publishes its own advisories under the $JS.EVENT.ADVISORY.> subject hierarchy. These cover stream and consumer lifecycle events, leader elections, delivery failures, and more.

Examples include:

$JS.EVENT.ADVISORY.STREAM.CREATED.<stream> — stream created
$JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.<stream>.<consumer> — message exceeded delivery threshold
$JS.EVENT.ADVISORY.STREAM.LEADER_ELECTED.<stream> — new stream leader elected
$JS.EVENT.ADVISORY.API — API audit trail

JetStream also publishes metrics under $JS.EVENT.METRIC.>, such as consumer ack latency data.

These advisories enable:

Audit logging of all JetStream API operations
Alerting when consumers hit max delivery limits (indicating processing failures)
Tracking leader elections across a Raft cluster
Building operational dashboards without external scraping
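
A collector that handles several of these use cases typically subscribes broadly and routes each advisory by subject. The sketch below implements NATS subject matching, where * matches exactly one token and > matches one or more trailing tokens, and uses it to pick a handler label. The route table and its labels are hypothetical.

```python
def subject_matches(pattern, subject):
    """Return True if `subject` matches a NATS subject `pattern`."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # '>' is only valid as the last token and needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if tok != "*" and tok != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


# Hypothetical routing table: the first matching pattern wins.
ROUTES = [
    ("$JS.EVENT.ADVISORY.API", "audit"),
    ("$JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.*.*", "alert"),
    ("$JS.EVENT.ADVISORY.STREAM.LEADER_ELECTED.*", "raft"),
    ("$JS.EVENT.ADVISORY.>", "dashboard"),
]


def route(subject):
    """Return the label of the first route matching `subject`, or None."""
    for pattern, label in ROUTES:
        if subject_matches(pattern, subject):
            return label
    return None
```

Ordering the table from most to least specific means the catch-all dashboard route only sees advisories that no dedicated handler claimed.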

Complete Reference

The exact set of advisories and request subjects evolves as new NATS features are added.

The core system advisories are described in the official documentation under System Accounts. For the complete and current set, the subject constants are defined in events.go in the nats-server source.

The complete list of JetStream advisory subjects is defined in jetstream_api.go.

Using the NATS CLI

The NATS CLI is the fastest way to explore system monitoring.

To observe system traffic (with appropriate credentials):

nats sub '$SYS.>'

This subscribes to all system subjects and is useful for learning and debugging. In production, it’s best to narrow subscriptions to specific subjects to avoid excessive volume.

Example:

nats sub '$SYS.ACCOUNT.*.DISCONNECT'

Integrating with Your Observability Stack

System events do not replace Prometheus, Grafana, Datadog, or similar tools. Instead, they extend what’s possible:

Advisories provide immediate, event-driven signals that HTTP scraping cannot capture
Periodic stats provide structured metrics similar to HTTP endpoints
A small collector service can subscribe to $SYS subjects and forward data to your observability stack
No per-node exporters or sidecars are required—one subscriber can monitor an entire cluster

Because system events are normal NATS messages, you can also persist them using JetStream to create an audit log or replay incidents after the fact.

Tools like NATS Surveyor implement this pattern, collecting system account data and exposing it as Prometheus metrics without requiring HTTP monitoring ports on each server.

Building Your Monitoring Strategy

For most deployments, use both approaches:

Use HTTP monitoring for:

Kubernetes liveness/readiness probes (/healthz)
Prometheus scraping with the NATS Exporter
Quick debugging with nats-top
Load balancer health checks

Use the system account for:

Real-time connection and disconnection tracking
Authentication error alerting
Audit logging of JetStream operations
Environments where HTTP ports cannot be exposed
Centralized monitoring of distributed clusters

Summary

NATS supports traditional monitoring out of the box through its built-in HTTP endpoints, and it extends that model by embedding observability directly into the messaging system.

By using the system account:

You gain real-time visibility into events that HTTP scraping would miss
You can access monitoring data without exposing additional ports
You reduce operational complexity, especially in distributed or edge environments
You use the same authentication and authorization infrastructure as your applications

The system account enables robust, NATS-native observability pipelines.

If you want to go deeper, the NATS documentation on system accounts is the best next stop—and the NATS community Slack is an excellent place to ask real-world operational questions.

Want granular, comprehensive NATS monitoring that’s up and running in minutes? Try Synadia Insights and get an entity-level view of your entire NATS system.

