Series: Monitoring NATS

Monitoring NATS: From HTTP Scrapes to System Events

Jan 24, 2026

This is Part 2 of the Monitoring NATS series. Part 1 covers HTTP monitoring endpoints in detail.


Part 1 covered the HTTP monitoring endpoints built into every NATS server—/varz, /connz, /jsz, and the rest. These endpoints provide accurate, point-in-time snapshots of server state, and they integrate well with tools like Prometheus and Grafana.

But HTTP monitoring is inherently pull-based. A scraper periodically polls an endpoint and records whatever metrics are available at that moment. If your scrape interval is 15 seconds, any transient event that happens and resolves within that window may never be observed.
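
To make the arithmetic concrete, here is a minimal Python sketch. It assumes an idealized scraper that samples at exact multiples of the interval, starting at t = 0. The function name and timings are illustrative and not part of any NATS or Prometheus API.

```python
import math


def observed_by_scrape(event_start: float, event_end: float, interval_s: float) -> bool:
    """Return True if at least one scrape lands inside [event_start, event_end].

    Assumption: scrapes happen at t = 0, interval_s, 2 * interval_s, ...
    """
    # First scrape that happens at or after the event begins.
    first_scrape = math.ceil(event_start / interval_s) * interval_s
    # The event is visible only if that scrape happens before the event ends.
    return first_scrape <= event_end


# A connection problem lasting from t=3s to t=9s falls between the scrapes
# at t=0s and t=15s, so a 15-second scraper never sees it.
missed = not observed_by_scrape(3.0, 9.0, 15.0)
```

Real scrapers jitter and drift, so the exact boundaries differ in practice, but the conclusion holds: any event shorter than the interval can fall entirely between two samples.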

NATS offers a different approach: the system account. Instead of relying only on external HTTP scraping, NATS servers can publish system events and respond to monitoring queries over the same secure NATS protocol your applications already use.

This article explains how the system account works, what it provides beyond HTTP monitoring, and when to use each approach.


HTTP Monitoring vs. System Account: Key Differences

| Aspect | HTTP Monitoring | System Account ($SYS) |
| --- | --- | --- |
| Access method | HTTP/HTTPS on a separate port | NATS protocol over existing connections |
| Data model | Pull-based (point-in-time snapshots) | Push-based advisories + pull-based request/reply |
| Authentication | None built-in (requires network isolation or reverse proxy) | Full NATS auth via NKeys or JWTs |
| Real-time events | Not available | Advisories for connects, disconnects, auth errors, etc. |
| Firewall requirements | Requires exposing an additional port | Works over existing NATS client port |
| Best for | Prometheus scraping, load balancer health checks, nats-top | Event-driven alerting, audit trails, edge deployments |

Neither approach replaces the other—they’re complementary. HTTP monitoring is simpler to set up and integrates directly with standard observability tooling. The system account provides capabilities that HTTP cannot: real-time event streams, authenticated access, and operation in environments where exposing additional ports is impractical.


What the System Account Provides

The NATS system account enables a set of system services and advisories that let operators observe and interact with servers using NATS subjects instead of HTTP endpoints.

With the system account enabled, you gain:

  • Event-driven advisories for client connections, disconnections, and authentication errors
  • NATS-native access to the same monitoring data available via HTTP (VARZ, CONNZ, JSZ, etc.)
  • Account-level isolation between application traffic and operational visibility
  • Authenticated access using the same credentials infrastructure as your application

This model works especially well in environments where opening or scraping additional ports is difficult—such as edge deployments, leaf nodes, or locked-down networks.


The Role of the $SYS Account

The $SYS account is a special account configured on each NATS server to carry system-level traffic. It is not automatically usable—you must explicitly enable it and decide who can access it.

Conceptually, it acts as a control-plane channel:

  • Application accounts publish and subscribe to application subjects
  • The system account publishes server advisories and responds to monitoring requests
  • Only users explicitly authorized for $SYS can see or interact with this data

This separation improves visibility isolation, but it’s important to understand the boundary:

The system account isolates subjects and permissions, not CPU, memory, or I/O. A severely overloaded server can still impact all traffic, including system traffic.


Enabling the System Account

Enabling system monitoring requires a small configuration change.

Local / static config example

system_account: SYS

JWT-based (operator mode) example

system_account: <SYS_ACCOUNT_PUBLIC_NKEY>

In both cases, you then create users within that account who are authorized to subscribe to system subjects or issue monitoring requests. These credentials are separate from application users and should be treated as administrative access.


Event-Driven System Advisories

This is where the system account diverges most significantly from HTTP monitoring. Instead of polling for data, you subscribe to advisory subjects and receive events the moment they occur.

Connection Lifecycle Events

Two commonly used advisory subjects are:

  • $SYS.ACCOUNT.<account>.CONNECT
  • $SYS.ACCOUNT.<account>.DISCONNECT

These fire when a client connects or disconnects and include metadata such as client ID, server ID, and timing information. Disconnect advisories include a reason field describing why the connection ended.

This is powerful for scenarios that HTTP monitoring cannot address:

  • Detecting brief connection flaps that resolve between scrapes
  • Building real-time dashboards that update instantly
  • Triggering alerts the moment a critical service disconnects
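
A handler for these advisories might look like the following minimal Python sketch. It only parses the subject and payload; wiring it to a subscription depends on your NATS client library and is left out. The payload field names (`client.id`, `client.name`, `reason`) are assumptions based on the metadata described above, so check them against the messages your servers actually emit.

```python
import json


def parse_account_event(subject: str, payload: bytes) -> dict:
    """Split '$SYS.ACCOUNT.<account>.<EVENT>' and pull out a few payload fields."""
    tokens = subject.split(".")
    if len(tokens) != 4 or tokens[:2] != ["$SYS", "ACCOUNT"]:
        raise ValueError(f"not an account advisory subject: {subject}")
    if tokens[3] not in ("CONNECT", "DISCONNECT"):
        raise ValueError(f"unexpected event type: {tokens[3]}")

    body = json.loads(payload)
    client = body.get("client", {})  # assumed field name
    return {
        "account": tokens[2],
        "event": tokens[3],
        "client_id": client.get("id"),      # assumed field name
        "client_name": client.get("name"),  # assumed field name
        "reason": body.get("reason"),       # expected only on DISCONNECT
    }
```

Using `.get()` throughout means a missing field yields `None` instead of an exception, which keeps the handler tolerant of payload differences between server versions.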

Authentication Errors

Authentication failures are reported via:

  • $SYS.SERVER.<server>.CLIENT.AUTH.ERR

Failed authentication attempts have their own subject, separate from the connection lifecycle events. Monitor auth errors there rather than trying to infer them from disconnect advisories.


Security and Audit Visibility

Authentication error advisories provide real-time insight into:

  • Failed credential attempts
  • Misconfigured clients
  • Potential brute-force behavior

Each message includes structured data such as client IP, attempted user/account, and rejection reason. Because this data is already in JSON form and delivered over NATS, it can be streamed directly into a SIEM or alerting pipeline without parsing log files.
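
One way to turn these advisories into a brute-force signal is a sliding-window counter per source address. The sketch below is a minimal Python example, not a hardened detector. The class name, threshold, and window are hypothetical, and extracting the client IP from the advisory payload is left to the caller.

```python
from collections import defaultdict, deque


class AuthFailureTracker:
    """Flag a source once it accumulates `threshold` failures within `window_s` seconds."""

    def __init__(self, threshold: int = 5, window_s: float = 60.0):
        self.threshold = threshold
        self.window_s = window_s
        self._hits = defaultdict(deque)  # source -> timestamps of recent failures

    def record(self, source: str, now: float) -> bool:
        """Record one failure; return True if the source is at or over the threshold."""
        hits = self._hits[source]
        hits.append(now)
        # Drop failures that have aged out of the window.
        while hits and now - hits[0] > self.window_s:
            hits.popleft()
        return len(hits) >= self.threshold
```

Call `record()` once per AUTH.ERR message, passing the client IP and a timestamp, and raise an alert when it returns `True`. Passing the time in explicitly keeps the logic easy to test.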


Periodic Server Statistics

NATS servers also publish periodic statistics summaries:

  • $SYS.SERVER.<server>.STATSZ

These messages include CPU usage, memory usage, connection counts, message rates, and slow-consumer statistics—similar to what you’d get from the /varz HTTP endpoint, but delivered as a push event.

While often informally called “heartbeats,” they are best thought of as periodic snapshots, not strict liveness guarantees. They are useful for trend analysis and alerting, but they are not instantaneous signals. For liveness checks, continue using /healthz via HTTP.


Querying Monitoring Data Over NATS

In addition to push-style advisories, the system account exposes the same monitoring endpoints covered in Part 1 via request/reply. Instead of making an HTTP request, you send a NATS request and receive the JSON response as a reply.

Supported endpoints include:

  • VARZ, CONNZ, ROUTEZ, LEAFZ, GATEWAYZ, SUBSZ, JSZ, ACCOUNTZ, HEALTHZ

Example

Instead of querying http://localhost:8222/connz (or the HTTPS equivalent, if TLS is configured for monitoring), you can send a request to:

$SYS.REQ.SERVER.<server-id>.CONNZ

The server replies with the same monitoring data you would receive from the HTTP interface, as JSON. The reply wraps that data in an envelope that also identifies the responding server.

This allows you to build internal tooling that uses only the NATS protocol, without HTTP clients or additional firewall rules. It’s particularly valuable for:

  • Monitoring leaf nodes that don’t expose HTTP ports
  • Centralizing monitoring through a single NATS connection
  • Building custom dashboards using your existing NATS client libraries
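
For internal tooling, the subject construction and reply handling can be sketched in a few lines of Python. The helper names are hypothetical. The sketch assumes the reply carries the monitoring data under a `data` key and falls back to the whole body if that key is absent. Sending the request itself is left to your NATS client library.

```python
import json

# Endpoint names listed in this article; the server may support others.
ENDPOINTS = frozenset(
    {"VARZ", "CONNZ", "ROUTEZ", "LEAFZ", "GATEWAYZ", "SUBSZ", "JSZ", "ACCOUNTZ", "HEALTHZ"}
)


def request_subject(server_id: str, endpoint: str) -> str:
    """Build '$SYS.REQ.SERVER.<server-id>.<ENDPOINT>' for a single server."""
    name = endpoint.upper()
    if name not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    # The server ID must be one literal subject token: no dots, wildcards, or spaces.
    if not server_id or any(ch in server_id for ch in ".*> \t"):
        raise ValueError("server_id must be a single literal subject token")
    return f"$SYS.REQ.SERVER.{server_id}.{name}"


def unwrap_reply(raw: bytes) -> dict:
    """Return the monitoring data from a reply (assumed to sit under 'data')."""
    body = json.loads(raw)
    return body.get("data", body)
```

Validating the server ID before building the subject guards against accidentally sending a wildcard request when the ID comes from user input.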

Discovering Servers with PING

If you don’t know all server IDs in advance, you can send a request to:

$SYS.REQ.SERVER.PING

Every server that receives the request responds with its ID and basic health information. Because this request returns multiple replies, your client must be prepared to collect responses until a timeout occurs.

This pattern enables dynamic discovery without maintaining static host lists—essential for auto-scaling clusters and dynamic infrastructure.
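
The collect-until-timeout pattern can be sketched independently of any client library. In the Python example below, a standard `queue.Queue` stands in for the reply inbox that a NATS subscription callback would feed. The function name and parameters are hypothetical.

```python
import queue
import time
from typing import List, Optional


def collect_replies(inbox: queue.Queue, timeout_s: float,
                    max_replies: Optional[int] = None) -> List[bytes]:
    """Gather replies until the deadline passes or max_replies is reached."""
    deadline = time.monotonic() + timeout_s
    replies: List[bytes] = []
    while max_replies is None or len(replies) < max_replies:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Wait only as long as the overall deadline allows.
            replies.append(inbox.get(timeout=remaining))
        except queue.Empty:
            break
    return replies
```

If you already know how many servers to expect, passing `max_replies` lets the call return as soon as they have all answered instead of waiting out the full timeout.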


Remote Configuration Reloads

On NATS 2.10 and newer, the system account can also trigger certain administrative actions.

For example:

$SYS.REQ.SERVER.<server-id>.RELOAD

This instructs the server to reload its configuration file, allowing permission or account updates without restarting the server or disconnecting clients.


JetStream Advisories

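JetStream publishes its own advisories under the $JS.EVENT.ADVISORY.> subject space. These cover stream and consumer lifecycle events, leader elections, delivery failures, and more. Unlike the $SYS subjects above, the advisories listed here are typically published in the account that owns the stream or consumer, so subscribe with credentials for that account.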

Examples include:

  • $JS.EVENT.ADVISORY.STREAM.CREATED.<stream> — stream created
  • $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.<stream>.<consumer> — message exceeded delivery threshold
  • $JS.EVENT.ADVISORY.STREAM.LEADER_ELECTED.<stream> — new stream leader elected
  • $JS.EVENT.ADVISORY.API — API audit trail

JetStream also publishes metrics under $JS.EVENT.METRIC.>, such as consumer ack latency data.

These advisories enable:

  • Audit logging of all JetStream API operations
  • Alerting when consumers hit max delivery limits (indicating processing failures)
  • Tracking leader elections across a Raft cluster
  • Building operational dashboards without external scraping
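
A collector that receives many advisory types usually needs to route them by subject. The Python sketch below implements NATS-style wildcard matching (`*` for exactly one token, `>` for one or more trailing tokens) and extracts the stream and consumer names from a MAX_DELIVERIES subject. The function names are hypothetical, and this is an illustration of the matching rules, not the server's own implementation.

```python
from typing import Optional, Tuple


def subject_matches(pattern: str, subject: str) -> bool:
    """Match a subject against a pattern using '*' and '>' wildcards."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and match at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def parse_max_deliveries(subject: str) -> Optional[Tuple[str, str]]:
    """Return (stream, consumer) from a MAX_DELIVERIES advisory subject, else None."""
    prefix = "$JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES."
    if not subject.startswith(prefix):
        return None
    rest = subject[len(prefix):].split(".")
    return (rest[0], rest[1]) if len(rest) == 2 else None
```

Note that `$JS.EVENT.ADVISORY.*` matches only single-token suffixes such as `API`. To capture every advisory, including `STREAM.CREATED.<stream>`, subscribe with `>` instead.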

Complete Reference

The exact set of advisories and request subjects evolves as new NATS features are added.

The core system advisories are described in the official documentation under System Accounts. For the complete and current set, see the subject constants defined in events.go in the nats-server source.

The complete list of JetStream advisory subjects is defined in jetstream_api.go.


Using the NATS CLI

The NATS CLI is the fastest way to explore system monitoring.

To observe system traffic (with appropriate credentials):

nats sub '$SYS.>'

This subscribes to all system subjects and is useful for learning and debugging. In production, it’s best to narrow subscriptions to specific subjects to avoid excessive volume.

Example:

nats sub '$SYS.ACCOUNT.*.DISCONNECT'

Integrating with Your Observability Stack

System events do not replace Prometheus, Grafana, Datadog, or similar tools. Instead, they extend what’s possible:

  • Advisories provide immediate, event-driven signals that HTTP scraping cannot capture
  • Periodic stats provide structured metrics similar to HTTP endpoints
  • A small collector service can subscribe to $SYS subjects and forward data to your observability stack
  • No per-node exporters or sidecars are required—one subscriber can monitor an entire cluster

Because system events are normal NATS messages, you can also persist them using JetStream to create an audit log or replay incidents after the fact.

Tools like NATS Surveyor implement this pattern, collecting system account data and exposing it as Prometheus metrics without requiring HTTP monitoring ports on each server.
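
To illustrate the forwarding step in a home-grown collector, the Python sketch below renders the numeric fields of a statistics message as lines in the style of the Prometheus text exposition format. It is not how NATS Surveyor is implemented. The `nats_statsz_` metric prefix and the input field names are hypothetical, and label-value escaping is omitted for brevity.

```python
def to_prometheus(server_id: str, stats: dict) -> str:
    """Render numeric entries of `stats` as 'name{server_id="..."} value' lines."""
    lines = []
    for key in sorted(stats):
        value = stats[key]
        # Skip booleans, strings, and nested objects; keep plain numbers only.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        lines.append(f'nats_statsz_{key}{{server_id="{server_id}"}} {value}')
    return "\n".join(lines) + "\n"
```

A collector could keep the latest rendered text per server and serve the concatenation from a single HTTP endpoint, so that only the collector, and not every NATS server, needs to be reachable by the scraper.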


Building Your Monitoring Strategy

For most deployments, use both approaches:

Use HTTP monitoring for:

  • Kubernetes liveness/readiness probes (/healthz)
  • Prometheus scraping with the NATS Exporter
  • Quick debugging with nats-top
  • Load balancer health checks

Use the system account for:

  • Real-time connection and disconnection tracking
  • Authentication error alerting
  • Audit logging of JetStream operations
  • Environments where HTTP ports cannot be exposed
  • Centralized monitoring of distributed clusters

Summary

NATS makes traditional monitoring easier by building HTTP endpoints into every server, with no sidecars required, and extends it by embedding observability directly into the messaging system.

By using the system account:

  • You gain real-time visibility into events that HTTP scraping would miss
  • You can access monitoring data without exposing additional ports
  • You reduce operational complexity, especially in distributed or edge environments
  • You use the same authentication and authorization infrastructure as your applications

The system account enables robust, NATS-native observability pipelines.

If you want to go deeper, the NATS documentation on system accounts is the best next stop—and the NATS community Slack is an excellent place to ask real-world operational questions.




© 2026 Synadia Communications, Inc.