
NATS HTTP Monitoring Endpoints: A Complete Guide

Jan 24, 2026

This is Part 1 of the Monitoring NATS series. Part 2 covers event-driven monitoring using the NATS system account.


NATS takes a different approach to observability than most distributed systems. Instead of relying on external monitoring probes, sidecars, or heavyweight telemetry pipelines, the nats-server binary includes a built-in HTTP server that exposes real-time telemetry as JSON.

This data reflects the live internal state of the server. There’s no sampling, no asynchronous exporters inside the server, and no continuously running probes competing for CPU or memory. Responses from the monitoring endpoints reflect what the server is doing at that moment.

The tradeoff is intentional: you get accurate, low-overhead telemetry, but you need to understand which endpoints exist, what they report, and how to expose them safely.


Key Takeaways

  • NATS provides built-in HTTP monitoring endpoints that expose real-time server telemetry without requiring external probes or sidecars.
  • Monitoring endpoints are disabled by default and must be explicitly enabled.
  • These endpoints do not support HTTP-level authentication, so they must be protected using TLS and network isolation.
  • Core endpoints include:
    • /healthz for liveness
    • /varz for server health and runtime metrics
    • /connz for client connection state
    • /routez, /leafz, and /gatewayz for topology
    • /subsz for subscription interest
    • /jsz for JetStream persistence
  • Prometheus and Grafana are commonly used to turn this raw JSON into dashboards and alerts.
  • CLI tools like nats-top are invaluable for real-time incident response.

The NATS Monitoring Architecture

For security, the monitoring subsystem is disabled by default. These endpoints expose detailed information about your infrastructure, client behavior, and subject topology.

Once explicitly enabled, the server runs a lightweight HTTP(S) listener that serves JSON responses for each endpoint. Every endpoint is read-only and reflects the current state of the server at request time.

Unlike systems that depend on continuously running monitoring probes, NATS exposes telemetry directly from the server process itself.


Enabling the Monitoring Port

You can enable monitoring in three ways:

CLI flag

nats-server -m 8222

Configuration file (HTTP)

http_port: 8222

Configuration file (HTTPS – recommended)

https_port: 8222

tls {
  cert_file: "/path/to/server.crt"
  key_file: "/path/to/server.key"
}

Once enabled, you can access endpoints like:

https://localhost:8222/varz

Each endpoint follows the same pattern: append the endpoint name to the monitoring URL and parse the JSON response.

Important: NATS does not provide authentication or authorization for the monitoring HTTP server. Treat this port as internal-only.
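As a minimal sketch of this pattern, the Python below fetches an endpoint and pulls out a few health-relevant fields. The base URL and the `fetch_varz`/`summarize` helper names are illustrative, not part of NATS; the fields (`server_id`, `uptime`, `connections`, `slow_consumers`, `mem`) are the ones discussed later in this article.

```python
import json
import urllib.request

def fetch_varz(base_url: str) -> dict:
    """Fetch and decode /varz. Assumes monitoring is enabled on base_url."""
    with urllib.request.urlopen(f"{base_url}/varz", timeout=5) as resp:
        return json.load(resp)

def summarize(varz: dict) -> dict:
    """Extract a few health-relevant fields from a /varz response."""
    return {
        "server_id": varz.get("server_id"),
        "uptime": varz.get("uptime"),
        "connections": varz.get("connections"),
        "slow_consumers": varz.get("slow_consumers"),
        "mem_bytes": varz.get("mem"),
    }

# Usage against a live server (not run here):
#   print(summarize(fetch_varz("http://localhost:8222")))
```

The same two-step shape — fetch the JSON, then reduce it to the fields you alert on — applies to every endpoint covered below.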


Health Checks: /healthz

For basic liveness and readiness checks, use /healthz.

  • Returns 200 OK when the server is able to accept connections
  • Lightweight and stable
  • Ideal for Kubernetes liveness/readiness probes and load-balancer health checks

Use /healthz for “is this node alive?” Use /varz for “is this node healthy and behaving correctly?”


Core Health and Runtime Metrics: /varz

The /varz endpoint provides a high-level snapshot of server state, resource usage, and throughput. It’s the best place to start when diagnosing overall node health.

What to Monitor

Memory and CPU

  • mem reflects memory used by the NATS process
  • Memory growth may be caused by:
    • Slow consumers
    • JetStream caching
    • Route or leaf buffers
    • High subscription counts
  • cpu reports process CPU usage as observed by the Go runtime
    • May exceed 100% on multi-core systems
    • Does not represent total node or cgroup CPU

Slow Consumers

  • slow_consumers increments when the server detects clients that cannot drain outbound data fast enough
  • Indicates backpressure and sustained overload conditions
  • Behavior differs between:
    • Core NATS
    • JetStream push consumers
    • JetStream pull consumers (less likely to trigger this condition)

Uptime

  • Useful for detecting silent restarts
  • Especially valuable in containerized environments

Throughput

  • in_msgs, out_msgs, in_bytes, out_bytes
  • Use deltas over time to calculate rates
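Since these counters are cumulative since server start, turning them into rates means taking two snapshots and dividing the deltas by the sampling interval. A small sketch (the `message_rates` helper is illustrative):

```python
def message_rates(prev: dict, curr: dict, interval_s: float) -> dict:
    """Convert two /varz snapshots taken interval_s seconds apart
    into per-second throughput rates.

    Note: counters reset when the server restarts, so real tooling
    should treat a negative delta as a restart, not a real rate."""
    rates = {}
    for field in ("in_msgs", "out_msgs", "in_bytes", "out_bytes"):
        rates[field + "_per_s"] = (curr[field] - prev[field]) / interval_s
    return rates
```

This is essentially what the Prometheus exporter's `rate()` queries do for you once the counters are scraped.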

Limits and Capacity

/varz includes both configured limits and current usage:

  • max_payload
  • max_connections
  • max_subscriptions

When limits are reached, new connections or subscriptions are rejected.
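One way to act on this is to alert before a limit is hit, by comparing current usage against the configured ceiling. A hypothetical `capacity_headroom` helper, shown here for connections:

```python
def capacity_headroom(varz: dict) -> dict:
    """Compare current connection count against the configured limit.
    Both fields come straight from a /varz response:
    'connections' is current usage, 'max_connections' is the ceiling."""
    limit = varz["max_connections"]
    used = varz["connections"]
    return {"used": used, "limit": limit, "pct": 100.0 * used / limit}
```

Alerting at, say, 80% of `max_connections` gives you time to raise the limit or shed load before new clients start being rejected.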


Managing Client State: /connz

The /connz endpoint exposes detailed information about every active client connection. This is your primary tool for identifying misbehaving clients and debugging traffic patterns.

What to Watch

Pending Bytes

  • Amount of data buffered waiting to be sent to a client
  • Sustained growth means the client is falling behind

Message Counts

  • msgs_to and msgs_from reveal high-volume producers and consumers
  • Useful for identifying noisy neighbors

Idle Time

  • Long-lived idle connections may indicate connection leaks or mismanaged pools

Filtering and Pagination

On busy servers, always use query parameters:

?limit=100&offset=0
?sort=pending
?subs=1
?state=closed

Recently closed connections include exit reasons, which is invaluable for diagnosing intermittent failures.
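A sketch of both ideas in Python: building a filtered /connz URL with standard query encoding, and ranking connections by `pending_bytes` on the client side (the `connz_url` and `laggards` helper names are illustrative):

```python
from urllib.parse import urlencode

def connz_url(base: str, **params) -> str:
    """Build a filtered /connz URL, e.g. connz_url(b, sort="pending")."""
    return f"{base}/connz?{urlencode(params)}"

def laggards(connz: dict, threshold: int) -> list:
    """Return (cid, pending_bytes) for connections buffering more than
    threshold bytes, worst first -- the same signal as ?sort=pending."""
    return sorted(
        ((c["cid"], c["pending_bytes"])
         for c in connz.get("connections", [])
         if c["pending_bytes"] > threshold),
        key=lambda t: t[1],
        reverse=True,
    )
```

In practice, prefer server-side sorting and pagination (`?sort=pending&limit=100`) on busy servers, and fall back to client-side filtering only when you already have the response in hand.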


Cluster and Topology Monitoring

As deployments scale beyond a single node, topology visibility becomes essential.

Routes: /routez

Shows active routes between cluster nodes.

Key fields:

  • remote_id
  • pending_size
  • in_msgs, out_msgs

High pending sizes may indicate network congestion, slow peers, or asymmetric traffic patterns.
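A simple watchdog over these fields might flag any route whose buffered data exceeds a budget. A sketch, with an illustrative `congested_routes` helper and a threshold you would tune for your network:

```python
def congested_routes(routez: dict, max_pending: int) -> list:
    """Return (remote_id, pending_size) for cluster routes buffering
    more than max_pending bytes, using fields from a /routez response."""
    return [
        (r["remote_id"], r["pending_size"])
        for r in routez.get("routes", [])
        if r["pending_size"] > max_pending
    ]
```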

Leaf Nodes and Gateways

Additional topology endpoints:

  • /leafz – leaf node connections
  • /gatewayz – inter-cluster gateways

These are critical for multi-region and edge deployments.


Subscription Interest: /subsz

The /subsz endpoint exposes the internal subscription interest graph used for message routing.

Use cases:

  • Debug delivery issues
  • Inspect wildcard matching
  • Identify high fan-out subjects

Warning: Output can be very large. In production, prefer targeted CLI queries over full dumps.


JetStream Monitoring: /jsz

If you use JetStream, /jsz becomes a first-class monitoring endpoint.

Key Metrics

Reserved Resources

  • reserved_mem
  • reserved_store
  • Logical reservations based on stream configuration

Streams

  • Total number of active streams
  • Useful for capacity planning

API Errors

  • Often caused by misconfigured clients or quota violations

Account-Level Visibility

/jsz?accounts=1

Provides per-account resource usage for multi-tenant or shared environments.
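For chargeback or quota reviews, you can reduce that response to a per-tenant usage map. A sketch (the `account_details`, `name`, and `storage` field names follow a typical /jsz?accounts=1 response; verify them against your server version):

```python
def per_account_storage(jsz: dict) -> dict:
    """Map account name -> bytes of JetStream storage in use,
    from a /jsz?accounts=1 response."""
    return {a["name"]: a["storage"] for a in jsz.get("account_details", [])}
```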


Security Best Practices for Monitoring

Monitoring endpoints expose sensitive operational data and must never be public.

  1. Network Isolation (Primary Defense)

    • Bind to localhost or a private management network
    • Restrict via firewalls or Kubernetes network policies
  2. TLS

    • Always use https_port in production
    • Prevents eavesdropping and tampering
  3. External Authentication (If Required)

    • Place the monitoring port behind a reverse proxy that provides mTLS, Basic Auth, or OIDC
    • NATS itself does not authenticate monitoring requests

Integrating with Observability Tooling

Prometheus and Grafana

The NATS Prometheus Exporter is the standard production integration. It:

  • Scrapes monitoring endpoints
  • Converts JSON into labeled metrics
  • Handles retries and connection reuse

If you prefer not to expose HTTP monitoring ports on each server, tools like Surveyor collect metrics over the NATS protocol instead, providing a single Prometheus endpoint for your entire cluster. Part 2 of this series covers this approach in detail.


Real-Time CLI Tools

For immediate troubleshooting, nats-top provides a live view of server activity:

nats-top -s https://localhost:8222

This is often the fastest way to answer: “What’s happening right now?”


Building Your Monitoring Strategy

  1. Enable monitoring ports with TLS
  2. Use /healthz for liveness checks
  3. Monitor /varz for capacity and health
  4. Deploy Prometheus for historical analysis
  5. Use /connz during incidents
  6. Reserve /subsz, /routez, /leafz, and /gatewayz for targeted debugging

NATS’ built-in monitoring endpoints give you accurate observability without relying on external monitoring probes. Whether you’re running a single node or a global supercluster, the same primitives apply—the difference is scale and aggregation.


Next: Event-Driven Monitoring with the System Account

HTTP monitoring endpoints are pull-based: you query them and receive a point-in-time snapshot. This works well for dashboards and trend analysis, but it has limitations. If a client connects and disconnects between scrapes, you may never see it.

In Part 2 of this series, we explore the NATS system account ($SYS)—a push-based, event-driven approach that delivers real-time advisories for connections, disconnections, authentication errors, and more. You’ll also learn how to query the same monitoring data covered here using the NATS protocol instead of HTTP.


Need Help With NATS?

The team at Synadia created and maintains NATS. If you need help architecting, monitoring, or scaling your deployment, get in touch.


