
NATS HTTP Monitoring Endpoints: A Complete Guide

Jan 24, 2026

This is Part 1 of the Monitoring NATS series. Part 2 covers event-driven monitoring using the NATS system account.


NATS takes a different approach to observability than most distributed systems. Instead of relying on external monitoring probes, sidecars, or heavyweight telemetry pipelines, the nats-server binary includes a built-in HTTP server that exposes real-time telemetry as JSON.

This data reflects the live internal state of the server. There’s no sampling, no asynchronous exporters inside the server, and no continuously running probes competing for CPU or memory. Responses from the monitoring endpoints reflect what the server is doing at that moment.

The tradeoff is intentional: you get accurate, low-overhead telemetry, but you need to understand which endpoints exist, what they report, and how to expose them safely.


Key Takeaways

  • NATS provides built-in HTTP monitoring endpoints that expose real-time server telemetry without requiring external probes or sidecars.
  • Monitoring endpoints are disabled by default and must be explicitly enabled.
  • These endpoints do not support HTTP-level authentication, so they must be protected using TLS and network isolation.
  • Core endpoints include:
    • /healthz for liveness
    • /varz for server health and runtime metrics
    • /connz for client connection state
    • /routez, /leafz, and /gatewayz for topology
    • /subsz for subscription interest
    • /jsz for JetStream persistence
  • Prometheus and Grafana are commonly used to turn this raw JSON into dashboards and alerts.
  • CLI tools like nats-top are invaluable for real-time incident response.

The NATS Monitoring Architecture

For security, the monitoring subsystem is disabled by default. These endpoints expose detailed information about your infrastructure, client behavior, and subject topology.

Once explicitly enabled, the server runs a lightweight HTTP(S) listener that serves JSON responses for each endpoint. Every endpoint is read-only and reflects the current state of the server at request time.

Unlike systems that depend on continuously running monitoring probes, NATS exposes telemetry directly from the server process itself.


Enabling the Monitoring Port

You can enable monitoring in three ways:

CLI flag

nats-server -m 8222

Configuration file (HTTP)

http_port: 8222

Configuration file (HTTPS – recommended)

https_port: 8222

tls {
  cert_file: "/path/to/server.crt"
  key_file: "/path/to/server.key"
}

Once enabled, you can access endpoints like:

https://localhost:8222/varz

Each endpoint follows the same pattern: append the endpoint name to the monitoring URL and parse the JSON response.

Important: NATS does not provide authentication or authorization for the monitoring HTTP server. Treat this port as internal-only.
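As a minimal sketch of this pattern, the Python below fetches an endpoint and pulls out a few health-relevant fields. The base URL and the `fetch_varz`/`summarize` helper names are illustrative, not part of NATS; the fields (`server_id`, `uptime`, `connections`, `slow_consumers`, `mem`) are the ones discussed later in this article.

```python
import json
import urllib.request

def fetch_varz(base_url: str) -> dict:
    """Fetch and decode /varz. Assumes monitoring is enabled on base_url."""
    with urllib.request.urlopen(f"{base_url}/varz", timeout=5) as resp:
        return json.load(resp)

def summarize(varz: dict) -> dict:
    """Extract a few health-relevant fields from a /varz response."""
    return {
        "server_id": varz.get("server_id"),
        "uptime": varz.get("uptime"),
        "connections": varz.get("connections"),
        "slow_consumers": varz.get("slow_consumers"),
        "mem_bytes": varz.get("mem"),
    }

# Usage against a live server (not run here):
#   print(summarize(fetch_varz("http://localhost:8222")))
```

The same two-step shape — fetch the JSON, then reduce it to the fields you alert on — applies to every endpoint covered below.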


Health Checks: /healthz

For basic liveness and readiness checks, use /healthz.

  • Returns 200 OK when the server is able to accept connections
  • Lightweight and stable
  • Ideal for Kubernetes liveness/readiness probes and load-balancer health checks

Use /healthz for “is this node alive?” Use /varz for “is this node healthy and behaving correctly?”


Core Health and Runtime Metrics: /varz

The /varz endpoint provides a high-level snapshot of server state, resource usage, and throughput. It’s the best place to start when diagnosing overall node health.

What to Monitor

Memory and CPU

  • mem reflects memory used by the NATS process
  • Memory growth may be caused by:
    • Slow consumers
    • JetStream caching
    • Route or leaf buffers
    • High subscription counts
  • cpu reports process CPU usage as observed by the Go runtime
    • May exceed 100% on multi-core systems
    • Does not represent total node or cgroup CPU

Slow Consumers

  • slow_consumers increments when the server detects clients that cannot drain outbound data fast enough
  • Indicates backpressure and sustained overload conditions
  • Behavior differs between:
    • Core NATS
    • JetStream push consumers
    • JetStream pull consumers (less likely to trigger this condition)

Uptime

  • Useful for detecting silent restarts
  • Especially valuable in containerized environments

Throughput

  • in_msgs, out_msgs, in_bytes, out_bytes
  • Use deltas over time to calculate rates
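Since these counters are cumulative since server start, turning them into rates means taking two snapshots and dividing the deltas by the sampling interval. A small sketch (the `message_rates` helper is illustrative):

```python
def message_rates(prev: dict, curr: dict, interval_s: float) -> dict:
    """Convert two /varz snapshots taken interval_s seconds apart
    into per-second throughput rates.

    Note: counters reset when the server restarts, so real tooling
    should treat a negative delta as a restart, not a real rate."""
    rates = {}
    for field in ("in_msgs", "out_msgs", "in_bytes", "out_bytes"):
        rates[field + "_per_s"] = (curr[field] - prev[field]) / interval_s
    return rates
```

This is essentially what the Prometheus exporter's `rate()` queries do for you once the counters are scraped.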

Limits and Capacity

/varz includes both configured limits and current usage:

  • max_payload
  • max_connections
  • max_subscriptions

When limits are reached, new connections or subscriptions are rejected.
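One way to act on this is to alert before a limit is hit, by comparing current usage against the configured ceiling. A hypothetical `capacity_headroom` helper, shown here for connections:

```python
def capacity_headroom(varz: dict) -> dict:
    """Compare current connection count against the configured limit.
    Both fields come straight from a /varz response:
    'connections' is current usage, 'max_connections' is the ceiling."""
    limit = varz["max_connections"]
    used = varz["connections"]
    return {"used": used, "limit": limit, "pct": 100.0 * used / limit}
```

Alerting at, say, 80% of `max_connections` gives you time to raise the limit or shed load before new clients start being rejected.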


Managing Client State: /connz

The /connz endpoint exposes detailed information about every active client connection. This is your primary tool for identifying misbehaving clients and debugging traffic patterns.

What to Watch

Pending Bytes

  • Amount of data buffered waiting to be sent to a client
  • Sustained growth means the client is falling behind

Message Counts

  • msgs_to and msgs_from reveal high-volume producers and consumers
  • Useful for identifying noisy neighbors

Idle Time

  • Long-lived idle connections may indicate connection leaks or mismanaged pools

Filtering and Pagination

On busy servers, always use query parameters:

?limit=100&offset=0
?sort=pending
?subs=1
?state=closed

Recently closed connections include exit reasons, which is invaluable for diagnosing intermittent failures.
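A sketch of both ideas in Python: building a filtered /connz URL with standard query encoding, and ranking connections by `pending_bytes` on the client side (the `connz_url` and `laggards` helper names are illustrative):

```python
from urllib.parse import urlencode

def connz_url(base: str, **params) -> str:
    """Build a filtered /connz URL, e.g. connz_url(b, sort="pending")."""
    return f"{base}/connz?{urlencode(params)}"

def laggards(connz: dict, threshold: int) -> list:
    """Return (cid, pending_bytes) for connections buffering more than
    threshold bytes, worst first -- the same signal as ?sort=pending."""
    return sorted(
        ((c["cid"], c["pending_bytes"])
         for c in connz.get("connections", [])
         if c["pending_bytes"] > threshold),
        key=lambda t: t[1],
        reverse=True,
    )
```

In practice, prefer server-side sorting and pagination (`?sort=pending&limit=100`) on busy servers, and fall back to client-side filtering only when you already have the response in hand.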


Cluster and Topology Monitoring

As deployments scale beyond a single node, topology visibility becomes essential.

Routes: /routez

Shows active routes between cluster nodes.

Key fields:

  • remote_id
  • pending_size
  • in_msgs, out_msgs

High pending sizes may indicate network congestion, slow peers, or asymmetric traffic patterns.
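A simple watchdog over these fields might flag any route whose buffered data exceeds a budget. A sketch, with an illustrative `congested_routes` helper and a threshold you would tune for your network:

```python
def congested_routes(routez: dict, max_pending: int) -> list:
    """Return (remote_id, pending_size) for cluster routes buffering
    more than max_pending bytes, using fields from a /routez response."""
    return [
        (r["remote_id"], r["pending_size"])
        for r in routez.get("routes", [])
        if r["pending_size"] > max_pending
    ]
```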

Leaf Nodes and Gateways

Additional topology endpoints:

  • /leafz – leaf node connections
  • /gatewayz – inter-cluster gateways

These are critical for multi-region and edge deployments.


Subscription Interest: /subsz

The /subsz endpoint exposes the internal subscription interest graph used for message routing.

Use cases:

  • Debug delivery issues
  • Inspect wildcard matching
  • Identify high fan-out subjects

Warning: Output can be very large. In production, prefer targeted CLI queries over full dumps.


JetStream Monitoring: /jsz

If you use JetStream, /jsz becomes a first-class monitoring endpoint.

Key Metrics

Reserved Resources

  • reserved_mem
  • reserved_store
  • Logical reservations based on stream configuration

Streams

  • Total number of active streams
  • Useful for capacity planning

API Errors

  • Often caused by misconfigured clients or quota violations

Account-Level Visibility

/jsz?accounts=1

Provides per-account resource usage for multi-tenant or shared environments.
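For chargeback or quota reviews, you can reduce that response to a per-tenant usage map. A sketch (the `account_details`, `name`, and `storage` field names follow a typical /jsz?accounts=1 response; verify them against your server version):

```python
def per_account_storage(jsz: dict) -> dict:
    """Map account name -> bytes of JetStream storage in use,
    from a /jsz?accounts=1 response."""
    return {a["name"]: a["storage"] for a in jsz.get("account_details", [])}
```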


Security Best Practices for Monitoring

Monitoring endpoints expose sensitive operational data and must never be public.

  1. Network Isolation (Primary Defense)

    • Bind to localhost or a private management network
    • Restrict via firewalls or Kubernetes network policies
  2. TLS

    • Always use https_port in production
    • Prevents eavesdropping and tampering
  3. External Authentication (If Required)

    • Place the monitoring port behind a reverse proxy that provides mTLS, Basic Auth, or OIDC
    • NATS itself does not authenticate monitoring requests

Integrating with Observability Tooling

Prometheus and Grafana

The NATS Prometheus Exporter is the standard production integration. It:

  • Scrapes monitoring endpoints
  • Converts JSON into labeled metrics
  • Handles retries and connection reuse

If you prefer not to expose HTTP monitoring ports on each server, tools like Surveyor collect metrics over the NATS protocol instead, providing a single Prometheus endpoint for your entire cluster. Part 2 of this series covers this approach in detail.


Real-Time CLI Tools

For immediate troubleshooting, nats-top provides a live view of server activity:

nats-top -s https://localhost:8222

This is often the fastest way to answer: “What’s happening right now?”


Building Your Monitoring Strategy

  1. Enable monitoring ports with TLS
  2. Use /healthz for liveness checks
  3. Monitor /varz for capacity and health
  4. Deploy Prometheus for historical analysis
  5. Use /connz during incidents
  6. Reserve /subsz, /routez, /leafz, and /gatewayz for targeted debugging

NATS’ built-in monitoring endpoints give you accurate observability without relying on external monitoring probes. Whether you’re running a single node or a global supercluster, the same primitives apply—the difference is scale and aggregation.


Next: Event-Driven Monitoring with the System Account

HTTP monitoring endpoints are pull-based: you query them and receive a point-in-time snapshot. This works well for dashboards and trend analysis, but it has limitations. If a client connects and disconnects between scrapes, you may never see it.

In Part 2 of this series, we explore the NATS system account ($SYS)—a push-based, event-driven approach that delivers real-time advisories for connections, disconnections, authentication errors, and more. You’ll also learn how to query the same monitoring data covered here using the NATS protocol instead of HTTP.


Need Help With NATS?

The team at Synadia created and maintains NATS. If you need help architecting, monitoring, or scaling your deployment, get in touch.


