A high JetStream API request rate means the cluster is processing more than 1,000 JetStream management API calls per collection epoch. These are control-plane operations — creating, deleting, and querying streams and consumers — not data-plane publishes or subscribes. Excessive API load adds CPU overhead to the meta leader and slows down all JetStream management operations cluster-wide.
The JetStream API is a shared, serialized resource. Every stream creation, consumer creation, stream info query, and consumer info query routes through the meta group leader. Unlike data-plane operations (publish, subscribe, ack), which scale horizontally across servers, API operations funnel through a single node. The meta leader processes them sequentially, and write operations additionally require Raft consensus.
When the API request rate is high, the meta leader spends CPU time processing management requests instead of handling data replication and Raft heartbeats. Response latency for legitimate operations increases — a nats stream info call that normally returns in milliseconds starts taking seconds. In severe cases, the meta leader falls behind on Raft heartbeats, triggering a leader election. If the new leader inherits the same API load, it too falls behind, creating a flapping cycle where the meta group can’t maintain stable leadership.
The blast radius is cluster-wide. Even if only one application is hammering the API, every JetStream operation across every account and every stream is affected. A monitoring system polling stream info every second for 200 streams generates 200 API requests per second — enough to measurably impact a busy cluster. CI/CD pipelines that recreate streams and consumers on every deployment add burst load. Client libraries that create ephemeral consumers per request add sustained load. Each source seems reasonable in isolation, but the aggregate overwhelms the control plane.
Ephemeral consumer per request. An application creates a new ephemeral consumer for each incoming request, processes one message, then lets the consumer be garbage collected. Each creation and deletion is an API call. At 500 requests per second, that’s 1,000 API calls per second just from consumer lifecycle overhead (see the sketch after this list).
Monitoring polling too frequently. A monitoring system or dashboard queries stream info or consumer info for every stream and consumer at a short interval. With 100 streams and 200 consumers, a 5-second poll interval generates 60 API requests per second — over 3,600 per minute.
CI/CD pipeline recreating streams. Deployment pipelines that delete and recreate streams or consumers on every deploy generate bursts of API calls. If deploys happen frequently (multiple times per hour across many services), the aggregate load is significant.
Client library auto-creating consumers on connect. Some application patterns create a new durable consumer every time a service instance starts, without checking if one already exists. During a rolling restart of 50 instances, this generates 50 consumer creation API calls in rapid succession — plus 50 more if the old consumers are also being cleaned up.
Rapid stream/consumer info lookups. Application code that calls StreamInfo() or ConsumerInfo() in a hot loop — perhaps to check stream state before every publish or to poll for consumer lag — generates sustained API traffic proportional to the application’s throughput.
Consumer churn from unstable subscribers. Subscribers that connect, create a consumer, disconnect, and reconnect in a cycle generate continuous consumer creation and deletion API calls. This often co-occurs with JETSTREAM_006 (Consumer Churn High).
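To make the first cause above concrete, here is a minimal Go sketch of the ephemeral-per-request antipattern, using the legacy nats.go JetStreamContext API. The subject, package name, and timeout are hypothetical:

```go
package handlers

import (
	"time"

	"github.com/nats-io/nats.go"
)

// Antipattern sketch: one ephemeral consumer per handled request.
// Each invocation pays a consumer-creation API call, and the deferred
// unsubscribe triggers a consumer deletion. At 500 requests per second
// that is 1,000 control-plane API calls per second.
func handleRequest(js nats.JetStreamContext) error {
	// SubscribeSync with no durable name creates an ephemeral consumer:
	// a JetStream API call on every request.
	sub, err := js.SubscribeSync("orders.>")
	if err != nil {
		return err
	}
	defer sub.Unsubscribe() // deletes the ephemeral consumer: another API call

	msg, err := sub.NextMsg(2 * time.Second)
	if err != nil {
		return err
	}
	return msg.Ack()
}
```

The durable-consumer pattern shown in the fix section below removes both API calls from the request path.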
Check per-server API counters with the nats CLI:

```sh
nats server report jetstream
```

Look at the API Req and API Err columns. The API Req column shows the total API request count per server; the meta leader handles the majority of these. Compare counts across collection intervals to determine the request rate.
You can also query the monitoring endpoint directly:

```sh
curl -s http://localhost:8222/jsz | jq '{api: .api}'
```

This returns total (lifetime request count) and errors (lifetime error count). Sample this endpoint at two points in time and compute the delta to get the rate.
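If you prefer to script that delta, the following Go sketch (an illustration, not an official tool) samples /jsz twice, ten seconds apart, and prints the resulting requests-per-second figure. It assumes the monitoring port is reachable at localhost:8222:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// jsz mirrors only the fields we need from the /jsz response.
type jsz struct {
	API struct {
		Total  int64 `json:"total"`
		Errors int64 `json:"errors"`
	} `json:"api"`
}

func sample(url string) (jsz, error) {
	var z jsz
	resp, err := http.Get(url)
	if err != nil {
		return z, err
	}
	defer resp.Body.Close()
	err = json.NewDecoder(resp.Body).Decode(&z)
	return z, err
}

func main() {
	const url = "http://localhost:8222/jsz"
	const window = 10 * time.Second

	before, err := sample(url)
	if err != nil {
		panic(err)
	}
	time.Sleep(window)
	after, err := sample(url)
	if err != nil {
		panic(err)
	}

	// Delta of the lifetime counter over the window gives the rate.
	rate := float64(after.API.Total-before.API.Total) / window.Seconds()
	fmt.Printf("JetStream API requests/sec: %.1f\n", rate)
}
```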
To watch API activity live:

```sh
nats event --js-advisory
```

This subscribes to JetStream advisory events, which include API calls. Watch for patterns: rapid consumer creation/deletion, repeated info queries, or batch stream operations. The advisory includes the account and client information, helping you trace the source.
If you have multiple accounts, determine which one is generating the load:
```sh
curl -s 'http://localhost:8222/jsz?accounts=true' | jq '.account_details[] | {account: .name, api_total: .api.total, api_errors: .api.errors}'
```

High API rates often correlate with consumer churn:
```sh
nats server report jetstream
```

If the consumer count is fluctuating significantly between observations, consumer creation/deletion is likely a major contributor to API load.
If monitoring is the cause, increase the poll interval. Most operational metrics don’t change meaningfully in under 30 seconds:
```yaml
# Instead of polling every 5 seconds, poll every 60 seconds.
# In your monitoring configuration (Prometheus example):
scrape_interval: 60s
```

If a dashboard is making direct API calls, cache the responses and serve from cache with a TTL.
Use durable consumers instead of ephemeral per-request consumers. A durable consumer persists across restarts and reconnections. Create it once during deployment, then bind to it on each request:
```go
// Create durable consumer once (idempotent if it already exists)
_, err := js.AddConsumer("ORDERS", &nats.ConsumerConfig{
	Durable:   "order-processor",
	AckPolicy: nats.AckExplicitPolicy,
})

// Bind to existing consumer — no API call to create
sub, err := js.PullSubscribe("orders.>", "order-processor",
	nats.Bind("ORDERS", "order-processor"),
)
```

```python
# Python — create once, bind on each connection
await js.add_consumer(
    "ORDERS",
    ConsumerConfig(durable_name="order-processor", ack_policy=AckPolicy.EXPLICIT),
)

# Subscribe by binding — no creation API call
sub = await js.pull_subscribe("orders.>", "order-processor", stream="ORDERS")
```

Cache stream and consumer info client-side. The server queues API requests internally and publishes an advisory when the queue saturates, so reducing concurrent API call volume is critical. If your application checks stream state before publishing, cache the result with a reasonable TTL instead of querying on every publish:
```go
// Cache stream info for 30 seconds
var (
	cachedInfo *nats.StreamInfo
	cachedAt   time.Time
)

func getStreamInfo(js nats.JetStreamContext, name string) (*nats.StreamInfo, error) {
	if time.Since(cachedAt) < 30*time.Second && cachedInfo != nil {
		return cachedInfo, nil
	}
	info, err := js.StreamInfo(name)
	if err != nil {
		return nil, err
	}
	cachedInfo = info
	cachedAt = time.Now()
	return info, nil
}
```

Make CI/CD pipelines idempotent. Instead of deleting and recreating streams, use update operations that only modify what changed:
```sh
# Check if stream exists, create only if it doesn't
nats stream info ORDERS 2>/dev/null || nats stream add ORDERS \
  --subjects "orders.>" \
  --max-age 168h \
  --max-bytes 100G \
  --replicas 3
```

Separate data-plane from control-plane operations. Stream and consumer creation belongs in deployment pipelines, not in application hot paths. Application code should only publish, subscribe, and acknowledge — never create or query resources during normal operation.
Use server-side consumer push delivery where appropriate. Push consumers deliver messages to a subject without the client needing to make API calls to fetch batches. For workloads where push semantics are acceptable, this eliminates the pull-fetch API overhead entirely.
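A minimal sketch of the push pattern with the legacy nats.go JetStreamContext API follows; the subject and durable name are illustrative. After the one-time subscribe call, the server pushes messages to the handler with no further fetch API traffic:

```go
package main

import (
	"fmt"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		panic(err)
	}

	// One-time setup cost: a single consumer-creation API call.
	// After this, the server delivers messages to the handler directly.
	sub, err := js.Subscribe("orders.>", func(m *nats.Msg) {
		fmt.Printf("received: %s\n", m.Subject)
		m.Ack()
	}, nats.Durable("order-watcher"), nats.ManualAck())
	if err != nil {
		panic(err)
	}
	defer sub.Unsubscribe()

	select {} // block while messages are pushed to the handler
}
```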
Monitor API rate as a first-class metric. Track the delta of api.total from /jsz over time and alert when it exceeds your baseline.
Synadia Insights evaluates the API request rate every collection epoch, flagging when the rate exceeds 1,000 requests so you can trace the source and fix it before it destabilizes the meta leader.
Do publishes, subscribes, and acknowledgments count toward the API request rate? No. The JetStream API counter tracks control-plane operations only: stream and consumer CRUD (create, read, update, delete), stream purge, and info queries. Data-plane operations — publishing messages, acknowledging messages, and subscribing to subjects — use the core NATS protocol and do not count against the API rate. A high-throughput stream with millions of publishes per second does not contribute to the API request count.
Can a high API request rate affect message delivery? Indirectly, yes. The meta leader processes API requests on the same server that handles data replication for streams it leads. If the meta leader is saturated with API requests, its Raft heartbeats and replication may slow down, causing follower lag on streams led by that server. In extreme cases, the meta leader steps down due to missed heartbeats, and the resulting election disrupts all JetStream operations briefly.
Use nats event --js-advisory to watch API advisories in real time. Each advisory includes the client connection information (client name, account, connection ID). Correlate frequent advisories to specific clients. The /jsz endpoint with accounts=true also breaks down API totals by account, which narrows the search.
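If you want this programmatically rather than through the CLI, a small subscriber on the API audit advisory subject can attribute calls to accounts and clients. This sketch assumes the $JS.EVENT.ADVISORY.API subject and field names from the io.nats.jetstream.advisory.v1.api_audit schema:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/nats-io/nats.go"
)

// apiAudit decodes only the advisory fields we report on.
type apiAudit struct {
	Subject string `json:"subject"` // the API subject that was called
	Client  struct {
		Account string `json:"acc"`
		Name    string `json:"name"`
		Host    string `json:"host"`
	} `json:"client"`
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Drain()

	// Each JetStream API call emits one advisory on this subject.
	_, err = nc.Subscribe("$JS.EVENT.ADVISORY.API", func(m *nats.Msg) {
		var a apiAudit
		if err := json.Unmarshal(m.Data, &a); err != nil {
			return
		}
		fmt.Printf("api=%s account=%s client=%s host=%s\n",
			a.Subject, a.Client.Account, a.Client.Name, a.Client.Host)
	})
	if err != nil {
		panic(err)
	}

	select {} // run until interrupted
}
```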
There is no built-in per-account API rate limit in the NATS server. The API rate is a global resource. The fix is on the client side: reduce unnecessary API calls through caching, durable consumers, and idempotent provisioning. Account-level JetStream limits (max streams, max consumers) indirectly cap the creation side, but don’t limit info queries.
API request rate (JETSTREAM_004) measures throughput — how many requests arrive per epoch. API pending (JETSTREAM_005) measures concurrency — how many requests are in flight at a single point in time. High request rate with low pending means the meta leader is keeping up. High request rate with high pending means the meta leader is falling behind and a backlog is forming. Both checks firing simultaneously indicates the control plane is under serious pressure.