A high JetStream API request rate means the cluster is processing more than 1,000 JetStream management API calls per collection epoch. These are control-plane operations — creating, deleting, and querying streams and consumers — not data-plane publishes or subscribes. Excessive API load adds CPU overhead to the meta leader and slows down all JetStream management operations cluster-wide.
The JetStream API is a shared, serialized resource. Every stream creation, consumer creation, stream info query, and consumer info query routes through the meta group leader. Unlike data-plane operations (publish, subscribe, ack), which scale horizontally across servers, API operations funnel through a single node. The meta leader processes them sequentially, and write operations additionally require Raft consensus.
When the API request rate is high, the meta leader spends CPU time processing management requests instead of handling data replication and Raft heartbeats. Response latency for legitimate operations increases — a nats stream info call that normally returns in milliseconds starts taking seconds. In severe cases, the meta leader falls behind on Raft heartbeats, triggering a leader election. If the new leader inherits the same API load, it too falls behind, creating a flapping cycle where the meta group can’t maintain stable leadership.
The blast radius is cluster-wide. Even if only one application is hammering the API, every JetStream operation across every account and every stream is affected. A monitoring system polling stream info every second for 200 streams generates 200 API requests per second — enough to measurably impact a busy cluster. CI/CD pipelines that recreate streams and consumers on every deployment add burst load. Client libraries that create ephemeral consumers per request add sustained load. Each source seems reasonable in isolation, but the aggregate overwhelms the control plane.
Ephemeral consumer per request. An application creates a new ephemeral consumer for each incoming request, processes one message, then lets the consumer be garbage collected. Each creation and deletion is an API call. At 500 requests per second, that’s 1,000 API calls per second just from consumer lifecycle overhead (see the sketch after this list).
Monitoring polling too frequently. A monitoring system or dashboard queries stream info or consumer info for every stream and consumer at a short interval. With 100 streams and 200 consumers, a 5-second poll interval generates 60 API requests per second — over 3,600 per minute.
CI/CD pipeline recreating streams. Deployment pipelines that delete and recreate streams or consumers on every deploy generate bursts of API calls. If deploys happen frequently (multiple times per hour across many services), the aggregate load is significant.
Client library auto-creating consumers on connect. Some application patterns create a new durable consumer every time a service instance starts, without checking if one already exists. During a rolling restart of 50 instances, this generates 50 consumer creation API calls in rapid succession — plus 50 more if the old consumers are also being cleaned up.
Rapid stream/consumer info lookups. Application code that calls StreamInfo() or ConsumerInfo() in a hot loop — perhaps to check stream state before every publish or to poll for consumer lag — generates sustained API traffic proportional to the application’s throughput.
Consumer churn from unstable subscribers. Subscribers that connect, create a consumer, disconnect, and reconnect in a cycle generate continuous consumer creation and deletion API calls. This often co-occurs with JETSTREAM_006 (Consumer Churn High).
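To make the first cause above concrete, here is a minimal Go sketch of the ephemeral-per-request antipattern, using the legacy nats.go JetStreamContext API. The subject, package name, and timeout are hypothetical:

```go
package handlers

import (
	"time"

	"github.com/nats-io/nats.go"
)

// Antipattern sketch: one ephemeral consumer per handled request.
// Each invocation pays a consumer-creation API call, and the deferred
// unsubscribe triggers a consumer deletion. At 500 requests per second
// that is 1,000 control-plane API calls per second.
func handleRequest(js nats.JetStreamContext) error {
	// SubscribeSync with no durable name creates an ephemeral consumer:
	// a JetStream API call on every request.
	sub, err := js.SubscribeSync("orders.>")
	if err != nil {
		return err
	}
	defer sub.Unsubscribe() // deletes the ephemeral consumer: another API call

	msg, err := sub.NextMsg(2 * time.Second)
	if err != nil {
		return err
	}
	return msg.Ack()
}
```

The durable-consumer pattern shown in the fix section below removes both API calls from the request path.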
Check per-server API counters with the nats CLI:

```sh
nats server report jetstream
```

Look at the API Req and API Err columns. The API Req column shows the total API request count per server; the meta leader handles the majority of these. Compare counts across collection intervals to determine the request rate.
You can also query the monitoring endpoint directly:

```sh
curl -s http://localhost:8222/jsz | jq '{api: .api}'
```

This returns total (lifetime request count) and errors (lifetime error count). Sample this endpoint at two points in time and compute the delta to get the rate.
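If you prefer to script that delta, the following Go sketch (an illustration, not an official tool) samples /jsz twice, ten seconds apart, and prints the resulting requests-per-second figure. It assumes the monitoring port is reachable at localhost:8222:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// jsz mirrors only the fields we need from the /jsz response.
type jsz struct {
	API struct {
		Total  int64 `json:"total"`
		Errors int64 `json:"errors"`
	} `json:"api"`
}

func sample(url string) (jsz, error) {
	var z jsz
	resp, err := http.Get(url)
	if err != nil {
		return z, err
	}
	defer resp.Body.Close()
	err = json.NewDecoder(resp.Body).Decode(&z)
	return z, err
}

func main() {
	const url = "http://localhost:8222/jsz"
	const window = 10 * time.Second

	before, err := sample(url)
	if err != nil {
		panic(err)
	}
	time.Sleep(window)
	after, err := sample(url)
	if err != nil {
		panic(err)
	}

	// Delta of the lifetime counter over the window gives the rate.
	rate := float64(after.API.Total-before.API.Total) / window.Seconds()
	fmt.Printf("JetStream API requests/sec: %.1f\n", rate)
}
```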
To watch API activity live:

```sh
nats event --js-advisory
```

This subscribes to JetStream advisory events, which include API calls. Watch for patterns: rapid consumer creation/deletion, repeated info queries, or batch stream operations. The advisory includes the account and client information, helping you trace the source.
If you have multiple accounts, determine which one is generating the load:
```sh
curl -s 'http://localhost:8222/jsz?accounts=true' | jq '.account_details[] | {account: .name, api_total: .api.total, api_errors: .api.errors}'
```

High API rates often correlate with consumer churn:
```sh
nats server report jetstream
```

If the consumer count is fluctuating significantly between observations, consumer creation/deletion is likely a major contributor to API load.
If monitoring is the cause, increase the poll interval. Most operational metrics don’t change meaningfully in under 30 seconds:
```yaml
# Instead of polling every 5 seconds, poll every 60 seconds.
# In your monitoring configuration (Prometheus example):
scrape_interval: 60s
```

If a dashboard is making direct API calls, cache the responses and serve from cache with a TTL.
Use durable consumers instead of ephemeral per-request consumers. A durable consumer persists across restarts and reconnections. Create it once during deployment, then bind to it on each request:
```go
// Create durable consumer once (idempotent if it already exists)
_, err := js.AddConsumer("ORDERS", &nats.ConsumerConfig{
	Durable:   "order-processor",
	AckPolicy: nats.AckExplicitPolicy,
})

// Bind to existing consumer — no API call to create
sub, err := js.PullSubscribe("orders.>", "order-processor",
	nats.Bind("ORDERS", "order-processor"),
)
```

```python
# Python — create once, bind on each connection
await js.add_consumer(
    "ORDERS",
    ConsumerConfig(durable_name="order-processor", ack_policy=AckPolicy.EXPLICIT),
)

# Subscribe by binding — no creation API call
sub = await js.pull_subscribe("orders.>", "order-processor", stream="ORDERS")
```

Cache stream and consumer info client-side. The server queues API requests internally and publishes an advisory when the queue saturates, so reducing concurrent API call volume is critical. If your application checks stream state before publishing, cache the result with a reasonable TTL instead of querying on every publish:
```go
// Cache stream info for 30 seconds
var (
	cachedInfo *nats.StreamInfo
	cachedAt   time.Time
)

func getStreamInfo(js nats.JetStreamContext, name string) (*nats.StreamInfo, error) {
	if time.Since(cachedAt) < 30*time.Second && cachedInfo != nil {
		return cachedInfo, nil
	}
	info, err := js.StreamInfo(name)
	if err != nil {
		return nil, err
	}
	cachedInfo = info
	cachedAt = time.Now()
	return info, nil
}
```

Make CI/CD pipelines idempotent. Instead of deleting and recreating streams, use update operations that only modify what changed:
```sh
# Check if stream exists, create only if it doesn't
nats stream info ORDERS 2>/dev/null || nats stream add ORDERS \
  --subjects "orders.>" \
  --max-age 168h \
  --max-bytes 100G \
  --replicas 3
```

Separate data-plane from control-plane operations. Stream and consumer creation belongs in deployment pipelines, not in application hot paths. Application code should only publish, subscribe, and acknowledge — never create or query resources during normal operation.
Use server-side consumer push delivery where appropriate. Push consumers deliver messages to a subject without the client needing to make API calls to fetch batches. For workloads where push semantics are acceptable, this eliminates the pull-fetch API overhead entirely.
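A minimal sketch of the push pattern with the legacy nats.go JetStreamContext API follows; the subject and durable name are illustrative. After the one-time subscribe call, the server pushes messages to the handler with no further fetch API traffic:

```go
package main

import (
	"fmt"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		panic(err)
	}

	// One-time setup cost: a single consumer-creation API call.
	// After this, the server delivers messages to the handler directly.
	sub, err := js.Subscribe("orders.>", func(m *nats.Msg) {
		fmt.Printf("received: %s\n", m.Subject)
		m.Ack()
	}, nats.Durable("order-watcher"), nats.ManualAck())
	if err != nil {
		panic(err)
	}
	defer sub.Unsubscribe()

	select {} // block while messages are pushed to the handler
}
```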
Monitor API rate as a first-class metric. Track the delta of api.total from /jsz over time and alert when it exceeds your baseline.
Synadia Insights evaluates the API request rate every collection epoch, flagging when the rate exceeds 1,000 requests so you can trace the source and fix it before it destabilizes the meta leader.
Do publishes, subscribes, and acknowledgments count toward the API request rate? No. The JetStream API counter tracks control-plane operations only: stream and consumer CRUD (create, read, update, delete), stream purge, and info queries. Data-plane operations — publishing messages, acknowledging messages, and subscribing to subjects — use the core NATS protocol and do not count against the API rate. A high-throughput stream with millions of publishes per second does not contribute to the API request count.
Can a high API request rate affect message delivery? Indirectly, yes. The meta leader processes API requests on the same server that handles data replication for streams it leads. If the meta leader is saturated with API requests, its Raft heartbeats and replication may slow down, causing follower lag on streams led by that server. In extreme cases, the meta leader steps down due to missed heartbeats, and the resulting election disrupts all JetStream operations briefly.
Use nats event --js-advisory to watch API advisories in real time. Each advisory includes the client connection information (client name, account, connection ID). Correlate frequent advisories to specific clients. The /jsz endpoint with accounts=true also breaks down API totals by account, which narrows the search.
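If you want this programmatically rather than through the CLI, a small subscriber on the API audit advisory subject can attribute calls to accounts and clients. This sketch assumes the $JS.EVENT.ADVISORY.API subject and field names from the io.nats.jetstream.advisory.v1.api_audit schema:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/nats-io/nats.go"
)

// apiAudit decodes only the advisory fields we report on.
type apiAudit struct {
	Subject string `json:"subject"` // the API subject that was called
	Client  struct {
		Account string `json:"acc"`
		Name    string `json:"name"`
		Host    string `json:"host"`
	} `json:"client"`
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Drain()

	// Each JetStream API call emits one advisory on this subject.
	_, err = nc.Subscribe("$JS.EVENT.ADVISORY.API", func(m *nats.Msg) {
		var a apiAudit
		if err := json.Unmarshal(m.Data, &a); err != nil {
			return
		}
		fmt.Printf("api=%s account=%s client=%s host=%s\n",
			a.Subject, a.Client.Account, a.Client.Name, a.Client.Host)
	})
	if err != nil {
		panic(err)
	}

	select {} // run until interrupted
}
```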
There is no built-in per-account API rate limit in the NATS server. The API rate is a global resource. The fix is on the client side: reduce unnecessary API calls through caching, durable consumers, and idempotent provisioning. Account-level JetStream limits (max streams, max consumers) indirectly cap the creation side, but don’t limit info queries.
API request rate (JETSTREAM_004) measures throughput — how many requests arrive per epoch. API pending (JETSTREAM_005) measures concurrency — how many requests are in flight at a single point in time. High request rate with low pending means the meta leader is keeping up. High request rate with high pending means the meta leader is falling behind and a backlog is forming. Both checks firing simultaneously indicates the control plane is under serious pressure.