NATS JetStream API Pending High: What It Means and How to Fix It

Severity: Warning
Category: Performance
Applies to: JetStream
Check ID: JETSTREAM_005
Detection threshold: inflight JetStream API requests exceed the configured threshold (default: 1,000)

JetStream API pending high means the number of inflight JetStream API requests on a server has exceeded the configured threshold. These requests are queued waiting for the meta leader to process them, and a growing backlog signals that the meta leader is overwhelmed, slow, or temporarily unreachable.

Why this matters

Every JetStream API operation — creating a stream, adding a consumer, looking up stream info — routes through the meta leader for coordination. The meta leader serializes these operations through the Raft consensus layer. When inflight requests pile up, it means the leader is processing operations more slowly than clients are submitting them. The result is rising latency on every JetStream API call across the entire cluster.

The problem compounds. As API latency increases, clients that use synchronous JetStream operations start blocking longer. Applications waiting on StreamInfo or ConsumerCreate calls queue up their own internal work. If clients implement retries (and most do), the retry traffic further inflates the pending queue — a classic feedback loop where backpressure creates more pressure.

In severe cases, pending requests time out and return errors to clients. This can cascade into application-level failures: deployment scripts that create streams abort, autoscaling logic that provisions consumers fails, and monitoring tools that poll JetStream metadata stop reporting. The cluster is technically healthy — all servers are up, all streams are replicated — but the control plane is congested, making it impossible to manage JetStream resources.

Common causes

  • Burst of concurrent API calls during deployment. A deployment script or operator tool creates or updates dozens of streams and consumers simultaneously. Each operation is an independent API call that queues at the meta leader. A rolling deploy that touches 50 streams in parallel can spike the pending queue from zero to hundreds in seconds.

  • Monitoring tools polling too aggressively. Observability systems that call StreamInfo or ConsumerInfo on every stream and consumer every few seconds generate sustained API load. With hundreds of streams, a 10-second polling interval means dozens of concurrent inflight requests at any given moment.

  • Application creating consumers in tight loops. Code that creates an ephemeral consumer per request — or recreates consumers on every reconnection without checking if they already exist — generates a continuous stream of API operations. This is especially common in microservice architectures where each instance manages its own consumers.

  • Meta leader on under-resourced hardware. The meta leader must write Raft log entries to disk for every API operation. If the leader runs on a node with slow disk I/O, high CPU contention, or insufficient memory, its processing throughput drops and the pending queue grows even under normal API load.

  • Large number of Raft groups. Every replicated stream and consumer is a separate Raft group. The meta leader coordinates across all of them. Clusters with thousands of Raft groups have inherently higher coordination overhead, leaving less headroom for API request processing.

  • Meta leader election in progress. During a leader election, there is briefly no leader to process requests. Requests submitted during this window queue until the new leader is elected and ready. If elections are frequent (see META_003), the pending queue spikes repeatedly.

How to diagnose

Check current API pending count

Query the JetStream subsystem status to see inflight API requests:

nats server report jetstream

Look at the API column group. The report shows total requests and errors per server. To get the raw pending count, query the meta leader directly:

curl -s http://localhost:8222/jsz | jq '.api'

This returns total, errors, and inflight — the inflight value is what this check evaluates. A sustained value above the threshold (default 1,000) triggers the alert.
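
To see whether the backlog is growing or draining, sample the inflight value over time. A minimal sketch, assuming the same monitoring endpoint as above:

# Print the inflight count every 5 seconds. A steadily rising number
# means the meta leader is falling behind; flat or falling means it
# is keeping up or catching up.
while true; do
  curl -s http://localhost:8222/jsz | jq '.api.inflight'
  sleep 5
done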

Identify the source of API traffic

Watch JetStream advisory events to see which operations are generating load:

nats event --js-advisory

This shows real-time stream and consumer creation, deletion, and modification events. Look for patterns: a single client creating many consumers, a monitoring tool making repeated info calls, or a deployment script running bulk operations.
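
If you want to capture this programmatically, you can subscribe to the API audit advisory subject directly. A minimal Go sketch, assuming audit advisories are published on $JS.EVENT.ADVISORY.API in the account whose traffic you are investigating:

package main

import (
	"fmt"
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Each advisory is a JSON document describing one API request,
	// including the request subject and the client that issued it.
	_, err = nc.Subscribe("$JS.EVENT.ADVISORY.API", func(m *nats.Msg) {
		fmt.Printf("%s\n", m.Data)
	})
	if err != nil {
		log.Fatal(err)
	}
	select {} // block and print advisories as they arrive
}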

Check meta leader performance

If the pending queue is high but API request rate is moderate, the bottleneck is the leader’s processing speed:

nats server report jetstream

Identify which server is the meta leader from the meta group section. Then check that server’s resource utilization — CPU, memory, and especially disk I/O latency. Slow disk writes directly throttle Raft commit speed.
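
For a quick read on disk latency, assuming the leader runs on a Linux host with the sysstat package installed:

# Extended I/O statistics every 5 seconds; watch the await and %util
# columns for the device backing the JetStream store directory.
iostat -x 5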

Correlate with other checks

High API pending often co-occurs with other conditions:

  • JETSTREAM_004 (JS API Request Rate High) — the inbound rate itself is too high
  • META_008 (Meta Pending High) — Raft-level pending on the meta group
  • META_003 (Meta Leader Flapping) — leader instability disrupts processing

How to fix it

Immediate: reduce inflight pressure

Throttle API-heavy operations. If a deployment or migration script is flooding the API, add serialization or rate limiting. Process operations one at a time or in small batches rather than firing them all concurrently:

// Instead of launching all creates concurrently, run them sequentially:
for _, cfg := range streamConfigs {
	if _, err := js.AddStream(cfg); err != nil { // one at a time
		log.Printf("add stream %s: %v", cfg.Name, err)
	}
}

# Python (nats.py) — serialize stream creation
for cfg in stream_configs:
    await js.add_stream(cfg)
    # Optionally add a small delay between operations,
    # e.g. await asyncio.sleep(0.05)
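
If strictly sequential creation is too slow for a large rollout, a small fixed concurrency limit is a reasonable middle ground. A sketch using a buffered channel as a semaphore, assuming the streamConfigs and js from the snippet above (the limit of 4 is an arbitrary example):

// Allow at most 4 stream creates in flight at once.
sem := make(chan struct{}, 4)
var wg sync.WaitGroup
for _, cfg := range streamConfigs {
	cfg := cfg // capture the loop variable
	wg.Add(1)
	sem <- struct{}{} // acquire a slot
	go func() {
		defer wg.Done()
		defer func() { <-sem }() // release the slot
		if _, err := js.AddStream(cfg); err != nil {
			log.Printf("add stream %s: %v", cfg.Name, err)
		}
	}()
}
wg.Wait()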

Step down the meta leader. If the current leader is on a resource-constrained node, force an election to move leadership to a more capable server:

nats server cluster step-down

This triggers a new meta leader election. The new leader may have better hardware or lower load, improving processing throughput.

Short-term: reduce API call volume

Cache JetStream metadata in your application. Instead of calling StreamInfo or ConsumerInfo before every operation, cache the result and refresh on a reasonable interval (every 30–60 seconds, not every request). Most stream metadata changes infrequently.
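
A minimal sketch of such a cache for one stream's metadata, using the legacy nats.go JetStreamContext API; the infoCache type and its fields are illustrative names, and sync/time imports are assumed:

// Caches a StreamInfo result and refreshes it at most once per TTL,
// instead of issuing a StreamInfo API call on every request.
type infoCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	fetched time.Time
	info    *nats.StreamInfo
}

func (c *infoCache) get(js nats.JetStreamContext, stream string) (*nats.StreamInfo, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.info != nil && time.Since(c.fetched) < c.ttl {
		return c.info, nil // still fresh; no API call
	}
	info, err := js.StreamInfo(stream)
	if err != nil {
		return nil, err
	}
	c.info, c.fetched = info, time.Now()
	return info, nil
}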

Eliminate redundant consumer creation. If your application creates durable consumers, check whether the consumer already exists before calling create. Most client libraries support CreateOrUpdate semantics that skip the creation if the consumer already exists with the same configuration:

// Go — use CreateOrUpdateConsumer to avoid redundant creates
cons, err := js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
	Durable:   "order-processor",
	AckPolicy: jetstream.AckExplicitPolicy,
})

# Python (nats.py)
from nats.js.api import ConsumerConfig

config = ConsumerConfig(durable_name="order-processor", ack_policy="explicit")
await js.subscribe("ORDERS.>", config=config)

Reduce monitoring poll frequency. If your observability stack queries JetStream metadata, increase the polling interval. For most operational dashboards, a 30-second or 60-second interval provides sufficient visibility without creating sustained API pressure.

Long-term: scale the control plane

Ensure the meta leader runs on fast hardware. The meta leader’s throughput is bounded by disk write latency (for Raft log commits) and CPU (for request processing). Servers intended to host the meta leader should have NVMe or SSD storage and adequate CPU headroom. Because any JetStream-enabled server can be elected meta leader, provision every candidate accordingly, and use step-downs to move leadership off a weak node if it lands there; placement tags can keep heavyweight stream assets away from the servers you want to reserve headroom on.

Reduce total Raft group count. Every replicated stream and consumer is a Raft group. Consolidate small streams into fewer, larger streams using subject partitioning. Instead of one stream per entity, use a single stream with subject-based filtering. Fewer Raft groups means less coordination overhead on the meta leader.
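
For example, instead of one replicated stream per region (each with its own Raft group), a single stream can carry every region's traffic, with consumers filtering by subject. A sketch with illustrative names, using the legacy nats.go API:

// One stream for all order events: one stream-level Raft group
// instead of one per region.
_, err := js.AddStream(&nats.StreamConfig{
	Name:     "ORDERS",
	Subjects: []string{"orders.>"}, // e.g. orders.<region>.<event>
	Replicas: 3,
})
if err != nil {
	log.Fatal(err)
}

// Per-region consumers filter on their slice of the subject space.
_, err = js.AddConsumer("ORDERS", &nats.ConsumerConfig{
	Durable:       "eu-processor",
	AckPolicy:     nats.AckExplicitPolicy,
	FilterSubject: "orders.eu.>",
})
if err != nil {
	log.Fatal(err)
}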

Set JetStream resource limits per account. Prevent any single account from creating unbounded streams and consumers. Account-level limits (max_streams, max_consumers) cap the total number of Raft groups an account can create, protecting the meta leader from runaway growth.
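
In the server configuration, account-level JetStream limits look roughly like this (the account name and values are illustrative):

accounts {
  APP {
    jetstream {
      max_mem: 1G
      max_file: 10G
      max_streams: 50
      max_consumers: 500
    }
  }
}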

Frequently asked questions

What is the JetStream API pending queue?

The pending queue holds JetStream API requests that have been received by the server but not yet processed by the meta leader. Every operation that modifies or queries JetStream state — stream creation, consumer updates, info lookups — enters this queue. The meta leader processes requests sequentially through Raft consensus, so the queue depth reflects how far behind the leader is from incoming demand.

How is JS API pending different from meta pending?

JS API pending (JETSTREAM_005) measures inflight API requests at the server level — requests waiting to be forwarded to and processed by the meta leader. Meta pending (META_008) measures pending Raft operations within the meta group itself. They’re related but distinct: API pending can be high because the meta leader is slow (meta pending is also high) or because network latency between the server and the meta leader is delaying forwarding (meta pending may be normal).

Will high API pending cause message loss?

No. JetStream API pending affects the control plane — stream and consumer management operations — not the data plane. Message publishing and consumption use separate code paths. However, if API pending causes consumer creation to fail or time out, applications that depend on dynamically provisioned consumers will fail to receive messages until the consumer is successfully created.

What pending threshold should I set?

The default threshold of 1,000 inflight requests is appropriate for most deployments. Lower it (to 100–500) if your cluster has limited hardware and you want earlier warning. Raise it only if you have a legitimate burst pattern (large deployments) and the meta leader has the hardware to process the backlog within a few seconds.

Can I see which specific API calls are pending?

Not directly from the server metrics. The /jsz endpoint reports aggregate counts (inflight), not individual requests. To identify the source, watch JetStream advisory events (nats event --js-advisory) and correlate timestamps. The advisory stream shows which operations are being submitted, which helps you identify the client or pattern generating the load.

Proactive monitoring for NATS JS API pending high with Synadia Insights

With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.

Start a 14-day Insights trial
Cancel anytime.