Waiting critical means a JetStream pull consumer has more outstanding pull requests than the operator-defined threshold. Each pull request represents a consumer instance asking the server for messages that aren’t available yet. A high num_waiting count indicates that consumer demand far exceeds the message supply — many consumers are parked, waiting for work that isn’t arriving.
In a pull-based consumer model, clients send pull requests to the server, which responds with available messages or holds the request until messages arrive (long polling). The num_waiting metric counts how many pull requests are currently queued and waiting for messages.
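As a concrete illustration, here is a minimal long-polling pull sketch with the Go client, assuming an existing JetStream context js and placeholder stream subject and durable name. While the Fetch call is outstanding, the server holds the request and counts it in num_waiting until messages arrive or the wait expires:

// Sketch: a single long-polling pull. While the server holds this request
// waiting for messages, it is counted in the consumer's num_waiting.
sub, err := js.PullSubscribe("orders.>", "my-consumer")
if err != nil {
    log.Fatal(err)
}

msgs, err := sub.Fetch(10, nats.MaxWait(30*time.Second)) // long poll for up to 30s
if err == nil {
    for _, m := range msgs {
        m.Ack() // process, then acknowledge
    }
}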
A high waiting count signals a fundamental demand-supply mismatch. More consumer instances are polling for messages than the stream can feed. Each waiting pull request consumes server memory and a slot in the consumer’s max_waiting limit (default 512). When max_waiting is exhausted, new pull requests are rejected with a “max waiting exceeded” error, causing client-side retry loops that add network overhead without delivering any messages.
The operational cost is real even though no data is at risk. Over-provisioned consumers waste compute resources — each consumer instance uses CPU, memory, and network connections while doing no useful work. In cloud environments, this translates directly to unnecessary cost. In latency-sensitive systems, the burst of pull requests when messages finally arrive can cause a thundering herd: hundreds of waiting consumers all receive messages simultaneously, creating a processing spike that strains downstream dependencies.
Beyond resource waste, a high num_waiting count can also mask a different problem. If messages should be flowing but aren’t, the waiting count is a symptom of an upstream publishing failure, a subject filter mismatch, or a stream configuration error. Operators who dismiss the high waiting count as “just consumers being eager” may miss the fact that the data pipeline is broken.
Over-provisioned consumer instances. More consumer instances are running than the message rate requires. Common in auto-scaling environments where the consumer pool scaled up during a traffic spike and didn’t scale back down. Ten instances pulling from a consumer that receives one message per minute means nine instances are perpetually waiting.
Message rate dropped below consumer capacity. The stream’s publish rate decreased — perhaps the upstream producer slowed down, a batch job completed, or traffic naturally declined — but the consumer pool size wasn’t adjusted. The consumers keep polling, but there’s nothing to fetch.
Subject filter mismatch. The consumer has a subject filter that doesn’t match what’s actually being published. Messages are flowing into the stream on orders.us.> but the consumer is filtered to orders.eu.>. The consumer keeps pulling, the server keeps returning empty results, and num_waiting climbs.
Aggressive pull request batching with short expiry. Clients configured with very short pull request timeouts (e.g., an expires value of one second) send frequent pull requests that stack up in the waiting queue. Even a moderate number of consumers can push num_waiting high when each client sends 10+ pull requests per second.
Consumer configured on an inactive stream. The stream exists but hasn’t received messages in hours or days. The consumer was deployed in anticipation of traffic that hasn’t materialized, or the upstream publisher was decommissioned without cleaning up downstream consumers.
max_waiting set too high. A very large max_waiting value (e.g., 10,000) allows an unreasonable number of pull requests to queue. The default of 512 is already generous for most workloads. Setting it higher masks over-provisioning rather than addressing it.
nats consumer info ORDERS my-consumer --json | jq '{
  num_waiting: .num_waiting,
  num_pending: .num_pending,
  num_ack_pending: .num_ack_pending,
  config_max_waiting: .config.max_waiting
}'

If num_waiting is near max_waiting, pull requests are likely being rejected.
# Check if the stream is receiving messages
nats stream info ORDERS --json | jq '{
  messages: .state.messages,
  last_seq: .state.last_seq,
  first_ts: .state.first_ts,
  last_ts: .state.last_ts
}'

If last_ts is old (hours or days ago), the stream isn’t receiving new messages — the consumer is waiting for nothing.
# Compare consumer filter against stream subjects
nats consumer info ORDERS my-consumer --json | jq '.config.filter_subject'
nats stream info ORDERS --json | jq '.config.subjects'

Ensure the consumer’s filter subject matches, or is a subset of, the stream’s configured subjects.
watch -n 5 'nats consumer info ORDERS my-consumer --json | jq "{waiting: .num_waiting, pending: .num_pending, ack_pending: .num_ack_pending}"'

A stable high num_waiting with zero num_pending and zero num_ack_pending confirms the demand-supply mismatch.
In Go:

import (
    "fmt"

    "github.com/nats-io/nats.go"
)

func checkWaitingCritical(js nats.JetStreamContext, streamName string, threshold int) error {
    for consumer := range js.ConsumerNames(streamName) {
        info, err := js.ConsumerInfo(streamName, consumer)
        if err != nil {
            continue
        }
        if info.NumWaiting > threshold {
            pct := float64(info.NumWaiting) / float64(info.Config.MaxWaiting) * 100
            fmt.Printf("CRITICAL: stream=%s consumer=%s waiting=%d max=%d (%.1f%%) pending=%d\n",
                streamName, consumer, info.NumWaiting,
                info.Config.MaxWaiting, pct, info.NumPending)
        }
    }
    return nil
}

In Python:

import asyncio
import nats

async def check_waiting_critical(stream_name: str, threshold: int):
    nc = await nats.connect()
    js = nc.jetstream()

    async for consumer_name in js.consumer_names(stream_name):
        info = await js.consumer_info(stream_name, consumer_name)
        if info.num_waiting > threshold:
            pct = (info.num_waiting / info.config.max_waiting) * 100
            print(f"CRITICAL: stream={stream_name} consumer={consumer_name} "
                  f"waiting={info.num_waiting} max={info.config.max_waiting} "
                  f"({pct:.1f}%) pending={info.num_pending}")

    await nc.close()

asyncio.run(check_waiting_critical("ORDERS", 100))

If the root cause is over-provisioning, scale down the consumer pool:
# For Kubernetes deployments
kubectl scale deployment order-consumer --replicas=2

Match the consumer instance count to the actual message rate. A good heuristic: each instance should process at least one message per pull request cycle. If an instance is idle more than 80% of the time, you have too many instances.
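If you want to quantify that heuristic, one option is to sample the consumer's delivered sequence twice and compute a per-instance message rate. This is a rough sketch using the same nats.go context and imports as the check above (plus "time"); the one-minute window and the one-message-per-minute cutoff are assumptions you should tune:

// Rough sketch: estimate per-instance message rate for a pull consumer.
// The sampling window and cutoff below are assumptions, not recommendations.
func estimatePerInstanceRate(js nats.JetStreamContext, stream, consumer string, instances int) error {
    before, err := js.ConsumerInfo(stream, consumer)
    if err != nil {
        return err
    }
    time.Sleep(1 * time.Minute) // sampling window (assumed)

    after, err := js.ConsumerInfo(stream, consumer)
    if err != nil {
        return err
    }

    delivered := after.Delivered.Consumer - before.Delivered.Consumer
    perInstance := float64(delivered) / float64(instances)
    if perInstance < 1.0 { // under one message per instance per minute (assumed cutoff)
        fmt.Printf("%s/%s looks over-provisioned: %.2f msgs/instance/min across %d instances\n",
            stream, consumer, perInstance, instances)
    }
    return nil
}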
If the consumer’s filter doesn’t match published subjects, update it:
nats consumer edit ORDERS my-consumer --filter "orders.us.>"

Or, if the consumer should receive all messages:
nats consumer edit ORDERS my-consumer --filter ""

Increase the pull request expiry. Longer expiry reduces the frequency of new pull requests, lowering the waiting count:
In Go:

// Instead of short, aggressive pulls
msgs, _ := sub.Fetch(10, nats.MaxWait(1*time.Second)) // creates many waiting requests

// Use longer poll intervals
msgs, _ := sub.Fetch(10, nats.MaxWait(30*time.Second)) // fewer waiting requests

In Python:

# Longer wait reduces request frequency
msgs = await sub.fetch(10, timeout=30)

Use heartbeat-based pulls. Modern NATS client libraries support idle heartbeats on pull requests, which keep a single long-lived pull request alive rather than creating many short-lived ones:
sub, _ := js.PullSubscribe("orders.>", "my-consumer")
msgs, _ := sub.Fetch(100,
    nats.MaxWait(60*time.Second),
    nats.PullHeartbeat(5*time.Second),
)

If the current max_waiting is unnecessarily high, reduce it to match your actual consumer count:
nats consumer edit ORDERS my-consumer --max-waiting 50

Set max_waiting to roughly 2x the expected number of concurrent consumer instances. This provides headroom for transient pull request overlap without allowing unbounded queue growth.
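If you manage consumers from code rather than the CLI, the same limit can be set in the consumer configuration when the consumer is created or recreated. This is a sketch with the Go client; the durable name, ack policy, and the value 50 are placeholders:

// Sketch: create the consumer with a bounded max_waiting.
// Durable name, ack policy, and the value 50 are illustrative.
_, err := js.AddConsumer("ORDERS", &nats.ConsumerConfig{
    Durable:    "my-consumer",
    AckPolicy:  nats.AckExplicitPolicy,
    MaxWaiting: 50, // roughly 2x the expected number of concurrent instances
})
if err != nil {
    log.Fatal(err)
}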
If the consumer is no longer needed (upstream publisher decommissioned, workload migrated), remove it:
nats consumer rm ORDERS my-consumer -f

Synadia Insights flags consumers that combine high num_waiting with zero throughput over extended periods, helping you identify candidates for removal.
Primarily wasteful. High num_waiting doesn’t cause data loss or message corruption. The risks are resource waste (server memory, client compute), potential thundering herd when messages do arrive, and masking upstream problems. It’s an efficiency and operational clarity issue.
Set the io.nats.monitor.waiting-critical metadata key on the stream or consumer configuration. Choose a value based on your expected consumer instance count: if you run 5 instances, a num_waiting of 50 means each instance has 10 queued pulls, which is likely excessive. A threshold of 2-3x your instance count is a reasonable starting point.
New pull requests are rejected by the server with a “max waiting requests exceeded” error. Well-behaved clients retry with backoff, but poorly configured clients may retry aggressively, creating a tight loop of rejected requests that wastes network and CPU. If you’re seeing this error, either reduce the number of consumer instances or increase max_waiting.
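One defensive pattern is to back off before re-pulling whenever a fetch fails. Because the exact error value for the rejection varies by client library and version, this sketch (Go, assuming a pull subscription sub as above) simply treats timeouts as normal and backs off on any other error; the backoff bounds are illustrative:

// Sketch: pull loop with backoff. A timeout just means no messages arrived;
// any other error (including a max-waiting rejection) triggers a backoff
// instead of an immediate retry.
backoff := time.Second
for {
    msgs, err := sub.Fetch(10, nats.MaxWait(30*time.Second))
    if err == nats.ErrTimeout {
        continue // no messages yet; long-poll again
    }
    if err != nil {
        time.Sleep(backoff)
        if backoff < 30*time.Second {
            backoff *= 2
        }
        continue
    }
    backoff = time.Second
    for _, m := range msgs {
        m.Ack() // process, then acknowledge
    }
}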
No. Push consumers don’t use pull requests — the server pushes messages directly to the client. num_waiting is only relevant for pull consumers. Push consumers have different health indicators like num_ack_pending (CONSUMER_006) and num_pending (CONSUMER_008).
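For contrast, a minimal push consumer sketch with the Go client (the durable name is a placeholder): the server delivers messages to the callback on its own schedule, so there are no pull requests to count:

// Sketch: push consumer. Messages are delivered to the callback by the
// server, so num_waiting never applies.
_, err := js.Subscribe("orders.>", func(m *nats.Msg) {
    m.Ack() // process, then acknowledge
}, nats.Durable("my-push-consumer"), nats.ManualAck())
if err != nil {
    log.Fatal(err)
}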
In theory, yes — you could only pull after receiving an advisory or notification that messages are available. In practice, long-polling (pull with a reasonable timeout) is the standard pattern. A small num_waiting (1 per consumer instance) is normal and expected. The concern is when num_waiting grows far beyond the number of active instances.