Unprocessed critical means a JetStream consumer has more undelivered messages waiting in its backlog than the operator-defined threshold. The num_pending metric represents messages in the stream that match the consumer’s filter but haven’t been delivered yet — the consumer’s to-do list. When this count crosses the critical threshold, the consumer is falling behind and the gap between published messages and processed messages is growing.
A growing num_pending backlog is the most direct indicator that a consumer can’t keep up with its workload. Unlike num_ack_pending (messages delivered but unacknowledged) or num_waiting (idle pull requests), num_pending measures the fundamental throughput deficit: messages are arriving faster than the consumer processes them.
The consequences depend on the stream’s retention configuration. With a limits-based retention policy and max_msgs or max_bytes constraints, the oldest messages in the stream are evicted as new ones arrive. If the consumer’s backlog grows large enough, unprocessed messages may be evicted before the consumer ever delivers them — permanent, silent data loss. The consumer’s num_pending count drops (because the messages no longer exist), which can paradoxically make the situation look like it’s improving when messages are actually being discarded.
Even without retention limits, a large backlog creates operational pressure. Downstream systems that depend on timely processing — dashboards, search indexes, notification pipelines, billing systems — become increasingly stale. The age of the oldest unprocessed message grows from seconds to minutes to hours. In some systems, messages older than a certain age are worthless (real-time pricing, fraud detection, session tracking), so a backlog effectively equals data loss even if the messages technically still exist.
At critical levels, the backlog also affects recovery time. If a consumer falls 10 million messages behind, even processing at 10,000 msg/s takes over 16 minutes to catch up — assuming no new messages arrive during recovery. With continued inflow, catch-up takes even longer or may never complete if the processing rate is only marginally above the publish rate.
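The catch-up arithmetic above can be sketched as a small helper. This is an illustrative back-of-the-envelope calculation, not part of any NATS API; the rates are examples:

```python
def catch_up_minutes(backlog: int, process_rate: float, publish_rate: float) -> float:
    """Estimate minutes to drain a backlog, given rates in msgs/sec.

    Catch-up is only possible when the consumer processes faster than
    new messages arrive; otherwise the backlog grows without bound.
    """
    net_rate = process_rate - publish_rate  # msgs/sec of actual drain
    if net_rate <= 0:
        return float("inf")  # never catches up
    return backlog / net_rate / 60

# 10M messages behind, 10,000 msg/s, no new inflow: ~16.7 minutes
print(round(catch_up_minutes(10_000_000, 10_000, 0), 1))
# Same backlog with inflow at 9,000 msg/s: ~166.7 minutes
print(round(catch_up_minutes(10_000_000, 10_000, 9_000), 1))
```

The second case shows why a processing rate only marginally above the publish rate makes recovery painfully slow: the drain rate is the difference of the two, not the processing rate itself.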
Consumer processing too slow. The most common cause. The consumer’s message processing throughput is lower than the stream’s inbound message rate. Each second, the gap between published and processed messages widens. This can be due to slow downstream dependencies, CPU-bound processing, or blocking I/O in the message handler.
Consumer instances disconnected or crashed. One or more consumer instances stopped processing — a crash, a deployment rollback, an OOM kill, or a network partition. The remaining instances (if any) can’t absorb the full load, so the backlog grows. If all instances are down, num_pending grows at the full publish rate.
Traffic spike without consumer scaling. The publish rate increased significantly — a batch import, a marketing event, a partner integration going live — but the consumer pool wasn’t scaled up to match. The consumer was sized for steady-state traffic and can’t handle the burst.
Consumer back-pressured by max_ack_pending. The consumer hit its max_ack_pending limit, so the server stopped delivering new messages. Messages continue flowing into the stream but can’t be delivered, causing num_pending to grow. The root cause is downstream (see CONSUMER_006), but the visible symptom is growing num_pending.
Consumer paused or deliberately throttled. A pull consumer that isn’t actively pulling, or a push consumer with a pause_until timestamp in the future, won’t receive messages. The backlog accumulates until the consumer resumes.
Subject filter too broad. The consumer’s subject filter matches more subjects than intended, pulling in messages from high-volume subjects that weren’t designed for this consumer’s processing pipeline. The aggregate rate exceeds what the consumer was provisioned to handle.
```shell
# Read the consumer's delivered position; the stream's last sequence lives on
# stream info, not consumer info — fetch it separately.
nats consumer info ORDERS my-consumer --json | jq '{
  num_pending: .num_pending,
  num_ack_pending: .num_ack_pending,
  num_waiting: .num_waiting,
  num_redelivered: .num_redelivered,
  last_delivered: .delivered.stream_seq
}'
nats stream info ORDERS --json | jq '{stream_last_seq: .state.last_seq}'

# Sample num_pending twice, 30 seconds apart
P1=$(nats consumer info ORDERS my-consumer --json | jq '.num_pending')
sleep 30
P2=$(nats consumer info ORDERS my-consumer --json | jq '.num_pending')
RATE=$(( (P2 - P1) * 2 ))  # messages per minute growth rate
echo "Pending: $P1 → $P2 (growth rate: $RATE msg/min)"
```

If the growth rate is positive, the consumer is falling further behind. If it’s zero, the consumer is keeping up but has a historical backlog. If it’s negative, the consumer is catching up.
```shell
# Compare delivered sequence over time
D1=$(nats consumer info ORDERS my-consumer --json | jq '.delivered.stream_seq')
sleep 10
D2=$(nats consumer info ORDERS my-consumer --json | jq '.delivered.stream_seq')
echo "Delivered: $D1 → $D2 (throughput: $(( (D2 - D1) * 6 )) msg/min)"
```

If the delivered sequence isn’t advancing, the consumer is stalled — not just slow. Check for max_ack_pending saturation, disconnected instances, or a paused consumer.
```shell
# Compare the consumer's ack floor against the stream's first sequence
nats consumer info ORDERS my-consumer --json | jq '{
  ack_floor_stream_seq: .ack_floor.stream_seq
}'
nats stream info ORDERS --json | jq '{
  first_seq: .state.first_seq
}'
```

If first_seq is advancing toward (or past) the consumer’s ack_floor_stream_seq, retention is evicting messages the consumer hasn’t processed — data loss is occurring.
```go
import (
	"fmt"

	"github.com/nats-io/nats.go"
)

func checkUnprocessedCritical(js nats.JetStreamContext, streamName string, threshold uint64) error {
	for consumer := range js.ConsumerNames(streamName) {
		info, err := js.ConsumerInfo(streamName, consumer)
		if err != nil {
			continue
		}
		if info.NumPending > threshold {
			fmt.Printf("CRITICAL: stream=%s consumer=%s pending=%d threshold=%d ack_pending=%d\n",
				streamName, consumer, info.NumPending, threshold, info.NumAckPending)
		}
	}
	return nil
}
```

```python
import asyncio
import nats

async def check_unprocessed_critical(stream_name: str, threshold: int):
    nc = await nats.connect()
    js = nc.jetstream()

    async for consumer_name in js.consumer_names(stream_name):
        info = await js.consumer_info(stream_name, consumer_name)
        if info.num_pending > threshold:
            print(f"CRITICAL: stream={stream_name} consumer={consumer_name} "
                  f"pending={info.num_pending} threshold={threshold} "
                  f"ack_pending={info.num_ack_pending}")

    await nc.close()

asyncio.run(check_unprocessed_critical("ORDERS", 100000))
```

Add more consumer instances to increase aggregate processing throughput:
```shell
# Kubernetes
kubectl scale deployment order-consumer --replicas=10

# Or start additional instances manually
nats consumer next ORDERS my-consumer --count 100
```

For pull consumers, multiple clients can pull from the same consumer simultaneously. Each pull request gets a unique batch of messages, so throughput scales close to linearly until a shared downstream dependency becomes the bottleneck.
If num_ack_pending is at max_ack_pending, the consumer is throttled. Fix the ack bottleneck first (see CONSUMER_006), then num_pending will start decreasing:
```shell
# Check if the consumer is at max_ack_pending
nats consumer info ORDERS my-consumer --json | jq '{
  num_ack_pending: .num_ack_pending,
  max_ack_pending: .config.max_ack_pending
}'

# Increase ack_wait if messages are timing out
nats consumer edit ORDERS my-consumer --wait 120s
```

Batch processing. Instead of processing one message at a time, fetch and process batches:
```go
sub, _ := js.PullSubscribe("orders.>", "my-consumer")
for {
	msgs, err := sub.Fetch(500, nats.MaxWait(5*time.Second))
	if err != nil {
		continue
	}
	// Process batch — e.g., bulk database insert
	batch := make([]Order, 0, len(msgs))
	for _, msg := range msgs {
		var order Order
		json.Unmarshal(msg.Data, &order)
		batch = append(batch, order)
	}
	db.BulkInsert(batch)
	for _, msg := range msgs {
		msg.Ack()
	}
}
```

```python
async def batch_processor(js):
    sub = await js.pull_subscribe("orders.>", "my-consumer")
    while True:
        msgs = await sub.fetch(500, timeout=5)
        # Bulk process
        records = [json.loads(msg.data) for msg in msgs]
        await db.bulk_insert(records)
        for msg in msgs:
            await msg.ack()
```

Parallelize within a single instance. Use concurrent workers to process messages from the same pull subscription:
```go
sub, _ := js.PullSubscribe("orders.>", "my-consumer")
sem := make(chan struct{}, 20) // 20 concurrent workers

for {
	msgs, _ := sub.Fetch(20, nats.MaxWait(5*time.Second))
	for _, msg := range msgs {
		sem <- struct{}{}
		go func(m *nats.Msg) {
			defer func() { <-sem }()
			processMessage(m)
			m.Ack()
		}(msg)
	}
}
```

If the stream has retention limits that are evicting unprocessed messages, temporarily relax the limits while the consumer catches up:
```shell
# Increase max_msgs temporarily
nats stream edit ORDERS --max-msgs 50000000

# Or increase max_age
nats stream edit ORDERS --max-age 72h
```

Restore the original limits after the consumer has caught up.
Synadia Insights evaluates CONSUMER_008 against your configured io.nats.monitor.unprocessed-critical threshold, alerting when the backlog crosses the critical level. This catches growing backlogs early — before retention eviction causes data loss and before downstream staleness becomes a user-visible problem.
Base it on your acceptable processing latency. If messages must be processed within 5 minutes and your publish rate is 1,000 msg/s, the threshold should be no more than 300,000 (5 minutes × 1,000 msg/s). For less time-sensitive workloads, you can tolerate higher backlogs. Set the threshold via the io.nats.monitor.unprocessed-critical metadata key.
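The threshold arithmetic can be expressed as a one-line helper. This is an illustrative sketch of the sizing rule described above; the latency budget and publish rate are example values:

```python
def unprocessed_threshold(max_latency_seconds: float, publish_rate: float) -> int:
    """Largest backlog that can still be drained within the latency budget,
    assuming the consumer processes at least as fast as messages arrive."""
    return int(max_latency_seconds * publish_rate)

# 5-minute budget at 1,000 msg/s -> threshold of 300,000 messages
print(unprocessed_threshold(5 * 60, 1_000))
```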
Some workloads are designed for batch processing — accumulate messages during the day, process them overnight. In these cases, a high num_pending during accumulation is expected. Set the threshold to trigger only when the backlog exceeds what the batch window can process, or disable the check during accumulation hours.
No. num_pending counts messages that haven’t been delivered to the consumer at all. Messages currently in-flight (delivered but unacknowledged) are counted in num_ack_pending. A message being redelivered was already counted in num_ack_pending, not num_pending. The two metrics are complementary: total unprocessed work = num_pending + num_ack_pending.
Yes. Reset the consumer’s deliver policy to new:
```shell
nats consumer edit ORDERS my-consumer --deliver-policy new
```

This abandons all unprocessed messages and starts fresh from the next published message. Only do this if the backlogged messages are no longer valuable. There’s no undo — skipped messages won’t be delivered to this consumer.
CONSUMER_006 alerts on messages that were delivered but not acknowledged — they’re being processed (or failing). CONSUMER_008 alerts on messages that haven’t been delivered at all — they’re waiting in the queue. A healthy consumer has low values for both. High num_pending with low num_ack_pending means the consumer isn’t pulling fast enough. High num_ack_pending with high num_pending means the consumer is both slow to process and falling behind.
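The four combinations above can be captured in a small triage helper. This is an illustrative sketch, not part of the Insights checks; the function name and thresholds are assumptions for the example:

```python
def triage(num_pending: int, num_ack_pending: int,
           pending_threshold: int, ack_threshold: int) -> str:
    """Classify consumer health from the two complementary metrics.

    num_pending     -> undelivered backlog (CONSUMER_008 territory)
    num_ack_pending -> delivered but unacknowledged (CONSUMER_006 territory)
    """
    high_pending = num_pending > pending_threshold
    high_ack = num_ack_pending > ack_threshold
    if high_pending and high_ack:
        return "slow processing AND falling behind"
    if high_pending:
        return "not pulling fast enough"
    if high_ack:
        return "ack bottleneck"
    return "healthy"

# Large undelivered backlog, few in-flight messages:
print(triage(500_000, 10, pending_threshold=100_000, ack_threshold=1_000))
# -> not pulling fast enough
```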