Outstanding ack critical means a JetStream consumer has more in-flight, unacknowledged messages than the operator-defined threshold. These are messages the server has delivered to a client but hasn’t received an acknowledgment for — they’re in limbo between “sent” and “confirmed processed.” When this count climbs past the threshold, it signals that consumer processing capacity cannot keep up with the delivery rate, or that something is preventing acknowledgments from being sent.
Every unacknowledged message represents work that might not complete. The server has delivered the message and is waiting for confirmation. If the ack doesn’t arrive before the ack_wait timeout, the server redelivers the message to the same or a different consumer instance. This creates a compounding problem: redelivered messages consume processing capacity that could handle new messages, which pushes more messages into the ack-pending state, which triggers more redeliveries.
At critical levels, this cycle becomes a redelivery storm. The consumer spends most of its time re-processing messages it already attempted, while new messages pile up in the num_pending backlog. The effective throughput drops to a fraction of what the consumer can handle under normal conditions. In the worst case, messages hit their max_deliver limit and stop being delivered: the server publishes a max-deliveries advisory, and unless something consumes that advisory and reprocesses the message, it is effectively lost to this consumer.
The ack-pending count also has a hard ceiling: max_ack_pending on the consumer configuration (default 1,000 for pull consumers). Once this limit is reached, the server stops delivering new messages entirely — the consumer is back-pressured. If operators aren’t monitoring this, the stream’s num_pending grows unbounded while the consumer appears to be “working” but is actually throttled by its own unacknowledged backlog.
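These limits are all plain fields on the consumer configuration. A minimal nats.go sketch, with the stream, subject, durable name, and values purely illustrative:

```go
import (
    "time"

    "github.com/nats-io/nats.go"
)

// createOrdersConsumer is illustrative: explicit acks, a 30s ack window,
// at most 5 delivery attempts, and at most 1,000 unacknowledged messages
// in flight before the server stops delivering to this consumer.
func createOrdersConsumer(js nats.JetStreamContext) error {
    _, err := js.AddConsumer("ORDERS", &nats.ConsumerConfig{
        Durable:       "my-consumer",
        FilterSubject: "orders.>",
        AckPolicy:     nats.AckExplicitPolicy,
        AckWait:       30 * time.Second,
        MaxDeliver:    5,
        MaxAckPending: 1000,
    })
    return err
}
```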
Server-side, a high ack-pending count increases memory pressure. The server tracks every pending ack with metadata (sequence number, delivery count, timestamp, reply subject). At scale — thousands of consumers each with thousands of pending acks — this metadata overhead becomes significant.
Slow message processing. The consumer receives messages faster than it can process and acknowledge them. Database writes, HTTP calls, or complex computations in the message handler take longer than the inter-message delivery interval. The ack-pending count grows with each message that takes longer to process than the delivery rate allows.
Ack wait timeout too short. The ack_wait is configured shorter than the processing time for complex messages. Messages time out and are redelivered while the consumer is still processing the original delivery. Both the original and redelivered copies now occupy ack-pending slots.
Consumer crash or restart without ack. A consumer instance processes messages but crashes before sending acknowledgments. On restart (or when another instance picks up the messages), all previously delivered messages are still in ack-pending state. If the consumer frequently restarts, ack-pending accumulates with each cycle.
Network issues preventing ack delivery. The consumer processes messages successfully and sends acks, but network problems between the client and server cause acks to be lost or delayed. The server never receives the acks and counts the messages as outstanding.
max_ack_pending set too high relative to processing capacity. A high max_ack_pending allows the server to deliver a large batch of messages that the consumer can’t process within the ack_wait window. The consumer is overwhelmed, acks slow down, and the pending count stays elevated.
Poison messages causing processing failures. Messages that consistently fail processing (malformed data, schema mismatches, missing dependencies) are never acknowledged. They consume ack-pending slots, are redelivered, fail again, and consume more slots — crowding out healthy messages.
```bash
nats consumer info ORDERS my-consumer --json | jq '{
  num_ack_pending: .num_ack_pending,
  num_pending: .num_pending,
  num_redelivered: .num_redelivered,
  num_waiting: .num_waiting,
  config_max_ack_pending: .config.max_ack_pending,
  config_ack_wait: .config.ack_wait
}'
```

Key indicators:
```bash
# Watch the ack-pending count in real time
watch -n 5 'nats consumer info ORDERS my-consumer --json | jq .num_ack_pending'
```

If the count is stable and near the threshold, the consumer is consistently at capacity. If it's spiking and recovering, the problem is intermittent (likely correlated with traffic spikes or periodic processing slowdowns).
```bash
# High redelivery count indicates acks aren't arriving in time
nats consumer info ORDERS my-consumer --json | jq '{
  num_redelivered: .num_redelivered,
  delivered_consumer_seq: .delivered.consumer_seq,
  redelivery_ratio: (.num_redelivered / (.delivered.consumer_seq + 1) * 100 | tostring + "%")
}'
```

A redelivery ratio above 10% suggests the ack_wait is too aggressive or processing is too slow.
```go
import (
    "fmt"

    "github.com/nats-io/nats.go"
)

func checkAckPending(js nats.JetStreamContext, streamName string, threshold int) error {
    for consumer := range js.ConsumerNames(streamName) {
        info, err := js.ConsumerInfo(streamName, consumer)
        if err != nil {
            continue
        }
        if info.NumAckPending > threshold {
            pct := float64(info.NumAckPending) / float64(info.Config.MaxAckPending) * 100
            fmt.Printf("CRITICAL: stream=%s consumer=%s ack_pending=%d max=%d (%.1f%%) redelivered=%d\n",
                streamName, consumer, info.NumAckPending,
                info.Config.MaxAckPending, pct, info.NumRedelivered)
        }
    }
    return nil
}
```

```python
import asyncio
import nats

async def check_ack_pending(stream_name: str, threshold: int):
    nc = await nats.connect()
    js = nc.jetstream()

    async for consumer_name in js.consumer_names(stream_name):
        info = await js.consumer_info(stream_name, consumer_name)
        if info.num_ack_pending > threshold:
            pct = (info.num_ack_pending / info.config.max_ack_pending) * 100
            print(f"CRITICAL: stream={stream_name} consumer={consumer_name} "
                  f"ack_pending={info.num_ack_pending} max={info.config.max_ack_pending} "
                  f"({pct:.1f}%) redelivered={info.num_redelivered}")

    await nc.close()

asyncio.run(check_ack_pending("ORDERS", 5000))
```

Extend the ack wait timeout. If messages are being processed but acks arrive after the timeout, increase ack_wait to give the consumer more time:
```bash
nats consumer edit ORDERS my-consumer --wait 60s
```

Set ack_wait to at least 2-3x your p99 processing time to account for variance.
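The same change can also be made from code. A minimal nats.go sketch, assuming the ORDERS stream and my-consumer durable used above:

```go
// extendAckWait fetches the current consumer config, raises the ack
// window, and updates the consumer in place.
func extendAckWait(js nats.JetStreamContext) error {
    info, err := js.ConsumerInfo("ORDERS", "my-consumer")
    if err != nil {
        return err
    }
    cfg := info.Config
    cfg.AckWait = 60 * time.Second // roughly 2-3x the p99 processing time
    _, err = js.UpdateConsumer("ORDERS", &cfg)
    return err
}
```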
Use in-progress acknowledgments. For long-running processing, send +WPI (work in progress) acks to reset the ack timer without completing the message:
```go
sub, _ := js.PullSubscribe("orders.>", "my-consumer")
msgs, _ := sub.Fetch(10)
for _, msg := range msgs {
    msg.InProgress()    // reset ack timer
    processMessage(msg) // long-running work
    msg.Ack()
}
```

```python
async def process_messages(js):
    sub = await js.pull_subscribe("orders.>", "my-consumer")
    msgs = await sub.fetch(10)
    for msg in msgs:
        await msg.in_progress()  # reset ack timer
        await process_message(msg)
        await msg.ack()
```

Reduce max_ack_pending to limit concurrency. If the consumer is overwhelmed by too many concurrent messages, lower max_ack_pending so fewer messages are in flight at once:
```bash
nats consumer edit ORDERS my-consumer --max-pending 100
```

Scale horizontally. Add more consumer instances in the same consumer group. Each instance handles a portion of the messages, reducing per-instance ack-pending:
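For pull consumers, a rough sketch of several workers bound to the same durable (the worker count and the processMessage helper are illustrative):

```go
// Each worker binds to the same durable pull consumer; the server
// spreads messages across whichever worker fetches next.
for i := 0; i < 3; i++ {
    go func() {
        sub, _ := js.PullSubscribe("orders.>", "my-consumer")
        for {
            msgs, err := sub.Fetch(10, nats.MaxWait(5*time.Second))
            if err != nil {
                continue // timeout with no messages, or a transient error
            }
            for _, msg := range msgs {
                processMessage(msg)
                msg.Ack()
            }
        }
    }()
}
```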
```bash
# Each instance subscribes to the same durable consumer
# For pull consumers, multiple clients can pull from the same consumer
```

Move blocking work out of the message handler. Process messages asynchronously to free the message handler for the next delivery:
```go
work := make(chan *nats.Msg, 1000)

// Fast message handler — just enqueue
sub, _ := js.Subscribe("orders.>", func(msg *nats.Msg) {
    work <- msg
}, nats.Durable("my-consumer"), nats.ManualAck())

// Worker pool — does the heavy lifting
for i := 0; i < 10; i++ {
    go func() {
        for msg := range work {
            processMessage(msg)
            msg.Ack()
        }
    }()
}
```

Implement dead-letter handling. Messages that consistently fail processing should be moved to a dead-letter stream instead of endlessly retried:
```go
sub, _ := js.Subscribe("orders.>", func(msg *nats.Msg) {
    meta, _ := msg.Metadata()
    if meta.NumDelivered > 3 {
        // Move to dead-letter stream
        js.Publish("dead-letter.orders", msg.Data)
        msg.Term() // terminate redelivery
        return
    }
    if err := processMessage(msg); err != nil {
        msg.Nak() // request redelivery
        return
    }
    msg.Ack()
}, nats.Durable("my-consumer"), nats.ManualAck())
```

Set max_deliver to cap redelivery attempts. Prevent infinite redelivery loops by limiting how many times a message can be redelivered:
```bash
nats consumer edit ORDERS my-consumer --max-deliver 5
```

Monitor with Insights. Synadia Insights evaluates CONSUMER_006 against your configured threshold, alerting before the ack-pending count reaches max_ack_pending and causes the server to stop delivering messages.
num_ack_pending is messages delivered but not yet acknowledged — they’re actively being processed (or waiting to be redelivered). num_pending is messages in the stream that haven’t been delivered to this consumer yet — the backlog. A healthy consumer has low num_ack_pending and decreasing num_pending. High num_ack_pending with growing num_pending means the consumer is stuck.
The threshold is set via the io.nats.monitor.outstanding-ack-critical metadata key on the stream or consumer configuration. Set it based on your expected processing capacity: if your consumer can handle 1,000 in-flight messages, set the threshold to 800 (80%) to alert before saturation. Synadia Insights reads this metadata automatically.
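If you manage consumers from code, the metadata key can be set on the consumer configuration. A sketch using nats.go (assumes a server and client recent enough to support consumer metadata, NATS 2.10+; the threshold value is illustrative):

```go
// setOutstandingAckThreshold tags the consumer with the monitoring
// threshold that Insights reads for CONSUMER_006.
func setOutstandingAckThreshold(js nats.JetStreamContext) error {
    info, err := js.ConsumerInfo("ORDERS", "my-consumer")
    if err != nil {
        return err
    }
    cfg := info.Config
    if cfg.Metadata == nil {
        cfg.Metadata = map[string]string{}
    }
    // Alert at 800 in-flight messages (80% of a 1,000 max_ack_pending).
    cfg.Metadata["io.nats.monitor.outstanding-ack-critical"] = "800"
    _, err = js.UpdateConsumer("ORDERS", &cfg)
    return err
}
```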
Lowering max_ack_pending controls the symptom, not the cause. It prevents the server from overwhelming the consumer, which stabilizes the system. But the root cause — slow processing, short ack_wait, or insufficient consumer instances — still needs to be addressed. Think of max_ack_pending as a safety valve, not a fix.
Nak'ing pending messages to clear the count doesn't help — Nak causes immediate redelivery, which puts the messages right back into ack-pending. If you want to skip messages, use msg.Term() to permanently terminate their delivery (they won't be redelivered). Use this only for messages you're willing to lose.