A high consumer redelivery rate means a JetStream consumer is receiving the same messages repeatedly because they are not being acknowledged within the configured ack_wait window. When redeliveries exceed a significant percentage of total deliveries, it signals that the consuming application is consistently failing to process messages — either crashing, timing out, or encountering errors that prevent acknowledgment.
Every redelivered message represents wasted work. The server sends the message, the consumer (possibly) processes it partially, and then the cycle repeats. At a 10% redelivery rate on a consumer handling 100,000 messages per hour, that’s 10,000 extra deliveries competing for the same processing resources as new messages. The consumer falls further behind, redeliveries compound, and throughput degrades.
The deeper problem is what redeliveries indicate: something in the processing pipeline is broken. Messages might be hitting an unhandled error path that causes the consumer to crash before acknowledging. A downstream dependency — a database, an API, a queue — might be intermittently unavailable, causing timeouts that exceed ack_wait. Or a subset of messages might be malformed or trigger edge cases that the consumer cannot handle, creating “poison messages” that cycle through redelivery indefinitely until they hit max_deliver.
Left unchecked, high redelivery creates a feedback loop. Redelivered messages consume processing capacity that could handle new messages, pushing the consumer further behind. Ack pending counts climb toward the limit (see OPT_SYS_003). If max_deliver is not configured, poison messages cycle forever. If it is configured without an advisory handler, those messages silently disappear — acknowledged as processed when they never were.
ack_wait shorter than actual processing time. The default ack_wait is 30 seconds. If message processing involves database writes, HTTP calls, or computation that occasionally exceeds this window, the server redelivers the message while the consumer is still working on it. The consumer then processes the same message twice — once from the original delivery and once from the redelivery.
Consumer crashes during processing. If the consumer process crashes or restarts after receiving a message but before acknowledging it, every in-flight message at crash time will be redelivered after ack_wait expires. Frequent restarts (OOM kills, unhandled exceptions, deployment churn) cause sustained high redelivery rates.
Downstream dependency failures. The consumer successfully receives and parses the message but cannot complete processing because a database is unreachable, an API returns errors, or a downstream queue is full. Without explicit error handling that uses Nak or Term, the message sits unacknowledged until ack_wait expires.
Poison messages. A subset of messages triggers a bug — a nil pointer on an unexpected field, a schema violation, a payload too large for a downstream system. These messages fail on every delivery attempt, consuming redelivery budget until max_deliver is reached. Without a dead letter strategy, they either cycle indefinitely or are silently dropped.
Duplicate processing from slow Ack. The consumer processes the message successfully and sends an Ack, but the ack arrives at the server after ack_wait has already triggered redelivery. This is common when the ack is sent after a long processing chain rather than using InProgress to extend the deadline.
Use the consumer report to see redelivery counts across all consumers on a stream:
```
nats consumer report <stream_name>
```

Look for the Redelivered column. Compare it to the Delivered column to calculate the redelivery percentage.
For detailed stats on a specific consumer:
```
nats consumer info <stream_name> <consumer_name>
```

Key fields include the redelivered count, the outstanding acks (ack pending), and the configured ack_wait and max_deliver values.
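To turn those counts into a percentage directly, you can pull the same fields from the JSON output and divide them with jq. The field names below come from the consumer info API (`num_redelivered`, `delivered.consumer_seq`); verify them against your CLI and server version:

```
nats consumer info <stream_name> <consumer_name> --json | \
  jq '{redelivered: .num_redelivered, delivered: .delivered.consumer_seq,
       redelivery_pct: (100 * .num_redelivered / .delivered.consumer_seq)}'
```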
NATS emits an advisory when a message hits max_deliver:
```
nats event --js-advisory
```

Watch for io.nats.jetstream.advisory.v1.max_deliver events. These identify the specific stream, consumer, and stream sequence of messages that exhausted their delivery attempts.
If redeliveries correlate with processing latency spikes, the issue is likely ack_wait timeout. Check your consumer’s processing time distribution — if P99 latency exceeds ack_wait, you’ll see redeliveries on the slowest messages.
If redeliveries cluster around specific message sequences that repeat at regular ack_wait intervals, suspect poison messages. Cross-reference the redelivered sequence numbers with your application logs to find the failing messages.
```
nats consumer info <stream_name> <consumer_name> --json | jq '.config | {ack_wait, max_deliver, max_ack_pending}'
```

Verify that ack_wait is appropriate for your processing time and that max_deliver is set to prevent infinite redelivery loops.
Set max_deliver to cap retry attempts and prevent infinite redelivery loops. Without max_deliver, poison messages cycle through redelivery forever. Configure it alongside a dead letter advisory handler:
```
nats consumer edit <stream_name> <consumer_name> --max-deliver=5
```

Configure backoff for exponential retry spacing. Instead of redelivering at a fixed ack_wait interval, use backoff to space retries exponentially — this gives downstream dependencies time to recover and reduces the processing load from redeliveries.
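A minimal sketch of what that looks like in the Go client, reusing the ORDERS stream and durable name from the examples below. When a backoff list is set it governs redelivery timing in place of a fixed ack_wait, and max_deliver should cover at least the number of backoff steps; treat the exact durations here as illustrative:

```go
// Sketch: exponential retry spacing via the consumer BackOff setting.
_, err := js.AddConsumer("ORDERS", &nats.ConsumerConfig{
	Durable:    "order-processor",
	AckPolicy:  nats.AckExplicitPolicy,
	MaxDeliver: 5, // must cover at least the number of backoff steps
	BackOff: []time.Duration{ // waits between successive delivery attempts
		30 * time.Second,
		2 * time.Minute,
		10 * time.Minute,
		30 * time.Minute,
	},
	FilterSubject: "ORDERS.>",
})
if err != nil {
	log.Fatal(err)
}
```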
Use InProgress to extend the ack deadline for long-running work. If processing legitimately takes longer than ack_wait (default 30s), signal the server that work is ongoing:
```go
// Go client (nats.go)
sub, _ := js.PullSubscribe("ORDERS.>", "order-processor")
msgs, _ := sub.Fetch(10)
for _, msg := range msgs {
	// Signal work in progress every 10 seconds until processing finishes
	done := make(chan struct{})
	go func(m *nats.Msg) {
		ticker := time.NewTicker(10 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				_ = m.InProgress()
			case <-done:
				return
			}
		}
	}(msg)

	processOrder(msg)
	close(done) // stop extending the deadline before acking
	_ = msg.Ack()
}
```

```python
# Python (nats.py)
import asyncio
import nats

async def keep_alive(msg):
    # Extend the ack deadline every 10 seconds while processing runs
    while True:
        await asyncio.sleep(10)
        await msg.in_progress()

nc = await nats.connect()
js = nc.jetstream()
sub = await js.pull_subscribe("ORDERS.>", "order-processor")
msgs = await sub.fetch(10)
for msg in msgs:
    # Extend deadline during long processing
    task = asyncio.create_task(keep_alive(msg))
    await process_order(msg)
    task.cancel()
    await msg.ack()
```

Terminate poison messages explicitly. If a message cannot be processed, use Term to tell the server to stop redelivering it:
```go
if err := processMessage(msg); err != nil {
	if isPermanentError(err) {
		_ = msg.Term() // Stop redelivering this message
	} else {
		_ = msg.NakWithDelay(5 * time.Second) // Retry after backoff
	}
}
```

Set max_deliver and handle the advisory. Configure a maximum delivery count so poison messages don't cycle forever. Then consume the max delivery advisory to route failed messages to a dead letter stream:
```go
// Create consumer with max_deliver
_, err := js.AddConsumer("ORDERS", &nats.ConsumerConfig{
	Durable:       "order-processor",
	AckPolicy:     nats.AckExplicitPolicy,
	AckWait:       60 * time.Second,
	MaxDeliver:    5,
	MaxAckPending: 1000,
	FilterSubject: "ORDERS.>",
})

// Handle dead letters via advisory
nc.Subscribe("$JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.ORDERS.order-processor",
	func(msg *nats.Msg) {
		// Parse advisory, publish message details to dead letter stream
		js.Publish("DEAD_LETTERS.orders", msg.Data)
	},
)
```

Increase ack_wait if processing legitimately takes longer. The default ack_wait is 30 seconds. If your P99 processing time is 45 seconds, increase it:
```
nats consumer edit <stream_name> <consumer_name> --wait=90s
```

Set ack_wait to at least 2x your P99 processing time to account for variance. Common causes of redelivery are processing time exceeding ack_wait, application panics before acknowledging, or incorrect ack logic (acking the wrong message).
Separate message receipt from processing. Acknowledge the message once it’s durably enqueued in your internal processing pipeline, not after the entire processing chain completes. This decouples NATS delivery semantics from downstream processing reliability.
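As a rough sketch of that pattern in Go, the message is republished to a durable work stream and only then acknowledged. The ORDERS_WORK stream and subject are illustrative names, not part of the setup above:

```go
// Sketch: ack after durable hand-off instead of after full processing.
// Assumes a stream capturing subjects under ORDERS_WORK.> already exists.
msgs, _ := sub.Fetch(10)
for _, msg := range msgs {
	// Hand the message off durably before acknowledging the original delivery.
	if _, err := js.Publish("ORDERS_WORK.pending", msg.Data); err != nil {
		_ = msg.NakWithDelay(5 * time.Second) // hand-off failed, ask for a retry
		continue
	}
	_ = msg.Ack() // delivery is settled; workers drain ORDERS_WORK at their own pace
}
```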
Implement idempotent processing. Since redeliveries mean a message may be processed more than once, ensure your processing logic handles duplicates safely. Use the message’s stream sequence or a domain-specific deduplication key.
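A minimal sketch of sequence-based deduplication in the Go client. The in-memory map stands in for whatever durable store you actually use, and processMessage is the handler from the examples above:

```go
// Sketch: skip work for messages whose stream sequence was already processed.
var processed = map[uint64]bool{}

func handleIdempotent(msg *nats.Msg) {
	md, err := msg.Metadata()
	if err != nil {
		_ = msg.Term() // not a JetStream message we can reason about
		return
	}
	seq := md.Sequence.Stream // unique per message within the stream
	if processed[seq] {
		_ = msg.Ack() // duplicate delivery: acknowledge without reprocessing
		return
	}
	if err := processMessage(msg); err != nil {
		_ = msg.NakWithDelay(5 * time.Second)
		return
	}
	processed[seq] = true
	_ = msg.Ack()
}
```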
Add structured error handling with Nak backoff. Instead of letting messages time out silently, use NakWithDelay with exponential backoff for transient errors:
```go
func handleMessage(msg *nats.Msg) {
	md, _ := msg.Metadata()
	attempt := md.NumDelivered // how many times this message has been delivered
	err := processMessage(msg)
	if err == nil {
		msg.Ack()
		return
	}
	if isPermanent(err) {
		msg.Term() // never retry permanent failures
		return
	}
	// Exponential backoff: 2s, 4s, 8s, ... keyed to the delivery attempt
	delay := time.Duration(math.Pow(2, float64(attempt))) * time.Second
	msg.NakWithDelay(delay)
}
```

A healthy consumer should have a redelivery rate well below 1%. Occasional redeliveries during deployments or transient failures are expected, but sustained rates above 5-10% indicate a systemic problem. Synadia Insights flags consumers whose redelivered messages exceed 10% of delivered messages.
Check msg.Metadata().NumDelivered on JetStream messages — any value greater than 1 indicates a redelivery (the count is parsed from the JetStream ack reply subject; there is no Nats-Num-Delivered header). For messages that exhaust max_deliver, subscribe to $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.<stream>.<consumer> to get the stream sequence number and metadata for each failed message. Cross-reference these sequences with your application logs.
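If you want to automate that cross-referencing, a small Go subscriber can decode the advisory payload and log the sequence to look up. The struct below declares only the fields used here and mirrors the io.nats.jetstream.advisory.v1.max_deliver JSON, so treat the exact field set as an assumption to verify against your server version:

```go
// Sketch: log max_deliver advisories with the stream sequence for follow-up.
// Stream and consumer names follow the ORDERS example used above.
type maxDeliverAdvisory struct {
	Stream     string `json:"stream"`
	Consumer   string `json:"consumer"`
	StreamSeq  uint64 `json:"stream_seq"`
	Deliveries int    `json:"deliveries"`
}

nc.Subscribe("$JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.ORDERS.order-processor",
	func(m *nats.Msg) {
		var adv maxDeliverAdvisory
		if err := json.Unmarshal(m.Data, &adv); err != nil {
			return // not a payload we recognize
		}
		log.Printf("max_deliver exhausted: stream=%s consumer=%s stream_seq=%d deliveries=%d",
			adv.Stream, adv.Consumer, adv.StreamSeq, adv.Deliveries)
	},
)
```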
What is the difference between Nak and Term in NATS JetStream?

Nak tells the server to redeliver the message (optionally after a delay with NakWithDelay). Use it for transient errors where a retry might succeed. Term tells the server to permanently stop delivering the message — it counts as acknowledged and won't be redelivered. Use Term for poison messages that will never process successfully. Without either, the message sits unacknowledged until ack_wait expires, then gets redelivered automatically.
Does a high redelivery rate hurt throughput?

Yes. Every redelivered message competes with new messages for processing capacity. If a consumer processes 1,000 msg/s and 20% are redeliveries, effective throughput for new messages drops to 800 msg/s. Meanwhile, the redelivered messages that fail again create more redeliveries in the next cycle. This feedback loop can push ack pending to the limit (see OPT_SYS_003), stalling delivery entirely.
Should I set max_deliver on every consumer?

Yes. Without max_deliver, a poison message will be redelivered indefinitely, consuming resources forever. Set max_deliver to a reasonable value (typically 3-10) and always handle the max delivery advisory to route failed messages somewhere observable — a dead letter stream, an alert, a log. Silent message loss from hitting max_deliver without monitoring is worse than the redelivery loop.