A consumer leak happens when an application keeps creating new JetStream consumers without cleaning up old ones. The stream’s consumer count climbs steadily — 100, then 500, then 2,000 — even though the application’s logical consumer count hasn’t changed. Each leaked consumer consumes server memory, maintains its own Raft group (for R3 streams), and adds to the metadata overhead. Eventually, the leak degrades cluster performance, exhausts consumer limits, or destabilizes Raft.
Every JetStream consumer is a stateful object. The server tracks the consumer’s configuration, its ack floor (which messages have been acknowledged), its pending set (which messages are in flight), and its last activity timestamp. For replicated streams, each consumer has its own Raft group for state replication. This means every leaked consumer creates:

- server memory and metadata for its configuration and delivery state
- a dedicated Raft group (for replicated streams) with its own heartbeat and replication traffic
- one more asset for the meta leader to track
At tens of consumers, this is negligible. At thousands, it’s a measurable load on the cluster. The Raft heartbeat traffic alone — each consumer’s Raft group sending heartbeats every 200ms — can saturate internal routes. The meta leader, which tracks all JetStream assets, slows down as the asset count grows. Stream operations that enumerate consumers (like nats consumer list) become slow.
The most common source is ephemeral consumers. An ephemeral consumer is designed to be temporary — it has no durable name and is automatically cleaned up after inactive_threshold (default: 5 seconds of inactivity). But if the application creates new ephemeral consumers faster than old ones expire, or if inactive_threshold is set too high, the consumer count grows monotonically.
The pattern often looks like this: a service restarts and creates new ephemeral consumers. The old consumers from the previous instance haven’t expired yet (or inactive_threshold is set to hours or days). Each restart adds another batch of consumers. Over a deployment cycle with frequent restarts — rolling updates, autoscaling events, crash loops — the consumer count balloons.
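A back-of-envelope model of that balloon effect, assuming each restart adds a fixed batch of ephemerals that then linger for the full inactive_threshold before the server reaps them:

```python
def steady_state_leaked(consumers_per_restart: int,
                        restarts_per_hour: float,
                        inactive_threshold_hours: float) -> float:
    """Approximate count of stale ephemeral consumers present at any
    moment: every restart adds a batch, and each batch survives for
    roughly inactive_threshold before cleanup."""
    return consumers_per_restart * restarts_per_hour * inactive_threshold_hours

# 5 ephemerals per restart, one restart per hour, a 24-hour threshold:
print(steady_state_leaked(5, 1, 24))  # 120 stale consumers at steady state
```

The same arithmetic shows why a short inactive_threshold is the main lever: dropping it from 24 hours to 30 seconds shrinks the lingering population by three orders of magnitude.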
Ephemeral consumers without proper inactive_threshold. The application creates ephemeral consumers with a long inactive_threshold (or relies on the default for older server versions). Old consumers linger long after the creating client disconnects.
Application restart creating new consumers. On each startup, the application calls CreateConsumer() or Subscribe() with a new or no durable name. The previous instance’s consumers are still alive, and now there are duplicates.
Auto-scaling creating consumer instances. Each new pod/instance in an auto-scaled service creates its own ephemeral consumer. Scale-down events destroy the pod but the consumer persists until inactive_threshold expires.
Missing durable names. The application intends for consumers to be persistent but omits the durable name, creating a new ephemeral consumer on each connection. Using a durable name would reuse the existing consumer instead.
Client library creating implicit consumers. Some NATS client library convenience methods (like js.Subscribe() in Go) create new consumers implicitly. If called repeatedly without understanding the consumer lifecycle, each call adds a consumer.
Error handling creating retry consumers. A retry loop that creates a new consumer on each attempt. If the error is persistent, the loop generates hundreds of consumers before the issue is resolved.
```
# Current consumer counts per stream
nats stream list --json | jq '.[] | {name: .config.name, consumers: .state.consumer_count}' | sort
```

Run this periodically and compare. A stream where the consumer count grows monotonically has a leak.
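The periodic comparison can be automated with a small check. This is an illustrative sketch, not part of the nats tooling; it just encodes "counts only go up" over successive samples:

```python
def is_leaking(samples: list[int], min_increase: int = 1) -> bool:
    """Flag a stream whose consumer count never decreases across
    periodic samples and has grown overall, i.e. the creation rate
    exceeds the deletion rate."""
    if len(samples) < 2:
        return False
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    return all(d >= 0 for d in deltas) and sum(deltas) >= min_increase

print(is_leaking([100, 140, 210, 480]))  # True: strictly accumulating
print(is_leaking([100, 140, 90, 120]))   # False: count also drops
```

A healthy stream with churn will show counts that oscillate; a leaking one only climbs.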
```
# Find streams with unusually high consumer counts
nats stream list --json | jq '.[] | select(.state.consumer_count > 50) | {name: .config.name, consumers: .state.consumer_count}'

# List consumers on the affected stream
nats consumer list MY_STREAM --json | jq '.[] | {
  name: .config.name,
  durable: .config.durable_name,
  created: .created,
  inactive_threshold: .config.inactive_threshold,
  last_activity: .ts,
  pending: .num_pending,
  ack_pending: .num_ack_pending
}'
```

Look for patterns:
- Missing durable_name → ephemeral consumer leak
- Long inactive_threshold → slow cleanup
- Zero ack_pending and zero num_pending → inactive/orphaned consumers

Monitor the JetStream advisory subject for consumer creation events:
```
nats sub '$JS.EVENT.ADVISORY.CONSUMER.CREATED.MY_STREAM.>'
```

A steady stream of creation events without corresponding deletion events confirms an active leak.
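Tallying creations against deletions makes the imbalance concrete. In this sketch the CREATED/DELETED labels stand in for the full advisory subjects; how you collect the events is up to your monitoring pipeline:

```python
from collections import Counter

def net_consumer_growth(events: list[str]) -> int:
    """Net consumer growth over a window of advisory events.
    A value that stays positive window after window confirms a leak."""
    counts = Counter(events)
    return counts["CREATED"] - counts["DELETED"]

window = ["CREATED"] * 40 + ["DELETED"] * 3
print(net_consumer_growth(window))  # 37: far more created than cleaned up
```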
If consumer count spikes correlate with deployment events (rolling updates, scaling events), the application lifecycle is the likely source:
```
# Check consumer creation times
nats consumer list MY_STREAM --json | jq '[.[] | .created] | sort | .[-10:]'
```

Delete orphaned consumers that are no longer active:
```
# Delete a specific consumer
nats consumer delete MY_STREAM CONSUMER_NAME
```
```
# Delete all consumers matching a pattern (use with caution)
nats consumer list MY_STREAM --json | jq -r '.[] | select(.num_ack_pending == 0 and .num_pending == 0) | .config.name' | while read name; do
  nats consumer delete MY_STREAM "$name" -f
done
```

For ephemeral consumers, set inactive_threshold to match the expected inactivity window — typically a few seconds to a few minutes:
```go
js, _ := nc.JetStream()
sub, err := js.Subscribe("orders.>",
	func(msg *nats.Msg) {
		processOrder(msg)
		msg.Ack()
	},
	nats.InactiveThreshold(30*time.Second),
)
```

```python
import nats
from nats.js.api import ConsumerConfig
from datetime import timedelta

nc = await nats.connect()
js = nc.jetstream()

await js.subscribe(
    "orders.>",
    cb=process_order,
    config=ConsumerConfig(
        inactive_threshold=timedelta(seconds=30).total_seconds(),
    ),
)
```

If the consumer represents a logical processing identity that should survive restarts, use a durable name:
```go
sub, err := js.PullSubscribe("orders.>", "order-processor",
	nats.BindStream("ORDERS"),
)
// On restart, this reuses the existing "order-processor" consumer
// instead of creating a new one
```

```python
sub = await js.pull_subscribe(
    "orders.>",
    durable="order-processor",
    stream="ORDERS",
)
```

Ensure consumers are cleaned up on application shutdown:
```go
// Create consumer
sub, _ := js.Subscribe("orders.>", handler)

// On shutdown, unsubscribe (destroys the ephemeral consumer)
shutdown := make(chan os.Signal, 1)
signal.Notify(shutdown, syscall.SIGTERM, syscall.SIGINT)
<-shutdown
sub.Unsubscribe() // removes the ephemeral consumer
```

```python
import signal
import asyncio

sub = await js.subscribe("orders.>", cb=handler)

async def shutdown():
    await sub.unsubscribe()  # removes the ephemeral consumer
    await nc.close()

loop = asyncio.get_event_loop()
loop.add_signal_handler(signal.SIGTERM, lambda: asyncio.create_task(shutdown()))
```

Configure a maximum consumer count on streams to prevent unbounded growth:
```
nats stream edit MY_STREAM --max-consumers 100
```

When the limit is hit, new consumer creation fails with an error, making the leak immediately visible rather than silently degrading the cluster.
Set up alerting on consumer count growth.
Consumer churn (CONSUMER_002) is high create/delete rate — consumers being created and destroyed frequently. Consumer growth is the net count increasing over time — creation rate exceeds deletion rate. Churn causes metadata overhead from frequent Raft operations. Growth causes persistent resource accumulation. They can occur together: high churn with a slight imbalance toward creation produces steady growth.
There’s no hard limit, but performance degrades gradually. At 100 consumers on an R3 stream, you have 300 Raft groups just for consumers. At 1,000, you have 3,000. Each Raft group adds heartbeat traffic, memory, and WAL I/O. For most workloads, more than 50–100 consumers on a single stream warrants investigation. If you genuinely need many consumers, consider partitioning across multiple streams.
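As a rough illustration of those numbers, assuming 200ms heartbeats and counting each replica of each consumer's Raft group individually, as the text above does:

```python
def raft_overhead(consumers: int, replicas: int,
                  heartbeat_interval_s: float = 0.2) -> tuple[int, float]:
    """Back-of-envelope Raft load from consumers alone: total Raft
    members across all consumer groups, and heartbeats per second
    (each group's leader heartbeating its followers, interval assumed)."""
    raft_members = consumers * replicas
    heartbeats_per_sec = consumers * (replicas - 1) / heartbeat_interval_s
    return raft_members, heartbeats_per_sec

# 1,000 consumers on an R3 stream:
print(raft_overhead(1000, 3))  # (3000, 10000.0) — 10k heartbeats/sec
```

The heartbeat traffic scales linearly with consumer count, which is why partitioning across streams does not by itself reduce it; only reducing the total consumer count does.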
If inactive_threshold is shorter than the gap between message deliveries, the consumer may be cleaned up while still logically active. For pull consumers that fetch periodically, set inactive_threshold longer than the maximum expected interval between fetches. For push consumers, the subscription heartbeat keeps the consumer alive — inactive_threshold is only relevant after the subscription is gone.
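One way to size the threshold for pull consumers is to work back from the longest expected gap between fetches. The 3x safety factor here is illustrative, not a NATS recommendation:

```python
def suggest_inactive_threshold(max_fetch_interval_s: float,
                               safety_factor: float = 3.0) -> float:
    """Pick an inactive_threshold comfortably above the longest expected
    gap between pull fetches, so a healthy but slow consumer is never
    reaped while still keeping cleanup reasonably prompt."""
    return max_fetch_interval_s * safety_factor

# Fetching at most every 10 seconds:
print(suggest_inactive_threshold(10))  # 30.0 seconds
```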
Not in place. You need to delete the ephemeral consumer and create a new durable consumer with the same filter and deliver policy. If the ephemeral consumer has processing state (ack floor), set the new durable consumer’s deliver_policy to by_start_sequence with a start sequence just past the ephemeral consumer’s ack floor to avoid reprocessing.
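The sequence arithmetic for that migration is simple: the replacement durable should start one past the ack floor. A minimal sketch:

```python
def durable_start_seq(ephemeral_ack_floor: int) -> int:
    """First stream sequence the replacement durable consumer should
    deliver: everything up to and including the ack floor was already
    processed, so start one past it to avoid reprocessing."""
    return ephemeral_ack_floor + 1

print(durable_start_seq(12345))  # 12346
```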
Yes. Synadia Insights monitors consumer count per stream across the lookback window and flags streams where the count is growing steadily. The check distinguishes between legitimate scaling (correlated with connection growth) and leaks (growth without corresponding connection or workload changes), so it doesn’t alert on intentional consumer additions.