NATS Stream Consumer Limit: Diagnosing and Resolving Consumer Saturation

Severity: Warning
Category: Saturation
Applies to: JetStream
Check ID: JETSTREAM_011
Detection threshold: consumer count ≥ 90% of the configured max_consumers limit

A JetStream stream's max_consumers setting caps how many consumers can exist on it at the same time. When the consumer count reaches 90% or more of this limit, the stream is close to the point where attempts to create new consumers will fail. This check flags streams at or above that threshold.
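As a rough illustration of the threshold arithmetic, the sketch below computes utilization and applies the 90% rule. The function name and the choice to treat a non-positive limit as "no limit" are assumptions made for this example, not the check's actual implementation.

```python
def consumer_saturation(count: int, max_consumers: int, threshold: float = 0.90):
    """Return (utilization, flagged) for a stream.

    A non-positive max_consumers is treated as "no limit" (assumption),
    so such a stream is never flagged.
    """
    if max_consumers <= 0:
        return None, False
    utilization = count / max_consumers
    return utilization, utilization >= threshold


print(consumer_saturation(45, 50))  # (0.9, True)
print(consumer_saturation(44, 50))  # (0.88, False)
```

The comparison is inclusive, matching the "at or above" wording of the detection threshold.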

Why this matters

When a stream hits its consumer limit, any attempt to create a new consumer returns an error: maximum consumers limit reached. This affects both durable and ephemeral consumers. The failure mode is abrupt — there’s no degradation, just a hard stop on new consumer creation.

The impact depends on how your application uses consumers. If consumers are created dynamically — per request, per user session, per deployment — hitting the limit means new instances of the application can’t start consuming. In microservice architectures where each pod creates its own consumer during initialization, a rolling deployment stalls mid-way: old pods hold consumers, new pods can’t create theirs, and the deployment hangs with a mix of old and new instances.
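To make the rolling-deployment failure concrete, here is a minimal sketch of the headroom arithmetic. It assumes one consumer per pod, that an old pod's consumer is released only when that pod terminates, and a surge-style rollout in which new pods start before old ones stop. The function and parameter names are illustrative.

```python
def rollout_headroom(current_consumers: int, max_consumers: int, surge_pods: int) -> int:
    """Slots left over at the peak of a rolling deployment.

    A negative result means that many new pods will fail to create
    their consumer while the old pods are still running.
    """
    peak = current_consumers + surge_pods
    return max_consumers - peak


# 46 consumers on a stream capped at 50, rolling 5 pods at a time:
print(rollout_headroom(46, 50, 5))  # -1
```

A stream at 92% utilization looks healthy in steady state, yet this rollout would leave one new pod unable to start consuming.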

Ephemeral consumers are especially problematic. Applications that create ephemeral consumers for short-lived operations (previewing stream data, ad-hoc queries, health checks) consume slots that are freed only when the consumer is explicitly deleted or its inactivity threshold expires. If the application doesn’t clean up and the inactivity timeout is long, ephemeral consumers accumulate silently.

The consumer limit also interacts with consumer redelivery and failover. If a consumer is deleted and recreated during failover, the new consumer needs an available slot. At 90%+ utilization, the race between cleanup and creation introduces intermittent failures that are difficult to reproduce and diagnose.

In multi-team environments, a single team’s consumer sprawl can lock out other teams that share the same stream. Without per-team consumer budgets (which NATS doesn’t natively support), the consumer limit is a shared resource that requires coordination.

Common causes

  • Stale durable consumers that are no longer active. Services that were decommissioned, renamed, or moved to different streams leave behind durable consumers. These occupy slots indefinitely because durable consumers persist until explicitly deleted.

  • Ephemeral consumer leaks. Applications create ephemeral consumers without setting a short inactive_threshold and without explicitly deleting them. A long inactivity threshold lets unused ephemeral consumers linger well after the code that created them has finished.

  • Per-instance consumer creation. Each application instance creates its own uniquely named durable consumer instead of sharing a consumer across instances. Twenty instances of the same service create twenty consumers when one shared consumer with multiple subscribers would suffice.

  • Dynamic consumer creation patterns. Applications that create consumers on the fly — per request, per user, per workflow — can generate a high consumer count that scales with application load rather than with the number of logical consumer groups.

  • Consumer limit set too low. The max_consumers value was set conservatively when the stream was created and no longer fits the number of consuming services.

  • Test and development consumers. Consumers created during debugging, load testing, or development that were never cleaned up.
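The per-instance pattern in the list above can often be spotted from consumer names alone. The following is a minimal sketch of one possible heuristic: it strips a trailing "-<token>" when the token contains a digit (pod hashes, ordinals, UUID fragments) and counts how many consumers share the remaining base name. The regular expression and the `min_group` cutoff are assumptions for illustration and will misclassify some naming schemes.

```python
import re
from collections import Counter

# Hypothetical heuristic: a trailing "-<token>" is treated as an instance
# suffix only when the token contains at least one digit.
_INSTANCE_SUFFIX = re.compile(r"-(?=[A-Za-z0-9]*\d)[A-Za-z0-9]+$")


def group_by_base_name(consumer_names):
    """Count consumers per base name after stripping an instance-like suffix."""
    return Counter(_INSTANCE_SUFFIX.sub("", name) for name in consumer_names)


def likely_per_instance(consumer_names, min_group=3):
    """Return base names shared by at least min_group consumers."""
    groups = group_by_base_name(consumer_names)
    return {base: n for base, n in groups.items() if n >= min_group}


names = ["processor-7f9c4", "processor-a81b2", "processor-03de9", "billing", "audit-eu"]
print(likely_per_instance(names))  # {'processor': 3}
```

Groups reported here are candidates for consolidation into one shared consumer, not proof of a leak; a suffix such as "-v2" would also be stripped by this pattern.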

How to diagnose

Check consumer count against the limit

nats stream info <stream_name>

Compare the Consumers count under State with Max Consumers in the configuration.

List all consumers on the stream

nats consumer list <stream_name>

This shows all durable and ephemeral consumers. Look for consumers that are clearly stale — names referencing old services, old deployment IDs, or test prefixes.

Identify inactive consumers

nats consumer report <stream_name>

The report shows each consumer’s last activity timestamp and pending message count. Consumers with no recent activity and zero pending messages are candidates for removal.

For a detailed view:

nats consumer report <stream_name> --json | jq '.[] | select(.last_delivery == null and .num_pending == 0) | .name'

Check for ephemeral consumer accumulation

Ephemeral consumers have no durable name — they show as auto-generated IDs. A high count of ephemeral consumers usually indicates a leak:

nats consumer list <stream_name> --json | jq '[.[] | select(.config.durable_name == null or .config.durable_name == "")] | length'
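The same split can be done in application code. This is a minimal sketch that assumes you already hold the consumer info records as parsed JSON, shaped like the JetStream consumer info response (a `name` plus a `config` object whose `durable_name` is absent or empty for ephemeral consumers). The function name is illustrative.

```python
def split_by_durability(consumer_infos):
    """Partition consumer info dicts into (durable_names, ephemeral_names)."""
    durable, ephemeral = [], []
    for info in consumer_infos:
        # A missing or empty durable_name marks the consumer as ephemeral.
        if info.get("config", {}).get("durable_name"):
            durable.append(info["name"])
        else:
            ephemeral.append(info["name"])
    return durable, ephemeral


infos = [
    {"name": "order-processors", "config": {"durable_name": "order-processors"}},
    {"name": "Zq3xK9Lm", "config": {}},
    {"name": "pT7wB2Ns", "config": {"durable_name": ""}},
]
durable, ephemeral = split_by_durability(infos)
print(len(durable), len(ephemeral))  # 1 2
```

Tracking the ephemeral count over time is usually more telling than a single reading: a number that only grows points to a leak.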

Audit consumer creation patterns

// Go: report consumer utilization and list consumers idle for more than 24 hours.
package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/nats-io/nats.go"
    "github.com/nats-io/nats.go/jetstream"
)

func main() {
    nc, err := nats.Connect("nats://localhost:4222")
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Close()

    js, err := jetstream.New(nc)
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()
    stream, err := js.Stream(ctx, "ORDERS")
    if err != nil {
        log.Fatal(err)
    }

    info, err := stream.Info(ctx)
    if err != nil {
        log.Fatal(err)
    }

    limit := info.Config.MaxConsumers
    count := info.State.Consumers
    if limit > 0 {
        fmt.Printf("Consumers: %d / %d (%.1f%%)\n", count, limit,
            float64(count)/float64(limit)*100)
    } else {
        // A non-positive limit means the stream has no consumer cap.
        fmt.Printf("Consumers: %d (no max_consumers limit)\n", count)
    }

    lister := stream.ListConsumers(ctx)
    staleThreshold := 24 * time.Hour
    for ci := range lister.Info() {
        // Delivered.Last is nil when the consumer has never delivered a message.
        if ci.Delivered.Last == nil {
            continue
        }
        age := time.Since(*ci.Delivered.Last)
        if age > staleThreshold {
            fmt.Printf("  STALE: %-30s last_delivery=%s ago pending=%d\n",
                ci.Name, age.Round(time.Minute), ci.NumPending)
        }
    }
    // Surface any error hit while paging through the consumer list.
    if err := lister.Err(); err != nil {
        log.Fatal(err)
    }
}
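
# Python (nats-py): report consumer utilization and list consumers idle for more than 24 hours.
import asyncio
import json
import re
from datetime import datetime, timedelta, timezone

import nats

STREAM = "ORDERS"


def parse_ts(ts):
    # Server timestamps are RFC 3339 with nanoseconds; drop the fraction.
    return datetime.fromisoformat(re.sub(r"\.\d+", "", ts).replace("Z", "+00:00"))


async def main():
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()

    stream_info = await js.stream_info(STREAM)
    limit = stream_info.config.max_consumers
    count = stream_info.state.consumer_count
    if limit and limit > 0:
        print(f"Consumers: {count} / {limit} ({count / limit * 100:.1f}%)")
    else:
        print(f"Consumers: {count} (no max_consumers limit)")

    stale_threshold = timedelta(hours=24)
    now = datetime.now(timezone.utc)

    for ci in await js.consumers_info(STREAM):
        # Read delivered.last_active from the raw API response; the parsed
        # ConsumerInfo may not expose it, depending on the nats-py version.
        resp = await nc.request(f"$JS.API.CONSUMER.INFO.{STREAM}.{ci.name}", b"")
        last = json.loads(resp.data).get("delivered", {}).get("last_active")
        if last is None:
            continue  # never delivered; review these separately
        age = now - parse_ts(last)
        if age > stale_threshold:
            print(f"  STALE: {ci.name:30s} last_delivery={age} ago pending={ci.num_pending}")

    await nc.close()


asyncio.run(main())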

How to fix it

Immediate: free up consumer slots

Delete inactive durable consumers. Remove consumers that are no longer serving active workloads:

nats consumer delete <stream_name> <consumer_name>

Before deleting, verify the consumer is truly unused by checking its last activity and pending count with nats consumer info <stream_name> <consumer_name>. A consumer with no deliveries in the last 7+ days and zero pending messages is usually safe to remove, but deletion discards its position in the stream, so confirm with the owning team first.
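The deletion criterion described here (idle for 7+ days and nothing pending) can be written as a small predicate. This is a sketch under stated assumptions: `last_active` is a timezone-aware datetime taken from the consumer's delivered info, and a consumer that has never delivered is conservatively left for manual review instead of being flagged. The names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def is_removal_candidate(last_active, num_pending, now=None, max_idle=timedelta(days=7)):
    """True when a consumer has nothing pending and has been idle for max_idle."""
    now = now or datetime.now(timezone.utc)
    if num_pending > 0:
        return False  # still has work queued for it
    if last_active is None:
        return False  # never delivered: may be brand new, decide by hand
    return now - last_active >= max_idle


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(is_removal_candidate(datetime(2024, 1, 1, tzinfo=timezone.utc), 0, now=now))  # True
print(is_removal_candidate(datetime(2024, 1, 8, tzinfo=timezone.utc), 0, now=now))  # False
```

Keeping the rule in one function makes it easy to apply the same criterion in both manual audits and scheduled cleanup jobs.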

Clean up ephemeral consumers. Ephemeral consumers with long inactivity thresholds should be deleted if the owning application is no longer running:

nats consumer delete <stream_name> <ephemeral_id>

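Raise max_consumers. If the consumer count reflects legitimate workload growth, the stream needs a higher limit. Be aware that the NATS server rejects stream updates that change max_consumers, so in practice this means creating a replacement stream with a higher limit and moving consumers to it:

nats stream add <new_stream_name> --max-consumers 500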

Short-term: consolidate consumers

Share consumers across instances. Instead of each application instance creating its own durable consumer, use a single durable consumer name and have multiple instances subscribe to it. NATS automatically distributes messages across subscribers on the same consumer:

// Instead of per-instance consumers:
// consumer, _ := js.CreateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
//     Durable: fmt.Sprintf("processor-%s", instanceID),
// })

// Share one consumer across all instances:
consumer, err := js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
    Durable:   "order-processors",
    AckPolicy: jetstream.AckExplicitPolicy,
})

This is the single most effective change for reducing consumer count. Ten instances sharing one consumer use one slot instead of ten.

Set short inactive_threshold for ephemeral consumers. Applications that create ephemeral consumers should set a short inactivity timeout so abandoned consumers are cleaned up promptly:

# 5-minute inactivity timeout
nats consumer add <stream_name> --ephemeral --inactive-threshold 5m

Consolidate consumers with overlapping filter subjects. If multiple consumers subscribe to subsets of the same subject space, consolidate them into a single consumer with a broader filter:

# Instead of three consumers for orders.us.>, orders.eu.>, orders.ap.>
# Use one consumer for orders.>
nats consumer add <stream_name> --filter "orders.>" --durable all-orders

Long-term: prevent consumer sprawl

Implement consumer naming conventions. Establish naming conventions that make it clear which service owns each consumer and whether it’s still active. Names like {service}-{environment}-{version} make stale consumers easy to identify.
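A small helper can enforce the convention at the point where consumers are created. The sketch below builds a {service}-{environment}-{version} name and replaces characters outside letters, digits, "-" and "_" with "_". JetStream consumer names cannot contain whitespace, ".", "*", or ">", so a version such as "v1.4" needs this treatment; the stricter allow-list and the helper name are choices made for this example.

```python
import re

# Anything outside this conservative allow-list is replaced with "_".
_INVALID = re.compile(r"[^A-Za-z0-9_-]+")


def consumer_name(service: str, environment: str, version: str) -> str:
    """Build a '{service}-{environment}-{version}' consumer name."""
    cleaned = [_INVALID.sub("_", part.strip()) for part in (service, environment, version)]
    if not all(cleaned):
        raise ValueError("service, environment and version must be non-empty")
    return "-".join(cleaned)


print(consumer_name("order-processor", "prod", "v1.4"))  # order-processor-prod-v1_4
```

Because service names may themselves contain hyphens, treat the result as a readable label, not something to parse back into its parts.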

Automate stale consumer cleanup. Run a periodic job that identifies and removes consumers with no recent activity:

# Find consumers with no deliveries in the last 7 days.
# Consumers that have never delivered also match, so review the list before running this unattended.
nats consumer report <stream_name> --json | \
  jq -r '.[] | select((.delivered.last // "1970-01-01") < (now - 604800 | todate)) | .name' | \
  while read -r name; do
    echo "Deleting stale consumer: $name"
    nats consumer delete <stream_name> "$name" -f
  done

Monitor consumer utilization. Alert when consumer count approaches the limit.

Synadia Insights evaluates consumer utilization automatically at every collection interval, flagging streams before they hit the ceiling.

Document consumer ownership. Maintain a registry (even a simple markdown file in your repo) that maps consumer names to owning teams and services. This makes cleanup audits straightforward.
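With a registry in place, an audit reduces to a set comparison. The sketch below assumes the registry has been loaded into a dict mapping consumer name to owning team, and that the list of live consumer names comes from whatever listing you already run. Both function and variable names are illustrative.

```python
def audit_ownership(live_consumers, registry):
    """Compare live consumers with the ownership registry.

    Returns (unowned, missing): consumers that exist but have no registered
    owner, and registered consumers that no longer exist on the stream.
    """
    live = set(live_consumers)
    registered = set(registry)
    return sorted(live - registered), sorted(registered - live)


registry = {"order-processors": "fulfilment", "billing-prod-v2": "payments"}
live = ["order-processors", "loadtest-tmp-01", "debug-alice"]
print(audit_ownership(live, registry))
# (['debug-alice', 'loadtest-tmp-01'], ['billing-prod-v2'])
```

Unowned consumers are the first place to look when freeing slots; missing ones indicate registry entries that can be retired.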

Frequently asked questions

What happens when a stream reaches its consumer limit?

New consumer creation fails with a maximum consumers limit reached error. Existing consumers continue to work normally — they can still receive messages, acknowledge them, and process data. Only the creation of new consumers is blocked.

Do ephemeral consumers count against the limit?

Yes. Ephemeral consumers occupy a consumer slot just like durable consumers. The difference is that ephemeral consumers are automatically deleted after their inactivity threshold expires (if no subscribers are active). Until that happens, they count against max_consumers.

If I delete a consumer, does it lose its position in the stream?

Yes. Deleting a durable consumer removes its state, including its acknowledged position. If you recreate a consumer with the same name, it starts wherever the new consumer's deliver_policy says (all by default, meaning the first message still in the stream). To preserve position, don't delete; instead, troubleshoot why the consumer appears stale.

Can multiple applications share the same consumer?

Yes, and this is recommended. When multiple application instances are bound to the same consumer, each message is delivered to only one of them, so the work is spread across instances. This is the JetStream counterpart of a core NATS queue group. It's the primary mechanism for horizontal scaling without consuming additional consumer slots.

How do I set max_consumers to unlimited?

Set it to -1 or 0 when the stream is created (the server rejects updates that change max_consumers on an existing stream):

nats stream add <stream_name> --max-consumers=-1

This removes the consumer limit entirely. Use this cautiously — without a limit, consumer sprawl can accumulate unchecked and impact Raft group overhead.

