
NATS Subscription Count Growth: Detecting Subscription Leaks

Severity: Info
Category: Errors
Applies to: Server
Check ID: OPT_SYS_022
Detection threshold: Subscription count growing monotonically without corresponding connection growth

A subscription leak occurs when a NATS client creates new subscriptions without unsubscribing from old ones. The server’s total subscription count climbs steadily over time even though the connection count remains stable. Left unchecked, the leak degrades routing performance, inflates memory usage, and can eventually exhaust server-side subscription limits.

Why this matters

Every subscription on a NATS server consumes memory and CPU. The server maintains a subject interest graph — a trie-based data structure that maps subjects to subscriptions for message routing. As the subscription count grows, this trie expands, increasing the time and memory required for every publish operation to resolve matching subscribers.

In a clustered deployment, the impact multiplies. Subscription interest is propagated across routes, gateways, and leaf node connections. A leak on one server inflates the interest tables on every connected server. In a supercluster with gateways, a single leaking client can bloat subscription tables across every cluster in the deployment.

The performance degradation is gradual, which makes it insidious. At 100,000 subscriptions, routing is fast. At 1 million, you might notice slightly higher CPU on publish-heavy servers. At 10 million, publish latency becomes measurable, and server memory usage has grown by gigabytes. Because the growth is slow, it often isn’t caught until the system is already significantly degraded.

The most common pattern is a request-reply leak. A client creates a unique inbox subscription for each request (_INBOX.abc123) but fails to unsubscribe when the reply arrives (or times out). In a service handling 1,000 requests per second, that’s 86.4 million leaked subscriptions per day. Most NATS client libraries handle inbox cleanup automatically, but custom request-reply patterns or improper use of the subscribe API can bypass this.
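The arithmetic behind that figure is worth making explicit, since it shows how quickly a per-request leak compounds (a simple worked example, assuming exactly one leaked inbox subscription per request):

```python
# Leaked subscriptions accumulated by a request-reply leak,
# assuming one leaked inbox subscription per request.
requests_per_second = 1_000
seconds_per_day = 86_400
leaked_per_day = requests_per_second * seconds_per_day
print(leaked_per_day)  # 86400000
```

Even a far lower request rate still leaks millions of subscriptions per month, which is why these leaks surface long after the bug was introduced.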

Common causes

  • Request-reply without auto-unsubscribe. The client manually subscribes to a reply subject for each request but doesn’t unsubscribe after receiving the response. The built-in Request() method handles this automatically, but hand-rolled request-reply patterns often miss it.

  • Dynamic subscriptions without cleanup. An application subscribes to subjects based on incoming data (e.g., subscribing to user.<id>.events for each active user) but doesn’t unsubscribe when the user disconnects or the subscription is no longer needed.

  • Reconnect logic creating duplicate subscriptions. On reconnect, the client re-subscribes to all subjects. If the application code also re-creates subscriptions in a reconnect handler without checking for existing ones, each reconnect doubles the subscription count.

  • Retry loops with subscribe. A retry loop that calls Subscribe() on each attempt without unsubscribing from the previous attempt’s subscription. Each retry adds another subscription to the same subject.

  • Goroutine or thread leaks combined with subscriptions. A goroutine/thread is spawned per request, each creating a subscription. If the goroutine leaks (never exits), the subscription leaks with it.

  • JetStream push consumers without proper lifecycle. Ephemeral push consumers create subscriptions. If the application creates new consumers without deleting old ones, the subscription count grows in step with consumer count.
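The retry-loop cause can be seen in miniature with a toy model of server-side subscription state (a sketch with hypothetical names, not the NATS client API): the server tracks each Subscribe() call under its own subscription ID, even when the subject is identical, so repeated subscribes accumulate.

```python
# Toy model of server-side subscription state: each subscribe()
# registers a new sid, even for a subject that already has one.
class FakeServer:
    def __init__(self):
        self.next_sid = 0
        self.subs = {}  # sid -> subject

    def subscribe(self, subject):
        self.next_sid += 1
        self.subs[self.next_sid] = subject
        return self.next_sid

    def unsubscribe(self, sid):
        self.subs.pop(sid, None)

# Leaky retry loop: subscribes on every attempt, never unsubscribes.
leaky = FakeServer()
for attempt in range(5):
    leaky.subscribe("service.replies")

# Fixed retry loop: drops the previous attempt's subscription first.
fixed = FakeServer()
sid = None
for attempt in range(5):
    if sid is not None:
        fixed.unsubscribe(sid)
    sid = fixed.subscribe("service.replies")

print(len(leaky.subs), len(fixed.subs))  # 5 1
```

Five retries leave five live subscriptions in the leaky version but only one in the fixed version, which is exactly the difference the server-side subscription count reflects.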

How to diagnose

Confirm subscription growth

Monitor the server’s subscription count over time:

```shell
# Check current subscription count
nats server report connections --sort subs
```

Compare across collection intervals. If the total subscription count is climbing while the connection count is stable, you have a leak.
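That comparison can be sketched as a small check over periodic (subscriptions, connections) samples taken from the /varz endpoint: subscriptions rising monotonically while connections stay flat is the leak signature. The tolerance value here is illustrative, not part of the actual check:

```python
def looks_like_leak(samples, conn_tolerance=0.1):
    """samples: list of (subscriptions, connections) tuples, oldest first.
    Returns True when subscriptions grow monotonically while the
    connection count stays within +/- conn_tolerance of its start."""
    subs = [s for s, _ in samples]
    conns = [c for _, c in samples]
    subs_monotonic = all(b > a for a, b in zip(subs, subs[1:]))
    conns_stable = (max(conns) <= conns[0] * (1 + conn_tolerance)
                    and min(conns) >= conns[0] * (1 - conn_tolerance))
    return subs_monotonic and conns_stable

# Climbing subscriptions, flat connections -> leak
print(looks_like_leak([(1000, 50), (1400, 51), (1900, 50), (2600, 49)]))  # True
# Subscriptions track connections -> normal scale-up, not a leak
print(looks_like_leak([(1000, 50), (2000, 100), (3000, 150)]))            # False
```

The second case is the important negative: subscription growth that tracks connection growth is usually legitimate scale-up, not a leak.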

For a quick historical check using the monitoring endpoint:

```shell
# Get current subscription count from varz
curl -s http://localhost:8222/varz | jq '{subscriptions: .subscriptions, connections: .connections}'
```

Identify the leaking client

Find connections with anomalously high subscription counts:

```shell
nats server report connections --sort subs --top 20
```

A client with thousands or millions of subscriptions when you’d expect dozens or hundreds is the likely source. Note the client name, CID, and IP address.

For more detail on a specific connection:

```shell
nats server report connections --json | jq '.[] | select(.subscriptions > 10000) | {cid: .cid, name: .name, ip: .ip, subs: .subscriptions, account: .account}'
```

Examine subscription subjects

Use the connection detail endpoint to see what subjects a specific client is subscribed to:

```shell
curl -s "http://localhost:8222/connz?cid=<CID>&subs=detail" | jq '.connections[0].subscriptions_list'
```

Look for patterns: many _INBOX.* subscriptions indicate a request-reply leak. Many similarly-structured subjects (e.g., user.*.events) indicate a dynamic subscription leak.
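A quick way to surface such patterns is to collapse each subject into a template by replacing ID-like tokens with a wildcard, then count templates. This is a heuristic sketch; the "looks like a generated ID" test (alphanumeric, 6+ characters, contains a digit) is an assumption you may need to tune for your subject naming scheme:

```python
import re
from collections import Counter

def template(subject):
    # Replace tokens that look like generated IDs with '*'
    # so structurally similar subjects group together.
    def collapse(tok):
        id_like = re.fullmatch(r"[A-Za-z0-9]{6,}", tok) and any(c.isdigit() for c in tok)
        return "*" if id_like else tok
    return ".".join(collapse(t) for t in subject.split("."))

subjects = [
    "_INBOX.abc123xyz", "_INBOX.def456uvw", "_INBOX.ghi789rst",
    "user.42a8f1.events", "user.9bc3d2.events",
    "orders.created",
]
counts = Counter(template(s) for s in subjects)
for tmpl, n in counts.most_common():
    print(tmpl, n)
```

Feeding the `subscriptions_list` output through something like this makes a `_INBOX.*` or `user.*.events` cluster jump out immediately, even among millions of subscriptions.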

Monitor growth rate

Track the subscription delta over time:

```shell
# Simple monitoring loop
while true; do
  echo "$(date): $(curl -s http://localhost:8222/varz | jq .subscriptions)"
  sleep 60
done
```

A steady, consistent growth rate (e.g., +100 subscriptions/minute) strongly indicates a leak. If growth correlates with traffic patterns, it’s likely tied to request processing.
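To turn the samples from a loop like the one above into a concrete rate, a least-squares slope over timestamped counts gives subscriptions per minute (a sketch; the sample data is illustrative):

```python
def growth_rate_per_minute(samples):
    """samples: list of (timestamp_seconds, subscription_count).
    Returns the least-squares slope converted to subs/minute."""
    n = len(samples)
    ts = [t for t, _ in samples]
    ys = [y for _, y in samples]
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in samples)
    den = sum((t - t_mean) ** 2 for t in ts)
    return num * 60 / den  # per-second slope -> per-minute

# Samples taken one minute apart: +100 subscriptions each interval
samples = [(0, 10_000), (60, 10_100), (120, 10_200), (180, 10_300)]
print(growth_rate_per_minute(samples))  # 100.0
```

Fitting a slope rather than differencing adjacent samples smooths out collection jitter, which matters when the leak rate is small relative to normal churn.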

How to fix it

Immediate: identify and restart the leaking client

Once you’ve identified the leaking client connection, restart it. This clears all its subscriptions. But this is only a temporary fix — the leak will resume when the client starts processing again.

```shell
# Kick the connection (requires system account or operator permissions)
nats server request kick <CID> <SERVER_ID>
```

Fix request-reply leaks

Use the built-in Request() method instead of manual subscribe/publish/unsubscribe:

```go
// Wrong: manual subscribe leaks if the response never arrives
inbox := nats.NewInbox()
sub, _ := nc.SubscribeSync(inbox)
nc.PublishRequest("service.action", inbox, data)
msg, err := sub.NextMsg(5 * time.Second)
// If timeout, sub is never unsubscribed!

// Right: Request() handles unsubscribe automatically
msg, err := nc.Request("service.action", data, 5*time.Second)
```

```python
# Wrong: manual subscribe
inbox = nc.new_inbox()
sub = await nc.subscribe(inbox)
await nc.publish_request("service.action", inbox, data)
msg = await sub.next_msg(timeout=5)
# sub never unsubscribed on timeout!

# Right: request handles cleanup
msg = await nc.request("service.action", data, timeout=5)
```

Fix dynamic subscription leaks

Track subscriptions and unsubscribe when they’re no longer needed:

```go
// SubManager tracks active subscriptions by subject so they can be
// replaced or removed instead of leaking.
type SubManager struct {
    mu   sync.Mutex
    subs map[string]*nats.Subscription
}

func (m *SubManager) Subscribe(nc *nats.Conn, subject string, cb nats.MsgHandler) error {
    m.mu.Lock()
    defer m.mu.Unlock()

    if m.subs == nil {
        m.subs = make(map[string]*nats.Subscription)
    }

    // Unsubscribe existing before creating new
    if old, ok := m.subs[subject]; ok {
        old.Unsubscribe()
    }

    sub, err := nc.Subscribe(subject, cb)
    if err != nil {
        return err
    }
    m.subs[subject] = sub
    return nil
}

func (m *SubManager) Unsubscribe(subject string) {
    m.mu.Lock()
    defer m.mu.Unlock()
    if sub, ok := m.subs[subject]; ok {
        sub.Unsubscribe()
        delete(m.subs, subject)
    }
}
```

Use the client library’s automatic re-subscription feature rather than manually re-subscribing in a reconnect handler. Most NATS client libraries automatically re-establish subscriptions on reconnect:

```go
// Subscriptions created via nc.Subscribe() are automatically
// re-established on reconnect. Do NOT re-subscribe in the
// reconnect handler.
nc, _ := nats.Connect(url,
    nats.ReconnectHandler(func(nc *nats.Conn) {
        log.Println("Reconnected — subscriptions auto-restored")
        // Do NOT call nc.Subscribe() here for existing subjects
    }),
)
```

Set subscription limits as a safety net

Configure per-account or per-user subscription limits to prevent a leak from growing unbounded:

nats-server.conf

```
accounts {
  APP {
    users = [{user: app, password: secret}]
    max_subscriptions: 10000
  }
}
```

When the limit is hit, the leaking client receives an error on new subscribe attempts, making the bug visible immediately rather than silently degrading the system.

Frequently asked questions

How many subscriptions is too many for a single connection?

There’s no fixed threshold — it depends on your application’s design. A service that subscribes to 10 subjects should have roughly 10 subscriptions. If it has 10,000, something is wrong. The key signal isn’t the absolute number but the growth pattern: steady, monotonic growth without corresponding connection growth is a leak.

Do JetStream consumers count as subscriptions?

Yes. Each active push consumer creates subscriptions on the server. Pull consumers create a subscription for the request/reply channel. If your application creates ephemeral consumers without cleaning them up, you’ll see both consumer count growth (OPT_SYS_025) and subscription count growth.

Will a subscription leak cause message loss?

Not directly, but it causes performance degradation that can lead to message loss. As the subject interest trie grows, publish latency increases. This can push clients into slow consumer territory (SERVER_004), which does cause message loss on core NATS. The memory impact of millions of leaked subscriptions can also affect server stability.

Can subscription limits mask the problem?

Yes. If you set a subscription limit and the client hits it, the client will stop creating new subscriptions but the existing leaked subscriptions remain. The limit prevents unbounded growth but doesn’t fix the root cause. Use the limit as a safety net while you fix the application code.

Does Insights detect the growth pattern automatically?

Yes. Synadia Insights monitors subscription count trends across collection intervals and compares them against connection count trends. When subscriptions grow monotonically without a corresponding increase in connections, the check fires. This catches leaks early, before they accumulate enough to impact server performance.

Proactive monitoring for NATS subscription count growth with Synadia Insights

With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.

Start a 14-day Insights trial