A subscription leak occurs when a NATS client creates new subscriptions without unsubscribing from old ones. The server’s total subscription count climbs steadily over time even though the connection count remains stable. Left unchecked, the leak degrades routing performance, inflates memory usage, and can eventually exhaust server-side subscription limits.
Every subscription on a NATS server consumes memory and CPU. The server maintains a subject interest graph — a trie-based data structure that maps subjects to subscriptions for message routing. As the subscription count grows, this trie expands, increasing the time and memory required for every publish operation to resolve matching subscribers.
In a clustered deployment, the impact multiplies. Subscription interest is propagated across routes, gateways, and leaf node connections. A leak on one server inflates the interest tables on every connected server. In a supercluster with gateways, a single leaking client can bloat subscription tables across every cluster in the deployment.
The performance degradation is gradual, which makes it insidious. At 100,000 subscriptions, routing is fast. At 1 million, you might notice slightly higher CPU on publish-heavy servers. At 10 million, publish latency becomes measurable, and server memory usage has grown by gigabytes. Because the growth is slow, it often isn’t caught until the system is already significantly degraded.
The most common pattern is a request-reply leak. A client creates a unique inbox subscription for each request (_INBOX.abc123) but fails to unsubscribe when the reply arrives (or times out). In a service handling 1,000 requests per second, that’s 86 million leaked subscriptions per day. Most NATS client libraries handle inbox cleanup automatically, but custom request-reply patterns or improper use of the subscribe API can bypass this.
Request-reply without auto-unsubscribe. The client manually subscribes to a reply subject for each request but doesn’t unsubscribe after receiving the response. The built-in Request() method handles this automatically, but hand-rolled request-reply patterns often miss it.
Dynamic subscriptions without cleanup. An application subscribes to subjects based on incoming data (e.g., subscribing to user.<id>.events for each active user) but doesn’t unsubscribe when the user disconnects or the subscription is no longer needed.
Reconnect logic creating duplicate subscriptions. On reconnect, the client re-subscribes to all subjects. If the application code also re-creates subscriptions in a reconnect handler without checking for existing ones, each reconnect doubles the subscription count.
Retry loops with subscribe. A retry loop calls Subscribe() on each attempt without unsubscribing from the previous attempt's subscription. Each retry adds another subscription to the same subject; a minimal sketch of this pattern and its fix follows this list.
Goroutine or thread leaks combined with subscriptions. A goroutine/thread is spawned per request, each creating a subscription. If the goroutine leaks (never exits), the subscription leaks with it.
JetStream push consumers without proper lifecycle. Ephemeral push consumers create subscriptions. If the application creates new consumers without deleting old ones, the subscription count grows in step with consumer count.
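To make the retry-loop pattern concrete, here is a minimal Go sketch; the subject name, timeout, and retry count are illustrative assumptions, not details from any real system. The leaky variant is the same loop with the Unsubscribe() call removed:

```go
package leakdemo

import (
	"time"

	"github.com/nats-io/nats.go"
)

const maxRetries = 3

// requestWithRetry waits for a reply on a subject, retrying on timeout.
// "orders.replies" and the 2-second timeout are hypothetical.
func requestWithRetry(nc *nats.Conn) (*nats.Msg, error) {
	var lastErr error
	for i := 0; i < maxRetries; i++ {
		sub, err := nc.SubscribeSync("orders.replies")
		if err != nil {
			lastErr = err
			continue
		}
		msg, err := sub.NextMsg(2 * time.Second)
		// The leaky variant omits this line: each timed-out attempt
		// then leaves a live subscription behind on the server.
		sub.Unsubscribe()
		if err == nil {
			return msg, nil
		}
		lastErr = err
	}
	return nil, lastErr
}
```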
Monitor the server’s subscription count over time:
```bash
# Check current subscription count
nats server report connections --sort subs
```

Compare across collection intervals. If the total subscription count is climbing while the connection count is stable, you have a leak.
For a quick historical check using the monitoring endpoint:
```bash
# Get current subscription count from varz
curl -s http://localhost:8222/varz | jq '{subscriptions: .subscriptions, connections: .connections}'
```

Find connections with anomalously high subscription counts:
```bash
nats server report connections --sort subs --top 20
```

A client with thousands or millions of subscriptions when you'd expect dozens or hundreds is the likely source. Note the client name, CID, and IP address.
For more detail on a specific connection:
```bash
nats server report connections --json | jq '.[] | select(.subscriptions > 10000) | {cid: .cid, name: .name, ip: .ip, subs: .subscriptions, account: .account}'
```

Use the connection detail endpoint to see what subjects a specific client is subscribed to:
curl -s "http://localhost:8222/connz?cid=<CID>&subs=detail" | jq '.connections[0].subscriptions_list'Look for patterns: many _INBOX.* subscriptions indicate a request-reply leak. Many similarly-structured subjects (e.g., user.*.events) indicate a dynamic subscription leak.
Track the subscription delta over time:
```bash
# Simple monitoring loop
while true; do
  echo "$(date): $(curl -s http://localhost:8222/varz | jq .subscriptions)"
  sleep 60
done
```

A steady, consistent growth rate (e.g., +100 subscriptions/minute) strongly indicates a leak. If growth correlates with traffic patterns, it's likely tied to request processing.
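If you prefer the delta computed for you, a small Go watcher along these lines can poll varz once a minute and print both deltas side by side; the endpoint URL and the 60-second interval are assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// varz holds the two fields we care about from the /varz endpoint.
type varz struct {
	Subscriptions int `json:"subscriptions"`
	Connections   int `json:"connections"`
}

func fetch(url string) (varz, error) {
	var v varz
	resp, err := http.Get(url)
	if err != nil {
		return v, err
	}
	defer resp.Body.Close()
	err = json.NewDecoder(resp.Body).Decode(&v)
	return v, err
}

func main() {
	const url = "http://localhost:8222/varz" // assumed monitoring endpoint
	prev, _ := fetch(url)                    // first sample; errors show as a zero baseline
	for range time.Tick(60 * time.Second) {
		cur, err := fetch(url)
		if err != nil {
			fmt.Println("fetch error:", err)
			continue
		}
		// A positive subscription delta with a flat connection delta,
		// sustained over many intervals, is the leak signature.
		fmt.Printf("%s subs=%d (%+d) conns=%d (%+d)\n",
			time.Now().Format(time.RFC3339),
			cur.Subscriptions, cur.Subscriptions-prev.Subscriptions,
			cur.Connections, cur.Connections-prev.Connections)
		prev = cur
	}
}
```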
Once you’ve identified the leaking client connection, restart it. This clears all its subscriptions. But this is only a temporary fix — the leak will resume when the client starts processing again.
```bash
# Kick the connection (requires system account or operator permissions)
nats server request kick <CID> <SERVER_ID>
```

Use the built-in Request() method instead of manual subscribe/publish/unsubscribe:
```go
// Wrong: manual subscribe leaks if response never arrives
inbox := nats.NewInbox()
sub, _ := nc.SubscribeSync(inbox)
nc.PublishRequest("service.action", inbox, data)
msg, err := sub.NextMsg(5 * time.Second)
// If timeout, sub is never unsubscribed!

// Right: Request() handles unsubscribe automatically
msg, err := nc.Request("service.action", data, 5*time.Second)
```

```python
# Wrong: manual subscribe
inbox = nc.new_inbox()
sub = await nc.subscribe(inbox)
await nc.publish_request("service.action", inbox, data)
msg = await sub.next_msg(timeout=5)
# sub never unsubscribed on timeout!

# Right: request handles cleanup
msg = await nc.request("service.action", data, timeout=5)
```

Track subscriptions and unsubscribe when they're no longer needed:
```go
type SubManager struct {
	mu   sync.Mutex
	subs map[string]*nats.Subscription
}

func (m *SubManager) Subscribe(nc *nats.Conn, subject string, cb nats.MsgHandler) error {
	m.mu.Lock()
	defer m.mu.Unlock()

	// Unsubscribe existing before creating new
	if old, ok := m.subs[subject]; ok {
		old.Unsubscribe()
	}

	sub, err := nc.Subscribe(subject, cb)
	if err != nil {
		return err
	}
	m.subs[subject] = sub
	return nil
}

func (m *SubManager) Unsubscribe(subject string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if sub, ok := m.subs[subject]; ok {
		sub.Unsubscribe()
		delete(m.subs, subject)
	}
}
```

Use the client library's automatic re-subscription feature rather than manually re-subscribing in a reconnect handler. Most NATS client libraries automatically re-establish subscriptions on reconnect:
```go
// Subscriptions created via nc.Subscribe() are automatically
// re-established on reconnect. Do NOT re-subscribe in the
// reconnect handler.
nc, _ := nats.Connect(url,
	nats.ReconnectHandler(func(nc *nats.Conn) {
		log.Println("Reconnected — subscriptions auto-restored")
		// Do NOT call nc.Subscribe() here for existing subjects
	}),
)
```

Configure per-account or per-user subscription limits to prevent a leak from growing unbounded:
```
accounts {
  APP {
    users = [{user: app, password: secret}]
    max_subscriptions: 10000
  }
}
```

When the limit is hit, the leaking client receives an error on new subscribe attempts, making the bug visible immediately rather than silently degrading the system.
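How that error surfaces depends on the client library. In the Go client, server-sent errors are delivered to the asynchronous error handler rather than returned from Subscribe(), so wiring one up is a reasonable way to make limit violations visible in logs; this is a sketch under that assumption, not behavior stated in the original article:

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL,
		nats.ErrorHandler(func(_ *nats.Conn, sub *nats.Subscription, err error) {
			// Server-side errors, including subscription-limit violations,
			// are assumed to arrive here asynchronously rather than as a
			// return value from Subscribe().
			if sub != nil {
				log.Printf("async NATS error on %q: %v", sub.Subject, err)
				return
			}
			log.Printf("async NATS error: %v", err)
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()
	// ... application code ...
}
```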
There’s no fixed threshold — it depends on your application’s design. A service that subscribes to 10 subjects should have roughly 10 subscriptions. If it has 10,000, something is wrong. The key signal isn’t the absolute number but the growth pattern: steady, monotonic growth without corresponding connection growth is a leak.
Yes. Each active push consumer creates subscriptions on the server. Pull consumers create a subscription for the request/reply channel. If your application creates ephemeral consumers without cleaning them up, you’ll see both consumer count growth (OPT_SYS_025) and subscription count growth.
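For the ephemeral push consumer case, here is a hedged Go sketch using the legacy JetStream API in nats.go, where Unsubscribe() on the subscription also removes the ephemeral consumer; the subject and timeout are hypothetical:

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Subscribing without a durable name creates an ephemeral push
	// consumer, which carries server-side subscriptions with it.
	sub, err := js.SubscribeSync("orders.new")
	if err != nil {
		log.Fatal(err)
	}
	// Unsubscribe deletes the ephemeral consumer along with the
	// subscription; skipping it leaks both.
	defer sub.Unsubscribe()

	if msg, err := sub.NextMsg(5 * time.Second); err == nil {
		msg.Ack()
	}
}
```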
Not directly, but it causes performance degradation that can lead to message loss. As the subject interest trie grows, publish latency increases. This can push clients into slow consumer territory (SERVER_004), which does cause message loss on core NATS. The memory impact of millions of leaked subscriptions can also affect server stability.
Yes. If you set a subscription limit and the client hits it, the client will stop creating new subscriptions but the existing leaked subscriptions remain. The limit prevents unbounded growth but doesn’t fix the root cause. Use the limit as a safety net while you fix the application code.
Yes. Synadia Insights monitors subscription count trends across collection intervals and compares them against connection count trends. When subscriptions grow monotonically without a corresponding increase in connections, the check fires. This catches leaks early, before they accumulate enough to impact server performance.