A leafnode subscription count high alert fires when a leafnode connection is carrying a large number of subscriptions. When a leafnode connects (or reconnects) to the hub, it sends its entire subscription list. If that list is large enough that the hub takes longer than 2 seconds to process it, the hub marks the connection as stale and drops it. The leafnode then reconnects, sends the same large subscription list, and the cycle repeats — creating a connection loop that never stabilizes.
Leafnode connections are the backbone of NATS multi-cluster and edge architectures. They bridge remote clusters, edge locations, and isolated environments back to the hub. When a leafnode connection can’t stabilize, the entire remote site loses connectivity to the rest of the NATS infrastructure.
The 2-second stale connection timeout is a hard boundary. The NATS server has an internal stale connection timeout (default 2 seconds) that kills connections that haven’t completed their initial setup in time. When a leafnode sends tens or hundreds of thousands of subscriptions during connection establishment, the hub must process each one — creating internal routing table entries, propagating interest to other cluster members, and updating its subscription cache. If this processing exceeds 2 seconds, the connection is dropped.
The failure mode is a silent loop. The leafnode reconnects automatically (as it should), sends the same subscription list, gets dropped again, and repeats. From the leafnode side, you see constant reconnection attempts. From the hub side, you see a stream of stale connection warnings. Neither side logs an obvious “your subscription count is too high” error — you have to connect the dots yourself.
All clients behind the leafnode are affected. A leafnode typically serves dozens or hundreds of local clients. When the leafnode connection to the hub is unstable, every one of those clients loses the ability to communicate with the broader NATS infrastructure. Messages to subjects on the hub side go undelivered. Request-reply patterns time out. JetStream consumers on the hub cannot receive acknowledgments from edge consumers.
The problem tends to grow over time. Applications add new subscriptions as features are developed. Each new microservice behind a leafnode adds its subscriptions to the leafnode’s aggregate count. What worked at launch with 1,000 subscriptions may break a year later at 50,000.
Wildcard subscriptions propagating excessive interest. A leafnode configured with broad subject mappings (e.g., `>` or `events.>`) propagates every unique subscription from local clients to the hub. If 200 local clients each subscribe to 100 specific subjects, the leafnode sends 20,000 subscriptions to the hub at connect time.
Many microservices behind a single leafnode. Edge or branch deployments that run dozens of microservices locally, each with multiple subscriptions, accumulate a large aggregate subscription count on the single leafnode connection to the hub.
Dynamic subscription patterns. Applications that create subscriptions dynamically — per-session reply subjects, per-request inboxes, per-entity watch subjects — inflate the subscription count rapidly. Each active session or pending request adds one or more subscriptions.
Missing explicit exports/imports. Without account-level export/import configuration, the leafnode propagates all local subscriptions upstream. Explicit exports and imports act as a filter, sending only the subscriptions the hub actually needs to know about.
JetStream consumers adding subscription overhead. Each JetStream push consumer behind a leafnode creates a deliver subject subscription that propagates to the hub. A deployment with hundreds of push consumers adds hundreds of subscriptions to the leafnode connection.
```shell
# List leafnode connections with subscription details
nats server report connections --sort subs --type leaf
```

Look for leafnode connections with subscription counts in the tens of thousands. The exact threshold where problems occur depends on hub server performance, but counts above 20,000–50,000 are in the danger zone.
```shell
# Search hub server logs for stale connection events
grep -i "stale connection" /var/log/nats/nats-server.log

# Look for rapid reconnect patterns
grep -i "leafnode connection\|leaf remote" /var/log/nats/nats-server.log | tail -50
```

A pattern of repeated connect/disconnect events from the same leafnode, spaced 2–5 seconds apart, is the telltale sign of a subscription-count-induced connection loop.
On the leafnode server, check the local subscription state:
```shell
# List all subscriptions on the leafnode server
nats server report connections --sort subs
```
```shell
# Check total subscriptions
nats server report accounts
```

You can also poll the hub's monitoring endpoint (`/leafz`) directly and flag leafnode connections whose subscription counts approach the danger zone:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

type Leafz struct {
	Leafs []LeafInfo `json:"leafs"`
}

type LeafInfo struct {
	Name    string `json:"name"`
	Account string `json:"account"`
	NumSubs int    `json:"subscriptions"`
	IP      string `json:"ip"`
	Port    int    `json:"port"`
	RTT     string `json:"rtt"`
	InMsgs  int64  `json:"in_msgs"`
	OutMsgs int64  `json:"out_msgs"`
}

func main() {
	resp, err := http.Get("http://localhost:8222/leafz?subs=true")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	var leafz Leafz
	json.Unmarshal(body, &leafz)

	for _, leaf := range leafz.Leafs {
		status := "OK"
		if leaf.NumSubs > 20000 {
			status = "WARNING"
		}
		if leaf.NumSubs > 50000 {
			status = "CRITICAL"
		}
		fmt.Printf("[%s] Leaf %s (account: %s): %d subs, RTT: %s\n",
			status, leaf.Name, leaf.Account, leaf.NumSubs, leaf.RTT)
	}
}
```

The same check in Python:

```python
import asyncio
import aiohttp

async def check_leaf_subs():
    async with aiohttp.ClientSession() as session:
        async with session.get("http://localhost:8222/leafz?subs=true") as resp:
            data = await resp.json()

    for leaf in data.get("leafs", []):
        num_subs = leaf.get("subscriptions", 0)
        name = leaf.get("name", "unknown")
        status = "OK"
        if num_subs > 20000:
            status = "WARNING"
        if num_subs > 50000:
            status = "CRITICAL"
        print(f"[{status}] Leaf {name}: {num_subs} subs, RTT: {leaf.get('rtt', 'N/A')}")

asyncio.run(check_leaf_subs())
```

Increase the stale connection timeout on the hub. This buys time for the hub to process the large subscription list without dropping the connection. This is a server-side configuration change:
```
# nats-server.conf (hub)
leafnodes {
    port: 7422
    # Increase stale timeout to handle large sub lists
    # Note: This is a workaround — reduce subs long-term
}
```

Note: The stale connection timeout is not directly configurable in all NATS server versions. If it’s not tunable in your version, focus on reducing subscription count instead.
Use explicit exports and imports. Instead of propagating all subscriptions across the leafnode, define exactly which subjects should cross the boundary:
```
# Hub server config
accounts {
    EDGE {
        exports: [
            { service: "api.>" }
            { stream: "events.>" }
        ]
        imports: [
            { stream: { account: EDGE, subject: "telemetry.>" } }
        ]
    }
}
```

This filters the subscription list to only the subjects that need to cross the leafnode boundary, potentially reducing thousands of subscriptions to dozens.
Consolidate subscriptions with wildcards. Instead of subscribing to thousands of specific subjects, use wildcard subscriptions and filter in the application:
```go
// Before: 10,000 specific subscriptions (bad for leafnode)
for _, customerId := range customers {
	nc.Subscribe("orders."+customerId+".created", handler)
}

// After: one wildcard subscription (good for leafnode)
nc.Subscribe("orders.*.created", func(msg *nats.Msg) {
	// Extract customer ID from subject and route internally
	tokens := strings.Split(msg.Subject, ".")
	customerId := tokens[1]
	routeToHandler(customerId, msg)
})
```

Switch push consumers to pull consumers. Pull consumers don’t create deliver subject subscriptions on the leafnode. They fetch messages on demand, eliminating the subscription overhead:
```shell
# Convert a push consumer to pull
nats consumer add ORDERS pull-processor \
  --pull \
  --filter "orders.>" \
  --ack explicit
```

Segment traffic across multiple leafnode connections. Instead of one leafnode connection carrying all subscriptions, use multiple leafnode connections with per-account isolation. Each connection carries only the subscriptions for its account:
```
# Leafnode server config
leafnodes {
    remotes [
        {
            url: "nats-leaf://hub:7422"
            account: "TELEMETRY"
        }
        {
            url: "nats-leaf://hub:7422"
            account: "ORDERS"
        }
    ]
}
```

Monitor subscription growth as part of deployment reviews. Before deploying new services behind a leafnode, estimate the additional subscription count. Add it to your deployment checklist.
Set up alerts with Synadia Insights. Insights monitors leafnode subscription counts automatically and alerts when they approach dangerous thresholds, giving you time to act before the connection loop starts.
There’s no single number — it depends on hub server CPU speed, network latency, and what else the hub is doing during connection establishment. In practice, problems typically start appearing at 20,000–50,000 subscriptions on a single leafnode connection. Some deployments hit the timeout at lower counts if the hub is already under load.
Running the hub on faster hardware helps but doesn’t solve the fundamental scaling issue. Subscription processing during connection establishment is largely single-threaded per connection. A faster CPU buys you a higher threshold but doesn’t eliminate it. Reducing subscription count is the sustainable fix.
Each queue group subscription counts as one subscription from the leafnode’s perspective, regardless of how many local clients are in the queue group. If you have 100 clients in the same queue group, the leafnode sends one subscription (with the queue group name) rather than 100. This makes queue groups an effective way to reduce leafnode subscription count.
Compression (LEAF_001) reduces the bandwidth used by message payloads but doesn’t significantly help with the subscription processing timeout. The bottleneck during connection establishment is the hub’s processing time per subscription, not the time to transmit the subscription list over the wire. Compression helps with steady-state throughput, not connection setup.
Check the leafnode server’s logs for rapid reconnection messages. You’ll see a pattern like: connect → subscribe → disconnect → reconnect, repeating every 2–5 seconds. On the hub side, you’ll see corresponding stale connection warnings. Monitoring the leafnode’s connection uptime (via /connz) will show very short connection durations.
With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.