Subscription churn occurs when a NATS server processes an excessive number of subscription inserts and removes within a single collection epoch. High churn rates — above 10,000 operations per epoch by default — indicate that clients are rapidly subscribing and unsubscribing, wasting CPU on interest graph updates and propagating unnecessary subscription changes to cluster routes and gateways.
NATS maintains a trie-based interest graph that maps subjects to active subscribers. Every SUB and UNSUB operation modifies this data structure — inserting or removing entries, updating reference counts, and invalidating cached matches. The cost of a single subscription operation is negligible, but at 10,000+ operations per epoch, the aggregate CPU spent on interest graph maintenance becomes measurable and directly competes with message routing.
The cost extends beyond the local server. In clustered deployments, subscription interest is propagated to all peers over route connections. Each SUB and UNSUB generates a protocol message that every other server in the cluster must process and apply to its own interest graph. In a 5-server cluster, 10,000 subscription operations on one server generate 40,000 protocol messages across routes. Gateway connections amplify this further — subscription changes propagate across super-clusters when using interest-only gateway mode.
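The route fan-out arithmetic above can be sketched in a few lines of Go. This is only an illustration: it assumes a full-mesh cluster in which every subscription operation is forwarded once to each peer, so treat the result as an upper bound. The function name `routeFanout` is hypothetical and not part of any NATS API.

```go
package main

import "fmt"

// routeFanout estimates how many route protocol messages a burst of
// subscription operations produces, assuming a full-mesh cluster in which
// each operation is forwarded once to every other server.
func routeFanout(ops, servers int) int {
	if servers < 2 {
		return 0 // a single server has no routes to propagate to
	}
	return ops * (servers - 1)
}

func main() {
	// 10,000 operations on one server of a 5-server cluster
	fmt.Println(routeFanout(10000, 5)) // 40000
}
```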
Subscription churn also degrades the subscription cache hit rate. NATS caches subject-to-subscriber matches to avoid re-evaluating the trie on every publish. When subscriptions change frequently, cached matches are invalidated, forcing the server to re-walk the trie for affected subjects. A healthy server maintains a cache hit rate above 95%. Sustained subscription churn can drop this below 80%, adding latency to every publish operation on subjects whose cached matches were invalidated.
Per-request subscribe/unsubscribe pattern. Applications that create a temporary subscription for each incoming request — subscribe, wait for reply, unsubscribe — generate two subscription operations per request. At 5,000 requests/second, that’s 10,000 subscription operations per second. This is the most common cause and usually indicates the application isn’t using the built-in request-reply pattern (nc.Request()), which uses a multiplexed inbox subscription internally.
Reconnect storms re-subscribing all subscriptions. When a client reconnects to a NATS server, the client library re-sends all active subscriptions. If a client has 500 subscriptions and reconnects 20 times during a network disruption, that’s 10,000 subscription inserts on the server — plus the corresponding removes when the old connections are cleaned up. A fleet-wide network blip affecting hundreds of clients amplifies this to hundreds of thousands of operations.
Dynamic topic subscriptions in response to user activity. Applications that subscribe to user-specific subjects when a user logs in and unsubscribe when they log out create churn proportional to session turnover. A web application with 1,000 concurrent users and an average session length of 10 minutes generates ~100 subscribe/unsubscribe pairs per minute. During peak hours or deployment rollouts, this rate can spike dramatically.
Ephemeral JetStream consumers creating deliver subscriptions. Each ephemeral push consumer creates a subscription on its deliver subject when it binds and removes it when it’s garbage-collected. If application code creates and destroys ephemeral consumers frequently (e.g., one per batch job), each lifecycle generates subscription churn on the underlying NATS server.
Client library auto-unsubscribe behavior. Some client libraries support auto-unsubscribe after N messages (sub.AutoUnsubscribe(n)). Each auto-unsubscribe generates an UNSUB protocol message. If many subscriptions auto-unsubscribe simultaneously — for example, a batch of one-shot request listeners all completing at once — the churn spikes.
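The churn figures quoted for the causes above follow from simple arithmetic, sketched here in Go. The function names are hypothetical helpers for back-of-the-envelope estimates, and the session estimate assumes steady-state turnover (arrivals per minute equal concurrent sessions divided by average session length).

```go
package main

import "fmt"

// perRequestOpsPerSec: a manual subscribe/unsubscribe per request costs
// one SUB plus one UNSUB for every request.
func perRequestOpsPerSec(requestsPerSec float64) float64 {
	return 2 * requestsPerSec
}

// reconnectInserts: every reconnect replays all of a client's subscriptions.
func reconnectInserts(clients, subsPerClient, reconnectsPerClient int) int {
	return clients * subsPerClient * reconnectsPerClient
}

// sessionPairsPerMin: steady-state session turnover, where each new session
// contributes one subscribe/unsubscribe pair.
func sessionPairsPerMin(concurrentUsers, avgSessionMin float64) float64 {
	if avgSessionMin <= 0 {
		return 0
	}
	return concurrentUsers / avgSessionMin
}

func main() {
	fmt.Println(perRequestOpsPerSec(5000))    // 10000
	fmt.Println(reconnectInserts(1, 500, 20)) // 10000
	fmt.Println(sessionPairsPerMin(1000, 10)) // 100
}
```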
Query the server’s subscription stats to see insert and remove counts:
    curl -s http://localhost:8222/subsz | jq '{num_subscriptions: .num_subscriptions, num_inserts: .num_inserts, num_removes: .num_removes, num_cache: .num_cache, cache_hit_rate: .cache_hit_rate}'

Compare num_inserts and num_removes over time. If both values are increasing at thousands per second, the server has active subscription churn.
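Comparing the counters over time amounts to differencing two samples. The following Go sketch computes per-second rates from two saved /subsz responses. It assumes the counters only increase between samples; the type and function names (`subszSample`, `churnRate`) are hypothetical, and the sample payloads are made up for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// subszSample holds the two /subsz counters needed to measure churn.
type subszSample struct {
	NumInserts uint64 `json:"num_inserts"`
	NumRemoves uint64 `json:"num_removes"`
}

// churnRate returns inserts/sec and removes/sec between two /subsz
// responses captured intervalSec seconds apart.
func churnRate(before, after []byte, intervalSec float64) (float64, float64, error) {
	var a, b subszSample
	if err := json.Unmarshal(before, &a); err != nil {
		return 0, 0, err
	}
	if err := json.Unmarshal(after, &b); err != nil {
		return 0, 0, err
	}
	if intervalSec <= 0 {
		return 0, 0, errors.New("interval must be positive")
	}
	if b.NumInserts < a.NumInserts || b.NumRemoves < a.NumRemoves {
		return 0, 0, errors.New("counters decreased; the server may have restarted")
	}
	ins := float64(b.NumInserts-a.NumInserts) / intervalSec
	rem := float64(b.NumRemoves-a.NumRemoves) / intervalSec
	return ins, rem, nil
}

func main() {
	// Illustrative payloads, not real server output.
	before := []byte(`{"num_inserts": 120000, "num_removes": 118000}`)
	after := []byte(`{"num_inserts": 180000, "num_removes": 177500}`)

	ins, rem, err := churnRate(before, after, 10)
	if err != nil {
		panic(err)
	}
	fmt.Printf("inserts/s=%.0f removes/s=%.0f\n", ins, rem)
}
```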
A healthy subscription cache hit rate is above 95%:
    curl -s http://localhost:8222/subsz | jq '.cache_hit_rate'

A cache hit rate below 80% combined with high insert/remove counts confirms that churn is degrading routing performance.
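The 95% and 80% thresholds can be combined with the insert/remove rate into a simple triage rule. The Go sketch below is one possible encoding. It assumes cache_hit_rate is reported as a fraction between 0 and 1, and the 1,000 operations per second cut-off for "high" churn is an assumption chosen for illustration, not a server default.

```go
package main

import "fmt"

const (
	healthyHitRate  = 0.95   // from the text: healthy servers stay above 95%
	degradedHitRate = 0.80   // from the text: below 80% indicates degradation
	highOpsPerSec   = 1000.0 // assumed cut-off for "thousands per second"
)

// classify turns a cache hit rate and a combined insert+remove rate
// into a coarse triage label.
func classify(hitRate, opsPerSec float64) string {
	switch {
	case hitRate < degradedHitRate && opsPerSec >= highOpsPerSec:
		return "churn-degraded"
	case hitRate >= healthyHitRate:
		return "healthy"
	default:
		return "investigate"
	}
}

func main() {
	fmt.Println(classify(0.72, 6000)) // churn-degraded
	fmt.Println(classify(0.97, 10))   // healthy
	fmt.Println(classify(0.90, 5000)) // investigate
}
```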
Find connections with high subscription counts or rapid subscription activity:
    nats server report connections --sort subs

Connections that show fluctuating subscription counts across consecutive reports are the likely sources. Also check for connections with unusually high in_msgs relative to their subscription count; this pattern often indicates per-request subscribe/unsubscribe behavior.
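Spotting fluctuating counts by eye gets tedious with many connections. As a rough sketch, the Go function below flags connection IDs whose subscription count swings by more than a chosen amount across consecutive reports. The input shape (a map from CID to counts in report order) and the name `flagFluctuating` are assumptions; extracting the counts from the report output is left out. A connection that subscribes and unsubscribes in balance between samples will look steady, so this catches only part of the problem.

```go
package main

import (
	"fmt"
	"sort"
)

// flagFluctuating returns the CIDs whose subscription count varied by more
// than minSwing across the samples. samples maps a connection ID to its
// subscription counts in report order.
func flagFluctuating(samples map[uint64][]int, minSwing int) []uint64 {
	var flagged []uint64
	for cid, counts := range samples {
		if len(counts) < 2 {
			continue // need at least two reports to see a change
		}
		lo, hi := counts[0], counts[0]
		for _, c := range counts[1:] {
			if c < lo {
				lo = c
			}
			if c > hi {
				hi = c
			}
		}
		if hi-lo > minSwing {
			flagged = append(flagged, cid)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(flagged, func(i, j int) bool { return flagged[i] < flagged[j] })
	return flagged
}

func main() {
	samples := map[uint64][]int{
		11: {12, 12, 12, 12},
		27: {40, 310, 95, 280},
		42: {5, 6, 5, 5},
	}
	fmt.Println(flagFluctuating(samples, 50)) // [27]
}
```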
Subscription churn often correlates with connection churn. If clients are reconnecting frequently, each reconnection replays all subscriptions:
    nats server request connections --sort idle

Look for connections with very short idle times (seconds), indicating recent reconnections.
In clustered deployments, check how much subscription traffic is flowing over routes:
    curl -s http://localhost:8222/routez | jq '.routes[] | {remote_id: .remote_id, in_msgs: .in_msgs, out_msgs: .out_msgs, subscriptions: .subscriptions}'

High in_msgs/out_msgs on route connections with relatively few active subscriptions suggests churn propagation.
Excessive subscription insert and remove operations can stem from two distinct causes that require different remediation:
nats server report connections --sort subs, and fix the client code (see below).

Use the built-in request-reply pattern. Most NATS client libraries implement request-reply using a single multiplexed wildcard inbox subscription per connection (under the _INBOX prefix) that handles all reply routing internally, with no per-request subscribe/unsubscribe:
    // Go (WRONG): manual per-request subscription
    reply := nats.NewInbox()
    sub, _ := nc.SubscribeSync(reply)
    nc.PublishRequest("orders.validate", reply, orderData)
    msg, _ := sub.NextMsg(5 * time.Second)
    sub.Unsubscribe() // Churn!

    // Go (RIGHT): built-in request (multiplexed inbox, no churn)
    msg, err := nc.Request("orders.validate", orderData, 5*time.Second)

    # Python (WRONG): manual subscription per request
    reply = nc.new_inbox()
    sub = await nc.subscribe(reply)
    await nc.publish("orders.validate", order_data, reply=reply)
    msg = await sub.next_msg(timeout=5)
    await sub.unsubscribe()  # Churn!

    # Python (RIGHT): built-in request
    msg = await nc.request("orders.validate", order_data, timeout=5)

Implement exponential backoff on reconnect. Prevent all clients from reconnecting simultaneously after a network disruption:
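    // Go: configure reconnect with exponential backoff and jitter
    // (uses the math, math/rand, and time packages)
    nc, err := nats.Connect(url,
        nats.MaxReconnects(-1),                             // Unlimited reconnects
        nats.ReconnectWait(2*time.Second),                  // Base wait
        nats.ReconnectJitter(1*time.Second, 5*time.Second), // Max jitter: non-TLS, TLS
        // Invoked after every server URL has been tried without success.
        // The callback should add its own jitter to its return value.
        nats.CustomReconnectDelay(func(attempts int) time.Duration {
            // 2s, 4s, 8s, ... capped at 30s, plus up to 1s of random jitter
            backoff := math.Min(math.Pow(2, float64(attempts)), 30)
            jitter := time.Duration(rand.Int63n(int64(time.Second)))
            return time.Duration(backoff*float64(time.Second)) + jitter
        }),
    )

Consolidate subscriptions. If a client subscribes to 100 specific subjects like orders.us.east.1, orders.us.east.2, etc., consider a single wildcard subscription orders.us.east.* with client-side filtering. Replacing 100 subscriptions with one cuts that client's subscription replay on reconnect by 99% (from 100 operations to 1).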
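Client-side filtering for a consolidated wildcard subscription can be as small as a set lookup on the message subject. The Go sketch below shows one way to do it. The `subjectFilter` type and its methods are hypothetical, and the incoming subjects are simulated rather than received from a server.

```go
package main

import "fmt"

// subjectFilter is the set of exact subjects this client cares about.
type subjectFilter map[string]struct{}

// newSubjectFilter builds a filter from the subjects that previously each
// had their own subscription.
func newSubjectFilter(subjects ...string) subjectFilter {
	f := make(subjectFilter, len(subjects))
	for _, s := range subjects {
		f[s] = struct{}{}
	}
	return f
}

// wants reports whether a message delivered by the wildcard subscription
// should be processed or dropped.
func (f subjectFilter) wants(subject string) bool {
	_, ok := f[subject]
	return ok
}

func main() {
	filter := newSubjectFilter("orders.us.east.1", "orders.us.east.2")

	// In a real client this check would sit at the top of the handler for
	// the single "orders.us.east.*" subscription; here subjects are simulated.
	for _, subj := range []string{"orders.us.east.1", "orders.us.east.7"} {
		fmt.Println(subj, filter.wants(subj))
	}
}
```

The trade-off is that the server now delivers every message matching the wildcard, so this suits cases where the client wants most of the matching subjects anyway.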
Use durable JetStream consumers instead of ephemeral subscriptions. Durable consumers maintain their state across client disconnections. The subscription is re-bound on reconnect, but the consumer itself doesn’t need to be recreated:
    // Go: durable pull consumer (survives client restarts)
    js, _ := nc.JetStream()

    sub, _ := js.PullSubscribe(
        "orders.>",
        "order-processor", // Durable name, persists server-side
        nats.BindStream("ORDERS"),
    )

    // On reconnect, the consumer already exists, so nothing is recreated
    msgs, _ := sub.Fetch(10, nats.MaxWait(5*time.Second))

    // TypeScript: durable consumer
    import { AckPolicy, connect } from "nats";

    const nc = await connect({ servers: "nats://localhost:4222" });
    const js = nc.jetstream();
    const jsm = await nc.jetstreamManager();

    // Create durable consumer once
    await jsm.consumers.add("ORDERS", {
      durable_name: "order-processor",
      filter_subject: "orders.>",
      ack_policy: AckPolicy.Explicit,
    });

    // Bind to existing consumer on each connect, with no subscription churn
    const consumer = await js.consumers.get("ORDERS", "order-processor");
    const messages = await consumer.consume();

Implement connection pooling. Instead of each goroutine or thread opening its own NATS connection with its own subscriptions, share a connection pool. Fewer connections mean fewer subscription replays on reconnect and fewer total subscriptions to maintain.
Monitor subscription churn as a deployment health signal. Treat sustained churn above the threshold as a code smell — it almost always indicates a subscription lifecycle pattern that should be refactored. Add churn metrics to your CI/CD validation for load tests.
In a healthy NATS deployment, subscription operations should be dominated by initial connection setup and rare reconnection events. A server handling 1,000 persistent connections with 10 subscriptions each should see roughly 10,000 inserts at startup and near-zero ongoing churn. If the server consistently processes thousands of insert/remove operations per collection epoch (typically 30-60 seconds), something is creating and destroying subscriptions in a loop.
Subscription churn does affect message latency. Each subscription change invalidates portions of the subscription routing cache. When the cache miss rate increases, the server must re-evaluate the subject trie for each publish to affected subjects, which adds microseconds to tens of microseconds per publish depending on trie complexity. At high publish rates, this adds measurable tail latency.
Subscription churn is not the same as connection churn, but the two are often correlated. Connection churn (CLUSTER_006) measures client connect/disconnect rate. Subscription churn measures subscribe/unsubscribe rate. A single connection with a per-request subscribe/unsubscribe pattern can generate massive subscription churn with zero connection churn. Conversely, a reconnect storm generates both connection churn and subscription churn (because reconnecting clients replay all their subscriptions).
In interest-only gateway mode, the local cluster tells remote clusters exactly which subjects have local subscribers. Subscription churn triggers interest updates across gateways — each new subscription may send an interest notification to every remote cluster, and each unsubscribe may send a no-interest notification. High churn in interest-only mode generates significant cross-cluster control traffic.
The subjects that are churning cannot be read directly from the /subsz endpoint, which shows only aggregate counts. To identify specific subjects, capture subscription protocol messages from the client connection. Enable server trace logging temporarily on a suspect connection:
    # Enable trace for a specific connection (by CID) via the server signal
    nats server request connections --cid <cid> --trace

Look for patterns of repeated SUB/UNSUB on the same subject or inbox prefix.
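Once trace output is captured, tallying SUB operations per subject makes the repeated patterns easier to see. The Go sketch below is one way to do that. It assumes each traced operation appears in brackets in the log line, for example `<<- [SUB orders.validate 12]`, and it folds all inbox subjects into a single `_INBOX.>` bucket. The log lines in `main` are made up for illustration, and `countSubs` is a hypothetical helper. UNSUB operations are not tallied because they identify a subscription by ID and not by subject.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// subOp captures the subject of a traced SUB operation. The leading "["
// keeps it from matching UNSUB.
var subOp = regexp.MustCompile(`\[SUB (\S+)`)

// countSubs tallies SUB operations per subject in trace log text, grouping
// every inbox subject under "_INBOX.>".
func countSubs(log string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		m := subOp.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // not a SUB line
		}
		subject := m[1]
		if strings.HasPrefix(subject, "_INBOX.") {
			subject = "_INBOX.>"
		}
		counts[subject]++
	}
	return counts
}

func main() {
	// Illustrative lines in the assumed trace format.
	log := strings.Join([]string{
		"[TRC] 10.0.0.5:51234 - cid:7 - <<- [SUB _INBOX.abc123 1]",
		"[TRC] 10.0.0.5:51234 - cid:7 - <<- [UNSUB 1]",
		"[TRC] 10.0.0.5:51234 - cid:7 - <<- [SUB _INBOX.def456 2]",
		"[TRC] 10.0.0.5:51234 - cid:7 - <<- [SUB orders.validate 3]",
	}, "\n")

	counts := countSubs(log)
	fmt.Println("_INBOX.>:", counts["_INBOX.>"])
	fmt.Println("orders.validate:", counts["orders.validate"])
}
```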