A subscription fanout anomaly occurs when max fanout is disproportionately higher than average fanout on a NATS server. A max-to-average fanout ratio exceeding 10x (the default threshold) indicates one or more subjects with excessive subscribers acting as broadcast hotspots, multiplying CPU and memory cost per published message.
NATS delivers messages by iterating over every matching subscriber for a given subject. If a subject has 500 subscribers and the server average is 5, publishing a single message to that subject costs 100x more CPU than a typical publish — the server must serialize the message into 500 outbound buffers, one per subscriber connection. This cost is paid on every publish to that subject, making it a sustained CPU multiplier that scales linearly with publish rate.
The memory impact compounds the CPU cost. Each subscriber maintains a pending outbound buffer. For a subject with 500 subscribers receiving 1,000 msg/s at 1KB per message, the server allocates up to 500 pending buffers that collectively consume memory proportional to subscriber count × message rate × message size. Under load, this is precisely the pattern that triggers slow consumer disconnections (SERVER_004) — some of those 500 subscribers inevitably fall behind, and the server spends additional resources buffering messages for clients that are about to be disconnected anyway.
The anomaly is often invisible during normal development and testing. A subject with 5 subscribers in staging behaves identically to one with 500 in production — the only difference is the per-message cost multiplier. Teams discover the problem when CPU spikes during traffic peaks, when slow consumer events appear on specific servers, or when one server in a cluster uses significantly more CPU than its peers (because clients subscribing to the hot subject happen to be concentrated on that server).
Wildcard subscriptions matching too broadly. A subscriber on events.> receives every message published to any subject starting with events.. If 200 microservices each subscribe to events.> for their own logging or auditing, every publish to any events.* subject fans out to all 200. The intent is usually “each service gets its own events” — the implementation delivers all events to every service.
Missing queue groups for work distribution. When multiple instances of the same service all subscribe to the same subject without using a queue group, every instance receives every message. Three replicas of an order processor each subscribed to orders.new creates 3x fanout. With a queue group, NATS delivers each message to exactly one instance — the intended behavior for work distribution.
Monitoring or audit subscriptions duplicated per instance. A sidecar or monitoring agent subscribing to > (all subjects) on every pod in a Kubernetes deployment creates fanout proportional to pod count. A 100-pod deployment generates 100x fanout on every subject, even though each monitor only needs to sample traffic.
Shared notification subjects without partitioning. A pattern like notifications.user.> where every connected user’s client subscribes creates fanout proportional to the user base. If 10,000 users are online and a system-wide notification publishes to notifications.user.*, the server fans out to all 10,000 connections.
Cached subscription state after service restarts. If clients reconnect without cleaning up old subscriptions (or if the server retains subscription interest from routes), phantom fanout can accumulate. The effective subscriber count grows with each reconnection cycle.
Query the server’s subscription routing information:
```sh
curl -s 'http://localhost:8222/subsz?subs=1' | jq '{num_subscriptions: .num_subscriptions, num_cache: .num_cache, num_inserts: .num_inserts, num_matches: .num_matches, cache_hit_rate: .cache_hit_rate}'
```

List detailed subscription information to find subjects with anomalous subscriber counts:
```sh
curl -s 'http://localhost:8222/subsz?subs=1' | jq '.subscriptions_list | sort_by(-.num) | .[0:10]'
```

This returns the top 10 subjects by subscriber count. Compare the highest count to the average to confirm the anomaly.
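To estimate the max-to-average ratio directly from the same endpoint, the list can be aggregated with jq. This is a rough sketch: it assumes each subscriptions_list entry carries a subject field and that the monitoring port is the default 8222, and the average it computes (subscribers per distinct subject) is only an approximation of the server's own fanout average:

```sh
# Rough estimate of max/avg fanout from /subsz (field names assumed; adjust to your server's output)
curl -s 'http://localhost:8222/subsz?subs=1' | jq '
  [.subscriptions_list | group_by(.subject)[]
    | {subject: .[0].subject, subscribers: length}]
  | {max: (map(.subscribers) | max),
     avg: ((map(.subscribers) | add) / length)}
  | . + {ratio: (.max / .avg)}'
```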
Find clients with excessive subscriptions that may be contributing to fanout:
```sh
nats server report connections --sort subs
```

Connections with hundreds or thousands of subscriptions are likely using broad wildcard patterns.
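To see exactly what one of those connections is subscribed to, the /connz monitoring endpoint can return its subscription list. A sketch, assuming the default monitoring port; substitute the client id (cid) reported above:

```sh
# List one connection's subscriptions (replace <cid> with the client id; assumes port 8222)
curl -s 'http://localhost:8222/connz?cid=<cid>&subs=1' | jq '.connections[0].subscriptions_list'
```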
Compare CPU usage across cluster servers to determine whether the fanout is concentrated:

```sh
nats server list
```

If one server shows significantly higher CPU than its peers, check whether the high-fanout subject's subscribers are concentrated on that server.
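One way to check is to count the suspected subject's subscribers on each server's monitoring endpoint. A rough sketch: the hostnames and the subject are placeholders, it matches exact subjects only (not wildcards that cover them), and interest propagated from cluster routes may also appear in the list:

```sh
# Count subscribers for one subject on each server (hosts and subject are placeholders)
subject="events.orders.created"
for host in nats-0:8222 nats-1:8222 nats-2:8222; do
  printf '%s: ' "$host"
  curl -s "http://$host/subsz?subs=1" \
    | jq --arg s "$subject" '[.subscriptions_list[] | select(.subject == $s)] | length'
done
```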
Publish a test message and observe delivery:
```sh
# In separate terminals, subscribe to see who gets the message
nats sub "events.test" --count 1

# Publish a test message
nats pub "events.test" "fanout-test"
```

The subscriber count shown in the publish confirmation reveals the actual fanout for that subject.
Investigate subjects with high subscriber counts. A large max-to-average fanout ratio indicates one or more subjects with excessive subscribers, which can create hot spots. Identify these subjects first, then apply the appropriate fix.
Add queue groups to work-distribution subscribers. If multiple instances of the same service subscribe to the same subject for processing (not for broadcast), add a queue group:
```go
// Go — queue subscription for work distribution
// Before: every instance gets every message
// sub, _ := nc.Subscribe("orders.new", handler)

// After: NATS delivers each message to one instance in the group
sub, _ := nc.QueueSubscribe("orders.new", "order-processors", func(msg *nats.Msg) {
    processOrder(msg.Data)
})
```

```python
# Python — queue subscription
# Before: nc.subscribe("orders.new", cb=handler)

# After: one delivery per message across the group
await nc.subscribe("orders.new", queue="order-processors", cb=handler)
```

Remove duplicate monitoring subscriptions. If monitoring sidecars don’t need every message, sample instead:
```sh
# Instead of subscribing to everything
# nats sub ">"

# Subscribe to a specific monitoring subject
nats sub '$SYS.SERVER.*.STATSZ'
```

Replace broad wildcards with specific subjects. Audit subscribers using > or multi-level wildcards and narrow them to the subjects they actually need:
```go
// Before: receives ALL events across all services
// nc.Subscribe("events.>", handler)

// After: receives only order events
sub, _ := nc.Subscribe("events.orders.>", func(msg *nats.Msg) {
    handleOrderEvent(msg.Data)
})
```

Partition broadcast subjects. If a subject genuinely needs broadcast semantics to many subscribers, partition by a key to distribute the fanout:
```go
// Before: one subject, 10,000 subscribers
// nc.Publish("notifications.all", data)

// After: partition by user region, 10 partitions × 1,000 subscribers each
region := getUserRegion(userID)
nc.Publish(fmt.Sprintf("notifications.%s", region), data)
```

Establish fanout budgets. Define maximum expected fanout per subject tier in your naming convention. Example:
| Subject pattern | Expected fanout | Mechanism |
|---|---|---|
| orders.* | 1 (queue group) | Work distribution |
| events.*.broadcast | 10-50 | Known broadcast |
| $SYS.> | 1-3 | Monitoring only |
Use JetStream for high-fanout data flows. Instead of core NATS pub/sub with hundreds of subscribers, publish to a JetStream stream and let each consumer group process independently. The stream absorbs the write once; consumers read at their own pace without multiplying server-side delivery cost:
```go
// Publish once to JetStream
js, _ := nc.JetStream()
js.Publish("events.orders.created", orderData)

// Each service creates its own consumer — no fanout multiplication
sub, _ := js.PullSubscribe("events.orders.created",
    "analytics-consumer",
    nats.BindStream("EVENTS"),
)
```

```python
# Python — JetStream consumer per service
js = nc.jetstream()
await js.publish("events.orders.created", order_data)

# Each service has its own pull consumer
psub = await js.pull_subscribe(
    "events.orders.created",
    durable="analytics-consumer",
    stream="EVENTS",
)
```

What counts as a normal fanout depends on your architecture. An average fanout of 1-3 is typical for microservice deployments using queue groups. A max fanout of 10-20 is reasonable for broadcast subjects like configuration updates or health checks. The check fires when the max-to-average ratio exceeds 10x, meaning one subject has dramatically more subscribers than the rest of the system. If your average is 2 and your max is 25, the ratio is 12.5x, which triggers the check even though 25 subscribers isn’t inherently problematic.
Not directly: JetStream stream writes are handled by the Raft group, not the subscription routing engine. However, if JetStream consumers use push delivery (a deliver subject), each push consumer counts as a subscriber to its deliver subject. A stream with 50 push consumers creates fanout on those deliver subjects. Pull consumers avoid this because the client initiates each fetch.
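A sketch of the difference in Go (the stream and durable names are illustrative and error handling is omitted): the push consumer stands up a subscription on its deliver subject and therefore appears in /subsz like any other subscriber, while the pull consumer fetches on demand and adds no standing subscription.

```go
package main

import "github.com/nats-io/nats.go"

func main() {
    nc, _ := nats.Connect(nats.DefaultURL)
    defer nc.Drain()
    js, _ := nc.JetStream()

    // Push consumer: the server pushes to a deliver subject,
    // which counts as a subscription like any other
    js.Subscribe("events.orders.created", func(m *nats.Msg) { m.Ack() },
        nats.Durable("orders-push"),
        nats.BindStream("EVENTS"),
    )

    // Pull consumer: the client fetches batches on demand,
    // so there is no standing deliver-subject subscription
    pull, _ := js.PullSubscribe("events.orders.created", "orders-pull",
        nats.BindStream("EVENTS"),
    )
    if msgs, err := pull.Fetch(10); err == nil {
        for _, m := range msgs {
            m.Ack()
        }
    }
}
```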
In a NATS cluster, subscription interest is propagated across routes. If 100 clients on Server A subscribe to events.> and a message is published on Server B, Server B sends one copy across the route to Server A, which then fans out locally to 100 clients. The route itself only carries one copy — fanout is always local to the server where subscribers are connected. This means fanout cost is concentrated, not distributed.
NATS does not support per-subject subscriber limits. You can limit total subscriptions per account (via account limits) or per connection, but there’s no mechanism to say “subject X allows at most N subscribers.” The architectural solutions — queue groups, subject partitioning, JetStream consumers — are more effective than artificial limits would be.
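If a hard ceiling is still wanted, the closest lever is the per-connection cap in the server configuration. A sketch (the value is illustrative, and the limit applies per client connection, not per subject):

```
# nats-server.conf
max_subscriptions: 1000   # maximum subscriptions any single client may hold
```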
Adding queue groups changes delivery semantics: instead of every instance receiving every message, each message goes to one instance. This is correct for work distribution but breaks broadcast use cases. Before adding a queue group, verify that the subscribers are processing messages (not just observing them). For monitoring and audit subscribers that genuinely need every message, keep them as plain subscriptions but narrow their wildcard scope.
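A minimal sketch of the two patterns coexisting on one subject, with hypothetical handlers: worker replicas share a queue group so each order is processed once, while the audit subscriber keeps a plain subscription scoped to the single subject it actually needs.

```go
package main

import (
    "log"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, _ := nats.Connect(nats.DefaultURL)
    defer nc.Drain()

    // Work distribution: one delivery per message across the group
    // (subject and queue group names are illustrative)
    nc.QueueSubscribe("orders.new", "order-processors", func(m *nats.Msg) {
        log.Printf("processing order: %s", m.Data)
    })

    // Observation: genuinely needs every message, so no queue group,
    // but scoped to one subject instead of a broad wildcard
    nc.Subscribe("orders.new", func(m *nats.Msg) {
        log.Printf("audit: %s", m.Data)
    })

    select {} // block so the subscriptions stay active
}
```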