A stream-consumer leader co-location alert fires when the server hosting a stream’s Raft leader also hosts more than half of that stream’s consumer leaders. This concentrates both the stream write path (appending messages) and the consumer delivery path (tracking acknowledgments, managing redelivery) on a single server, creating an I/O and CPU hotspot that limits throughput and reduces resilience.
In a NATS JetStream cluster, the stream leader handles all incoming publishes for the stream — it appends messages to storage and replicates them to followers. Each consumer leader independently tracks delivery state, processes acknowledgments, and manages redelivery timers. Both are I/O-intensive operations.
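To make the division of work concrete, here is a minimal sketch, assuming a stream named ORDERS that captures subjects under orders.> and a durable consumer named my-consumer (the same names used in the examples below), with comments noting which leader services each call:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/nats.go/jetstream"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := jetstream.New(nc)
	if err != nil {
		log.Fatal(err)
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Publish: handled by the STREAM leader, which appends the message to
	// storage and replicates it to the stream's followers.
	if _, err := js.Publish(ctx, "orders.created", []byte(`{"id":1}`)); err != nil {
		log.Fatal(err)
	}

	// Fetch + ack: handled by the CONSUMER leader, which tracks delivery
	// state, advances the ack floor, and cancels redelivery timers.
	cons, err := js.Consumer(ctx, "ORDERS", "my-consumer")
	if err != nil {
		log.Fatal(err)
	}
	batch, err := cons.Fetch(1)
	if err != nil {
		log.Fatal(err)
	}
	for msg := range batch.Messages() {
		fmt.Printf("received %s\n", msg.Subject())
		_ = msg.Ack()
	}
	if err := batch.Error(); err != nil {
		log.Fatal(err)
	}
}
```

When the stream leader and the consumer leader live on different servers, the publish path and the ack path above land on different machines; when they are co-located, both paths compete for the same disk, CPU, and network.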
A single server carries disproportionate load. When the stream leader and most consumer leaders share the same server, that server handles: all message writes, all replication coordination, most consumer ack processing, and most redelivery scheduling. The other servers in the cluster sit comparatively idle while one server is saturated.
Throughput hits a ceiling. The bottleneck becomes the single server’s disk I/O, CPU, and network bandwidth. Adding more consumers doesn’t improve aggregate throughput if they all land on the same server. The cluster has horizontal capacity that isn’t being used.
A single server failure has outsized impact. If the co-located server goes down, the cluster loses the stream leader and most consumer leaders simultaneously. While Raft will elect new leaders, the recovery involves multiple leadership transitions happening at once, which can cause a brief but noticeable processing pause.
Consumer latency increases under load. Consumer leaders compete with the stream leader for the same server’s resources. During high-publish-rate periods, the stream leader consumes more disk I/O and CPU, leaving less for consumer ack processing. Consumers experience higher acknowledgment latency, which can trigger redelivery timeouts and duplicate processing.
Default Raft leader election behavior. Raft doesn’t consider workload distribution when electing leaders. If a server happens to win the stream leader election, its consumer Raft groups may also elect it as leader due to having the most up-to-date log. This creates accidental co-location without any explicit misconfiguration.
Server with the fastest disk or lowest latency. If one server has measurably faster storage or lower network latency to peers, Raft elections naturally favor it. It wins more elections across multiple Raft groups, concentrating leadership.
All consumers created around the same time. When consumers are created in a batch (e.g., during deployment), they all go through initial leader election simultaneously. The server that’s most responsive at that moment tends to win all the elections.
No leader distribution policy. Without explicit leader balancing (via the nats stream cluster step-down and nats consumer cluster step-down commands, or automated rebalancing), leadership naturally drifts toward whichever server is most consistently available and responsive — which is often the stream leader’s server.
Low replica count (R1). With R1 streams and consumers, there’s only one copy — the leader. Leadership distribution isn’t possible because there are no followers to promote. This check primarily applies to R3 or R5 configurations.
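For reference, a minimal sketch of creating an R3 stream and an R3 durable consumer with the Go jetstream client (names match the examples below); with three replicas there are followers to promote, so leader placement can actually be adjusted:

```go
package main

import (
	"context"
	"log"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/nats.go/jetstream"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := jetstream.New(nc)
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	// R3 stream: one leader plus two followers, so leadership can move.
	_, err = js.CreateStream(ctx, jetstream.StreamConfig{
		Name:     "ORDERS",
		Subjects: []string{"orders.>"},
		Replicas: 3,
	})
	if err != nil {
		log.Fatal(err)
	}

	// R3 durable consumer: its Raft group also spans three servers, so its
	// leader can be placed on a different server than the stream leader.
	_, err = js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
		Durable:  "my-consumer",
		Replicas: 3,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```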
```bash
# Show stream and consumer leader placement
nats stream report

# Detailed view of a specific stream's consumers
nats consumer list ORDERS
```

Look at which server hosts the stream leader, then check which servers host each consumer’s leader. If the same server name appears for the stream leader and the majority of consumer leaders, you have co-location.
```bash
# Show all Raft group leaders across the cluster
nats server report jetstream
```

This shows how many stream and consumer leaders each server is hosting. A server hosting significantly more leaders than its peers is likely a co-location hotspot.
```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/nats.go/jetstream"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := jetstream.New(nc)
	if err != nil {
		log.Fatal(err)
	}

	stream, err := js.Stream(context.Background(), "ORDERS")
	if err != nil {
		log.Fatal(err)
	}
	si, err := stream.Info(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	streamLeader := si.Cluster.Leader

	// Count consumer leaders per server
	serverCounts := make(map[string]int)
	consLister := stream.ListConsumers(context.Background())
	total := 0
	for ci := range consLister.Info() {
		if ci.Cluster != nil && ci.Cluster.Leader != "" {
			serverCounts[ci.Cluster.Leader]++
			total++
		}
	}
	if err := consLister.Err(); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Stream ORDERS leader: %s\n", streamLeader)
	fmt.Printf("Consumer leaders:\n")
	for server, count := range serverCounts {
		colocated := ""
		if server == streamLeader {
			colocated = " ← STREAM LEADER"
		}
		fmt.Printf("  %s: %d/%d (%.0f%%)%s\n",
			server, count, total,
			float64(count)/float64(total)*100, colocated)
	}
}
```

Step down consumer leaders from the co-located server. The nats consumer cluster step-down command forces the current consumer leader to abdicate, triggering a new Raft election that will typically select a different server:
```bash
# Step down a specific consumer's leader
nats consumer cluster step-down ORDERS my-consumer

# Step down all consumers on the stream
for consumer in $(nats consumer list ORDERS -n); do
  nats consumer cluster step-down ORDERS "$consumer"
  sleep 1  # avoid overwhelming the cluster with elections
done
```

After stepping down, verify the new distribution:
```bash
nats consumer list ORDERS
```

If consumer leaders keep returning to the same server because it’s the stream leader (and thus has the most up-to-date log), step down the stream leader first:
```bash
# Step down the stream leader
nats stream cluster step-down ORDERS
```

This forces a new stream leader election. The new stream leader may be a different server, which changes the dynamics of subsequent consumer leader elections.
For consumers that are frequently recreated (e.g., during deployments), you can influence initial leader placement by ensuring the consumer Raft group’s peers are spread across servers. While you can’t directly set a consumer’s leader, you can step down immediately after creation:
```bash
# Create consumer, then immediately rebalance if needed
nats consumer add ORDERS new-consumer --pull --filter "orders.>"
nats consumer cluster step-down ORDERS new-consumer
```

Run periodic rebalancing. Schedule a job that checks leader distribution and steps down leaders on overloaded servers:
```bash
# Check if a server hosts >50% of a stream's consumer leaders
# and step down excess leaders
nats server report jetstream --json | \
  jq -r '.servers[] | select(.leader_count > .expected_leaders) | .name'
```

Monitor with Synadia Insights. Insights automatically detects stream-consumer leader co-location across your entire deployment and flags streams where the imbalance exceeds the threshold. This saves you from building and maintaining custom monitoring scripts.
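If you want to automate the rebalancing yourself, a rough Go sketch of the same policy follows. It reuses the detection logic from the diagnostic program above and requests step-downs through the JetStream admin API subject $JS.API.CONSUMER.LEADER.STEPDOWN.<stream>.<consumer> (the endpoint behind nats consumer cluster step-down); treat the subject format and the 50% threshold here as assumptions to verify against your cluster before scheduling it:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/nats.go/jetstream"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := jetstream.New(nc)
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	stream, err := js.Stream(ctx, "ORDERS")
	if err != nil {
		log.Fatal(err)
	}
	si, err := stream.Info(ctx)
	if err != nil {
		log.Fatal(err)
	}
	streamLeader := si.Cluster.Leader

	// Collect consumers whose leader sits on the stream leader's server.
	var colocated []string
	total := 0
	lister := stream.ListConsumers(ctx)
	for ci := range lister.Info() {
		total++
		if ci.Cluster != nil && ci.Cluster.Leader == streamLeader {
			colocated = append(colocated, ci.Name)
		}
	}
	if err := lister.Err(); err != nil {
		log.Fatal(err)
	}

	// Only act when more than half of the consumer leaders share the
	// stream leader's server (the same threshold the check uses).
	if total == 0 || float64(len(colocated))/float64(total) <= 0.5 {
		fmt.Println("consumer leaders sufficiently spread; nothing to do")
		return
	}

	for _, name := range colocated {
		// Ask the current consumer leader to step down; a new Raft election
		// follows and typically lands on a different server.
		subj := fmt.Sprintf("$JS.API.CONSUMER.LEADER.STEPDOWN.%s.%s", "ORDERS", name)
		if _, err := nc.Request(subj, nil, 5*time.Second); err != nil {
			log.Printf("step-down of %s failed: %v", name, err)
			continue
		}
		fmt.Printf("requested leader step-down for consumer %s\n", name)
		time.Sleep(time.Second) // pace elections, mirroring the shell loop above
	}
}
```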
Consider server tags for workload isolation. If certain servers are consistently better suited for stream leadership (e.g., faster disks) and others for consumer leadership, use JetStream placement tags to guide the distribution:
```
server_tags: ["role:consumer-heavy"]
```

Can this check fire for streams with only a few consumers?

Yes. In a 3-server cluster with 3 consumers, each server ideally hosts 1 consumer leader. But Raft elections don’t guarantee even distribution. With only 3 consumers, having 2 on one server (67%) exceeds the 50% threshold. The check is most actionable for streams with many consumers (10+), where redistribution has a meaningful impact on load balancing.
Does stepping down a consumer leader lose messages or delivery state?

No. A consumer leader step-down triggers a Raft leadership election among the consumer’s replicas. The new leader picks up from the exact same delivery state — pending messages, ack tracking, redelivery timers. Clients connected to the consumer experience a brief pause (typically < 1 second) during the election but no messages are lost or redelivered.
How often should leader distribution be rebalanced?

After any cluster topology change (server added, removed, restarted) and periodically (weekly or monthly) during normal operations. Rebalancing is cheap — each step-down takes milliseconds — so there’s little risk in doing it frequently. The main cost is the brief leadership transition period per consumer.
Does this check apply to R1 streams and consumers?

No. R1 streams and consumers have only one replica — the leader. There are no followers to promote, so redistribution isn’t possible. If you need better load distribution, consider upgrading to R3 replication, which gives you replicas across 3 servers and the ability to move leaders.
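One way to make that upgrade is to resubmit the stream’s configuration with a higher replica count. A minimal sketch using the Go jetstream API (stream name ORDERS as in the earlier examples; confirm the cluster has at least three servers and enough storage for the extra copies before applying):

```go
package main

import (
	"context"
	"log"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/nats.go/jetstream"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := jetstream.New(nc)
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	// Fetch the stream's current configuration...
	stream, err := js.Stream(ctx, "ORDERS")
	if err != nil {
		log.Fatal(err)
	}
	si, err := stream.Info(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// ...and resubmit it with the replica count raised from 1 to 3.
	cfg := si.Config
	cfg.Replicas = 3 // requires at least three servers in the cluster
	if _, err := js.UpdateStream(ctx, cfg); err != nil {
		log.Fatal(err)
	}
	log.Println("ORDERS scaled to R3")
}
```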
Is co-locating stream and consumer leaders ever beneficial?

In some cases, yes. Co-locating stream and consumer leaders on the same server reduces network hops for consumer reads: if the stream leader has the data in its page cache, a co-located consumer leader reads locally instead of over the network. This trade-off makes sense for low-consumer-count streams where the co-location benefit outweighs the hotspot risk. In that case, you can safely ignore this check for those specific streams.