
NATS Consumer Leader Not Co-located: What It Means and How to Fix It

Severity: Info
Category: Performance
Applies to: Placement
Check ID: OPT_PLACE_002
Detection threshold: Consumer leader is in a different cluster than the majority of the account's connections

"Consumer leader not co-located" means a JetStream consumer’s Raft leader is running in a different cluster than the one holding the majority of the account’s client connections. Every message delivery from this consumer traverses the inter-cluster gateway, adding latency to every single message and consuming gateway bandwidth that better placement would eliminate.

Why this matters

The consumer leader is the server that dispatches messages to subscribing clients. When a pull consumer receives a fetch request, or a push consumer delivers the next message, the leader server sends the message directly to the client connection. If the leader is in cluster A but the subscribing clients are in cluster B, every message delivery crosses the gateway. For a consumer processing thousands of messages per second, this means thousands of gateway round trips per second — all avoidable.

The latency impact is per-message. Unlike stream placement, where the latency cost is on publish acknowledgments, consumer leader misplacement adds latency to the delivery path. For pull consumers, each fetch request must travel to the remote leader and the response must travel back. For push consumers, every delivery traverses the gateway. In request-reply patterns built on top of JetStream, the end-to-end latency doubles: the request crosses the gateway to reach the consumer, and the reply crosses back.

This inefficiency compounds with ack processing. After receiving a message, the client sends an acknowledgment back to the consumer leader. If the leader is remote, the ack also crosses the gateway. For AckExplicit consumers with tight ack_wait deadlines, the added gateway latency reduces the effective time the application has to process each message. In extreme cases, the gateway latency alone can push message processing close to the ack timeout, causing unnecessary redeliveries.

Common causes

  • Raft leader elected on a different server than expected. Consumer Raft groups elect leaders based on availability and log consistency, not client proximity. After a server restart, failover, or cluster rebalance, the consumer leader may land on a server in a different cluster than where clients connect.

  • Stream replicas span multiple clusters. If a stream has replicas across clusters (via placement or default distribution), the consumer’s Raft group peers may also span clusters. The elected leader has no preference for the cluster with the most clients — it’s chosen by Raft consensus mechanics.

  • Client connections migrated to a new cluster. Clients were moved closer to a new region or datacenter, but the consumer leaders remained in the original cluster. The consumer continues to function correctly, but every delivery now crosses the gateway.

  • Consumer created from a different cluster. The initial leader is typically elected on a peer in the cluster where the consumer was created. If an operator or CI/CD pipeline creates consumers from a management cluster rather than the cluster where clients connect, the initial leader may be in the wrong cluster.

  • Leader step-down or failover. A leader step-down (manual or automatic) triggers a new election. The new leader may be elected in a different cluster than the previous one, especially if the cluster that previously held leadership has a server under load or temporarily slow.

How to diagnose

Identify consumer leader location

Check which server and cluster hosts the consumer leader:

nats consumer info <stream-name> <consumer-name>

Look at the Cluster section. The Leader field shows which server holds leadership. Cross-reference the server name with your cluster topology to determine which cluster it’s in.

Compare with client connection location

Check where the account’s client connections are concentrated:

nats server report connections --account <account-name>

Group connections by server and cluster. If the majority of connections are in a different cluster than the consumer leader, the consumer is misplaced.
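The grouping step can be scripted once you have per-cluster connection counts. A minimal sketch, assuming you have already tallied the report output into a map (the cluster names and counts below are hypothetical):

```go
package main

import "fmt"

// majorityCluster returns the cluster with the most client connections and
// whether that cluster holds a strict majority of all connections.
func majorityCluster(connsByCluster map[string]int) (string, bool) {
	total, best, bestCluster := 0, 0, ""
	for cluster, n := range connsByCluster {
		total += n
		if n > best {
			best, bestCluster = n, cluster
		}
	}
	return bestCluster, best*2 > total
}

func main() {
	// Hypothetical tallies built from `nats server report connections`.
	conns := map[string]int{"us-east": 120, "us-west": 30, "eu-central": 10}

	cluster, strict := majorityCluster(conns)
	fmt.Printf("most connections: %s (strict majority: %v)\n", cluster, strict)
	// If the consumer leader's cluster differs from this, the check fires.
}
```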

Check delivery latency

Measure the time between message availability and delivery to the consumer:

nats consumer info <stream-name> <consumer-name> --json | jq '{ack_floor: .ack_floor, num_pending: .num_pending, num_ack_pending: .num_ack_pending}'

High num_ack_pending relative to the processing rate may indicate that gateway latency is eating into the ack window.

List all consumers and their leaders

For a broader view, check all consumers across a stream:

nats consumer report <stream-name>

Compare the leader column with your cluster topology to identify all misplaced consumers.

Check gateway traffic correlation

If multiple consumers are misplaced, gateway traffic will be elevated:

curl -s http://localhost:8222/gatewayz | jq '.outbound_gateways'

High sent/received byte counts on specific gateway connections may correlate with consumer delivery traffic.
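Picking out the busiest outbound gateway from that JSON can be sketched in Go. The field names here (`outbound_gateways`, `connection`, `out_bytes`, `in_bytes`) are assumptions based on a typical monitoring payload; verify them against your server version’s actual /gatewayz output:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gatewayz models a minimal slice of the /gatewayz payload; the field
// names are assumptions and may differ across server versions.
type gatewayz struct {
	OutboundGateways map[string]struct {
		Connection struct {
			OutBytes int64 `json:"out_bytes"`
			InBytes  int64 `json:"in_bytes"`
		} `json:"connection"`
	} `json:"outbound_gateways"`
}

// busiestGateway returns the outbound gateway with the highest combined
// byte count, a candidate for correlation with consumer delivery traffic.
func busiestGateway(raw []byte) (string, int64, error) {
	var gz gatewayz
	if err := json.Unmarshal(raw, &gz); err != nil {
		return "", 0, err
	}
	name, max := "", int64(-1)
	for gw, g := range gz.OutboundGateways {
		total := g.Connection.OutBytes + g.Connection.InBytes
		if total > max {
			name, max = gw, total
		}
	}
	return name, max, nil
}

func main() {
	// Hypothetical payload fragment.
	sample := []byte(`{"outbound_gateways":{
		"us-west":{"connection":{"out_bytes":52428800,"in_bytes":1048576}},
		"eu-central":{"connection":{"out_bytes":4096,"in_bytes":2048}}}}`)

	gw, total, err := busiestGateway(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("busiest gateway: %s (%d bytes)\n", gw, total)
}
```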

How to fix it

Immediate: step down the consumer leader

Force a new leader election. The step-down command makes the current leader resign, triggering a Raft election that may move leadership to a replica in the cluster where clients connect:

nats consumer cluster step-down <stream-name> <consumer-name>

The new leader is elected by Raft consensus. If the consumer has replicas in the cluster where clients connect, there’s a chance the new leader will be elected there. However, Raft does not guarantee placement — you may need to step down multiple times, or the leader may land back in the same cluster.

# Check who became the new leader
nats consumer info <stream-name> <consumer-name> --json | jq '.cluster.leader'
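That verification can be scripted so step-downs are repeated until the leader lands where you want it. A sketch that reads the `--json` output and resolves the leader through an operator-maintained server-to-cluster map (the server names and topology below are hypothetical):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// leaderCluster extracts .cluster.leader from `nats consumer info --json`
// output and resolves it through a server-to-cluster map maintained by
// the operator (NATS itself does not expose this mapping per consumer).
func leaderCluster(infoJSON []byte, serverToCluster map[string]string) (string, error) {
	var info struct {
		Cluster struct {
			Leader string `json:"leader"`
		} `json:"cluster"`
	}
	if err := json.Unmarshal(infoJSON, &info); err != nil {
		return "", err
	}
	cluster, ok := serverToCluster[info.Cluster.Leader]
	if !ok {
		return "", fmt.Errorf("unknown server %q", info.Cluster.Leader)
	}
	return cluster, nil
}

func main() {
	// Hypothetical topology and captured CLI output.
	topology := map[string]string{"n1-east": "us-east", "n2-west": "us-west"}
	out := []byte(`{"cluster":{"leader":"n2-west"}}`)

	got, err := leaderCluster(out, topology)
	if err != nil {
		panic(err)
	}
	if got != "us-east" {
		fmt.Println("leader is in", got, "- step down again and re-check")
	}
}
```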

Short-term: use preferred placement

Configure stream placement to ensure replicas exist in the client’s cluster. Consumer Raft groups are placed on the same servers as their stream’s Raft group. If the stream has no replicas in the cluster where clients connect, the consumer cannot have a leader there either.

# Ensure the stream has replicas in the target cluster
nats stream edit <stream-name> --tag cluster:us-east

After the stream has replicas in the correct cluster, step down the consumer leader to trigger re-election with a replica in the target cluster as a candidate.

For new consumers, create them from the target cluster. The initial leader election tends to favor peers in the cluster where the consumer was created:

// Go — create consumer with connection to the target cluster
nc, _ := nats.Connect("nats://us-east-server:4222")
js, _ := nc.JetStream()

_, err := js.AddConsumer("ORDERS", &nats.ConsumerConfig{
	Durable:   "order-processor",
	AckPolicy: nats.AckExplicitPolicy,
})
if err != nil {
	log.Fatal(err)
}
// TypeScript (nats.js) — create consumer on target cluster
import { connect, AckPolicy } from "nats";

const nc = await connect({ servers: "nats://us-east-server:4222" });
const jsm = await nc.jetstreamManager();

await jsm.consumers.add("ORDERS", {
  durable_name: "order-processor",
  ack_policy: AckPolicy.Explicit,
});

Long-term: align consumer and client topology

Document placement requirements per consumer. For each critical consumer, define which cluster should hold its leader and include this in your operational runbook. After failovers, verify leader placement and step down if needed.

Automate leader placement checks. Write a script or use Synadia Insights to continuously compare consumer leader locations with client connection distribution. Alert when a consumer leader is in a cluster that doesn’t hold the majority of the account’s connections.

Design stream placement around access patterns. Place streams (and by extension, their consumers) in the cluster where the consuming clients connect. Use mirrors for cross-cluster read access rather than relying on gateway routing for every message delivery.

Frequently asked questions

Why doesn’t Raft automatically place the leader near clients?

Raft’s leader election algorithm optimizes for log consistency and availability, not client proximity. The candidate with the most up-to-date log wins the election, regardless of which cluster it’s in. NATS does not currently support placement-aware leader election for consumer Raft groups. The workaround is to ensure replicas exist in the desired cluster and use step-down to trigger re-election when the leader lands in the wrong place.

Does stepping down cause message loss or redelivery?

No messages are lost. However, messages that were in-flight (delivered but not yet acknowledged) at the time of the step-down will be redelivered by the new leader. This is normal Raft behavior — the new leader doesn’t know which messages the old leader delivered but didn’t get acks for. Applications should be idempotent to handle occasional redeliveries during leader transitions.

How much latency does cross-cluster consumer delivery add?

The added latency per message equals approximately one gateway round-trip time. For same-region clusters, this is typically 1-5ms per message. For cross-region deployments, it can be 20-100ms+. For a consumer processing 10,000 messages per second across a gateway with a 5ms round trip, the aggregate overhead is 50 seconds of gateway transit time accrued per wall-clock second of operation.

Can I pin a consumer leader to a specific server?

Not directly. NATS Raft does not support leader pinning. You can influence placement by ensuring the stream’s replicas are in the desired cluster and using step-down to trigger re-elections until the leader lands on a server in the right cluster. For critical consumers, monitor leader placement and automate step-downs when the leader drifts.

Does this affect push consumers differently than pull consumers?

The impact is similar for both, but manifests differently. For push consumers, every message delivery crosses the gateway from the leader to the subscriber — the subscriber has no control over delivery timing. For pull consumers, both the fetch request and the response cross the gateway, but the client controls the batch size and fetch frequency, which can amortize some of the per-request overhead. In both cases, acknowledgments also cross the gateway back to the leader.

Proactive monitoring for non-co-located NATS consumer leaders with Synadia Insights

With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.

Start a 14-day Insights trial