A mirror stream replicates data from a source stream using an internal consumer. When this check fires, the mirror reports zero lag — suggesting it’s caught up — but hasn’t received any activity for over 5 minutes while the source stream continues accepting messages. The internal mirror consumer has stalled, and the mirror is silently falling behind.
Mirror streams are used for cross-cluster replication, disaster recovery, and read offloading. Operators configure mirrors expecting the destination to stay synchronized with the source. The “zero lag” status creates a false sense of health — monitoring that only checks lag values will miss this failure entirely.
The stalled consumer means new messages published to the source are not flowing to the mirror. The divergence grows with every message the source receives. If the mirror is serving read traffic (e.g., consumers in a secondary region), those consumers read increasingly stale data without any indication that the mirror has stopped updating.
For disaster recovery scenarios, the impact is severe. If the source becomes unavailable and operators fail over to the mirror, the mirror’s data is missing everything published since the stall began. Depending on how long the stall persisted, this could be minutes, hours, or days of data loss — all while dashboards showed “zero lag.”
The subtlety of this failure mode is what makes it dangerous. A mirror with non-zero lag is obviously behind and triggers standard lag alerts. A mirror with zero lag and zero activity looks healthy to most monitoring systems. Only by comparing the mirror’s last activity timestamp with the source’s ongoing publish activity can you detect the stall.
Internal mirror consumer crashed without recovery. The NATS server creates an internal consumer to drive mirror replication. If this consumer encounters an unrecoverable error — such as a corrupt message in the source, a permission change, or an internal panic — it may stop processing without being recreated.
Network partition between mirror and source. If the connection to the source stream (especially cross-cluster via gateways or leaf nodes) drops and doesn’t recover cleanly, the mirror consumer can enter a state where it believes it’s caught up to the last known position but has lost the connection to receive new messages.
Source stream subject filter mismatch. If the mirror is configured with a subject filter and the source starts publishing to subjects outside that filter, the mirror legitimately receives no new messages — but this looks identical to a stall. The check compares activity timestamps rather than message counts to catch genuine stalls.
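To see why a filter mismatch mimics a stall, consider what the mirror's subject filter actually admits. The sketch below is a simplified reimplementation of NATS subject matching for illustration only (`*` matches exactly one token, `>` matches one or more trailing tokens); a message published outside the filter legitimately never reaches the mirror.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches is a simplified NATS-style subject matcher:
// '*' matches exactly one token, '>' matches one or more trailing tokens.
func subjectMatches(filter, subject string) bool {
	f := strings.Split(filter, ".")
	s := strings.Split(subject, ".")
	for i, tok := range f {
		if tok == ">" {
			return len(s) >= i+1 // at least one token must remain
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(f) == len(s)
}

func main() {
	filter := "orders.>"
	for _, subj := range []string{"orders.us.created", "invoices.1"} {
		// A publisher switching to "invoices.*" subjects would leave a
		// mirror filtered on "orders.>" idle without any stall occurring.
		fmt.Printf("%-18s matches %q: %v\n", subj, filter, subjectMatches(filter, subj))
	}
}
```

If the source's publish subjects drift like this, the fix is a configuration change to the filter, not a consumer restart.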
Leader transition on the mirror stream. After a leader election on the mirror’s Raft group, the new leader must re-establish the mirror consumer. If this re-establishment fails silently, the mirror stops receiving updates but reports the pre-election lag (zero if it was caught up).
Resource exhaustion on the mirror server. If the server hosting the mirror’s leader is under memory or CPU pressure, the internal mirror consumer may be deprioritized or blocked by other internal operations (compaction, snapshot, catchup of other streams).
```shell
nats stream info MIRROR_STREAM_NAME
```

In the Mirror section, look for:
```
Mirror Information:
  Stream Name: SOURCE_STREAM
          Lag: 0
    Last Seen: 8m22s
```

Zero lag combined with a “Last Seen” value greater than 5 minutes, while the source is actively receiving messages, confirms the stall.
```shell
nats stream info SOURCE_STREAM_NAME
```

Check that the source stream’s last_seq is advancing:
```shell
# Run twice with a gap to see if messages are arriving
nats stream info SOURCE_STREAM_NAME --json | jq '.state.last_seq'
sleep 10
nats stream info SOURCE_STREAM_NAME --json | jq '.state.last_seq'
```

If the source sequence is advancing but the mirror’s “Last Seen” keeps growing, the mirror consumer is stalled.
```shell
# If the mirror is cross-cluster, check gateway connections
nats server report gateways

# Check leaf node connections if applicable
nats server report leafnodes

# Look for mirror consumer errors
grep -i "mirror\|internal consumer" /var/log/nats-server.log | tail -30
```

Common error patterns:
```
[ERR] Mirror consumer for stream MIRROR_STREAM error: ...
[WRN] Failed to recreate mirror consumer for MIRROR_STREAM
```

To scan all streams for stale mirrors programmatically:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatalf("jetstream: %v", err)
	}

	for name := range js.StreamNames() {
		info, err := js.StreamInfo(name)
		if err != nil {
			log.Printf("error: %v", err)
			continue
		}
		if info.Mirror == nil {
			continue
		}
		// Mirror.Active is the duration since the mirror consumer was
		// last active, so it can be used directly as the staleness value.
		staleness := info.Mirror.Active
		if info.Mirror.Lag == 0 && staleness > 5*time.Minute {
			fmt.Printf("STALE MIRROR: %s — lag=0 but last seen %s ago\n",
				name, staleness.Round(time.Second))
		}
	}
}
```

Perform a leader step-down on the mirror stream. This forces the Raft group to elect a new leader, which recreates the internal mirror consumer:
```shell
nats stream cluster step-down MIRROR_STREAM_NAME
```

After the step-down, verify the mirror resumes:
```shell
# Wait a few seconds for the new leader to establish the mirror consumer
sleep 10
nats stream info MIRROR_STREAM_NAME
```

The “Last Seen” value should now be recent (under 1s), and the lag may temporarily spike as the mirror catches up on missed messages.
If step-down doesn’t resolve it, the mirror consumer may need a full restart:
```shell
# Force recreation by editing the stream (no-op change triggers consumer reset)
nats stream edit MIRROR_STREAM_NAME --description "trigger mirror reset"
```

Once the mirror resumes replication, verify the message counts converge:
```shell
# Compare source and mirror
echo "Source:" && nats stream info SOURCE_STREAM --json | jq '.state.messages'
echo "Mirror:" && nats stream info MIRROR_STREAM --json | jq '.state.messages'
```

If the mirror’s message count is lower than the source’s and the gap doesn’t close (because missed messages were already purged from the source by retention policy), you may need to recreate the mirror:
```shell
nats stream delete MIRROR_STREAM_NAME -f
nats stream add MIRROR_STREAM_NAME --mirror SOURCE_STREAM_NAME
```

Alert on mirror activity age, not just lag. Standard lag-based alerting misses this failure mode entirely. Monitor the active field from the mirror info endpoint.
Ensure cross-cluster connectivity monitoring. If mirrors operate across clusters, monitor gateway and leaf node connections independently. A gateway disconnection that isn’t detected and recovered will cause mirror stalls.
Upgrade the NATS server. Improvements to internal mirror consumer resilience — particularly around automatic recreation after failures — are included in newer server releases.
Yes. Mirror Lag Critical (JETSTREAM_017) fires when the mirror has non-zero lag exceeding a threshold — the mirror knows it’s behind and is trying to catch up. Mirror Last Seen Staleness (this check) fires when the mirror reports zero lag but has stopped receiving updates entirely. JETSTREAM_017 is a throughput problem; JETSTREAM_015 is a connectivity/consumer problem.
This check compares mirror activity against source activity. If the source stream is also idle (no new messages), the mirror’s inactivity is expected and this check does not fire. The check only triggers when the source is actively receiving messages but the mirror isn’t.
Consumers attached to the mirror continue reading whatever data the mirror already has. They won’t receive new messages until the mirror resumes replication. If consumers have read all available messages, they appear idle — waiting for messages that should be arriving but aren’t.
Messages published to the source during the stall are retained by the source according to its retention policy. When the mirror consumer restarts, it resumes from its last position and catches up on the backlog. If the source’s retention policy (e.g., max_age: 1h) has already purged messages published during the stall, those messages are permanently lost from the mirror.
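Because mirrors preserve source sequence numbers, you can determine from sequences alone whether a full catch-up is still possible. The arithmetic below is a hedged sketch (the function name is hypothetical): the mirror needs every sequence after its own last_seq, so anything below the source's current first_seq is gone for good.

```go
package main

import "fmt"

// lostMessages returns how many messages can never reach the mirror:
// those with sequence above mirrorLastSeq that the source's retention
// policy has already purged (i.e., below the source's first_seq).
func lostMessages(sourceFirstSeq, mirrorLastSeq uint64) uint64 {
	next := mirrorLastSeq + 1 // first sequence the mirror still needs
	if sourceFirstSeq > next {
		return sourceFirstSeq - next
	}
	return 0 // the source still holds everything the mirror needs
}

func main() {
	// Mirror replicated through seq 5000, but retention advanced the
	// source's first_seq to 5200: sequences 5001-5199 are lost.
	fmt.Println(lostMessages(5200, 5000)) // 199
	// Source still starts at seq 4800: full catch-up is possible.
	fmt.Println(lostMessages(4800, 5000)) // 0
}
```

Both sequence values are available from `nats stream info --json` as `.state.first_seq` and `.state.last_seq`.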
Stalls are typically not caused by resource constraints on the mirror itself. They’re caused by connectivity issues or internal consumer failures. Increasing CPU or memory on the mirror server won’t prevent a network partition or a consumer crash. Focus on connectivity monitoring and server upgrades instead.