In a replicated JetStream stream, all current replicas should have the same message count. When this check fires, every replica reports that it is caught up in Raft consensus — none are lagging — yet they hold different numbers of messages. This silent divergence means at least one replica’s data does not match the others, and the cluster doesn’t know it.
Raft consensus guarantees that committed operations are replicated to a majority. Under normal conditions, all current replicas apply the same sequence of operations and arrive at the same state. When replicas report “current” but have different message counts, the consensus layer believes everything is synchronized while the storage layer tells a different story.
This is dangerous because the divergence is invisible to normal operations. Publishers receive acknowledgments. Consumers read messages. Monitoring shows all replicas current with zero lag. The problem only becomes visible when you compare the actual message counts across replicas — or when a leader election promotes a replica with fewer messages, causing consumers to see gaps or duplicate deliveries.
For streams using interest-based retention, the divergence is particularly insidious. Interest-based retention deletes messages when all consumers have acknowledged them. If consumer acknowledgment propagation fails on one replica, that replica retains messages the others have deleted (or vice versa). The replicas drift further apart over time rather than converging.
For streams using limits-based retention with max_msgs or max_bytes, each replica independently enforces limits based on its local state. A replica with a higher message count may purge messages that a replica with a lower count still needs, creating permanent data loss if leadership transfers.
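For reference, this is how the two retention modes are configured with nats.go; a minimal sketch, with the stream name, subjects, and limits chosen purely for illustration. With LimitsPolicy, each replica enforces MaxMsgs and MaxBytes against its own local store, which is why already-diverged replicas can end up purging different messages.

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical R3 stream with limits-based retention: every replica
	// applies MaxMsgs/MaxBytes to its own local state.
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:      "ORDERS",             // hypothetical name
		Subjects:  []string{"orders.>"}, // hypothetical subjects
		Replicas:  3,
		Retention: nats.LimitsPolicy,
		MaxMsgs:   1_000_000,
		MaxBytes:  4 * 1024 * 1024 * 1024, // 4 GiB
	}); err != nil {
		log.Fatal(err)
	}

	// Interest-based retention instead deletes messages once all consumers
	// have acknowledged them (Retention: nats.InterestPolicy).
}
```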
The worst-case scenario: a leader failure promotes the replica with the fewest messages. Every message present on the old leader but absent on the new leader is permanently lost. Consumers that already processed those messages see them vanish from the stream. Sequence numbers may recycle, breaking deduplication guarantees.
Filestore corruption on one or more replicas. Disk errors, unclean shutdowns, or storage subsystem failures can corrupt individual message blocks on a specific replica without affecting Raft log consistency. The Raft layer sees all operations as applied, but the filestore failed to persist some messages.
Raft state reset or snapshot divergence. If a replica’s Raft state is reset (e.g., due to a corrupt WAL), it rebuilds from a snapshot. If the snapshot itself was taken from an already-diverged state, the replica starts from an incorrect baseline and the divergence persists.
Interest-based retention consumer ack propagation failure. In interest-based retention streams, message deletion is driven by consumer acknowledgments. If an ack is applied on the leader but fails to propagate to a follower’s consumer state, the follower retains messages the leader has deleted. Over time, this produces a measurable count divergence.
Partial restore from backup. Restoring a stream backup to one replica without coordinating with the cluster can introduce divergence. The restored replica has a message set that doesn’t match what the other replicas have through normal replication.
Server version mismatch during rolling upgrade. Different server versions may handle edge cases in message deletion, compaction, or retention differently. During a rolling upgrade window, replicas running different versions can process the same operation with different outcomes.
Start by comparing per-replica state:

```
nats stream info STREAM_NAME --all
```

Look at the state section for each replica. Under normal operation, all replicas should show identical values for messages, bytes, first_seq, and last_seq. Example output showing divergence:
```
Cluster Information:
  Leader: server-1
  Replica: server-2, current, seen 0.12s ago (messages: 482,917)
  Replica: server-3, current, seen 0.09s ago (messages: 481,203)
```

If the leader has 483,000 messages and the replicas show 482,917 and 481,203, you have divergence despite all replicas being “current.”
```
# Get detailed per-replica state
nats stream info STREAM_NAME --json | jq '.cluster.replicas[] | {name, current, lag, active, msgs: .state.messages}'
```

The replica with the lowest message count is most likely the one that lost data. However, if using interest-based retention, the replica with the highest count may be the anomaly — it failed to delete messages that others correctly purged.
```
# On each server hosting a replica
grep -i "filestore\|corrupt\|recover" /var/log/nats-server.log | tail -50
```

Look for patterns like:
```
[ERR] JetStream filestore error: short block
[WRN] Stream STREAM_NAME recovered with X fewer messages
```

Then compare the JetStream report across servers:

```
nats server report jetstream --account ACCOUNT --stream STREAM_NAME
```

If all replicas report identical Raft applied indexes but different message counts, the divergence is in the storage layer, not the consensus layer.
To scan replicated streams programmatically with nats.go:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatalf("jetstream: %v", err)
	}

	// Walk every stream and flag replicated streams whose replicas all
	// report current with zero lag; their message counts still need to be
	// compared against the leader's by hand.
	for name := range js.StreamNames() {
		info, err := js.StreamInfo(name, nats.MaxWait(5*time.Second))
		if err != nil {
			log.Printf("error getting info for %s: %v", name, err)
			continue
		}
		if info.Cluster == nil || len(info.Cluster.Replicas) == 0 {
			continue
		}
		leaderMsgs := info.State.Msgs
		for _, r := range info.Cluster.Replicas {
			if r.Current && r.Lag == 0 {
				// Replica reports current; compare its stored message
				// count with the leader's manually.
				fmt.Printf("Stream %s: leader=%d, check replica %s manually\n",
					name, leaderMsgs, r.Name)
			}
		}
	}
}
```

Determine which replica has the correct data. In most cases, the leader’s state is authoritative. However, if the leader was recently elected from a previously lagging replica, a different peer may have more complete data. Check the Raft applied index and server logs to determine which replica has the highest message count with a clean recovery history.
Step the leader to the best replica. If the current leader is the diverged replica, move leadership to the replica with the most complete data:
```
nats stream cluster step-down STREAM_NAME --preferred HEALTHY_SERVER
```

Remove and replace the diverged replica. Once you’ve confirmed which replica has the correct state and ensured it’s the leader, remove each diverged peer:
```
# Remove the diverged replica
nats stream cluster peer-remove STREAM_NAME DIVERGED_PEER

# NATS will automatically schedule a replacement on an available server
# Monitor progress:
nats stream info STREAM_NAME
```

The new replica will be provisioned from the current leader’s state, eliminating the divergence.
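If you prefer to script the progress check, here is a minimal nats.go sketch (the stream name is a placeholder) that polls StreamInfo until every replica reports current with zero lag:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	const stream = "STREAM_NAME" // placeholder

	// Poll until the replacement replica has caught up.
	for {
		info, err := js.StreamInfo(stream)
		if err != nil {
			log.Fatal(err)
		}
		if info.Cluster == nil {
			log.Fatal("stream is not clustered")
		}
		allCurrent := true
		for _, r := range info.Cluster.Replicas {
			fmt.Printf("replica %s: current=%v lag=%d\n", r.Name, r.Current, r.Lag)
			if !r.Current || r.Lag > 0 {
				allCurrent = false
			}
		}
		if allCurrent {
			fmt.Println("all replicas current")
			return
		}
		time.Sleep(2 * time.Second)
	}
}
```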
For interest-based retention streams: also check consumers. If the divergence was caused by consumer ack propagation failures, the consumer state on the rebuilt replica may also need to be recreated:
```
# List consumers and check for state inconsistencies
nats consumer info STREAM_NAME CONSUMER_NAME --all
```

Enable consistent stream checksums. Upgrade to the latest NATS server version, which includes improved filestore checksum validation during compaction and recovery.
Monitor for divergence continuously. Add per-replica message count comparison to your monitoring. Synadia Insights evaluates this automatically; for custom monitoring, compare each replica’s local message count for the same stream on a schedule and alert when the counts disagree.
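One way to build such a check (a minimal sketch, not the Insights implementation) is to query each server’s HTTP monitoring endpoint with stream detail enabled and compare the message counts reported for the same stream. The server URLs are hypothetical, and the /jsz query parameters and JSON field names below are assumptions based on the standard NATS monitoring API; verify them against your server version.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// Minimal subset of the /jsz response; field names are assumed and should
// be checked against your server's monitoring output.
type jszResponse struct {
	AccountDetails []struct {
		Name    string `json:"name"`
		Streams []struct {
			Name  string `json:"name"`
			State struct {
				Messages uint64 `json:"messages"`
			} `json:"state"`
		} `json:"stream_detail"`
	} `json:"account_details"`
}

func main() {
	// Monitoring endpoints of the servers hosting the replicas (hypothetical).
	servers := []string{
		"http://server-1:8222",
		"http://server-2:8222",
		"http://server-3:8222",
	}
	const stream = "STREAM_NAME" // placeholder

	counts := map[string]uint64{}
	for _, s := range servers {
		resp, err := http.Get(s + "/jsz?accounts=true&streams=true")
		if err != nil {
			log.Fatal(err)
		}
		var jsz jszResponse
		if err := json.NewDecoder(resp.Body).Decode(&jsz); err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()
		// Record this server's local message count for the stream.
		for _, acct := range jsz.AccountDetails {
			for _, sd := range acct.Streams {
				if sd.Name == stream {
					counts[s] = sd.State.Messages
				}
			}
		}
	}

	// Alert when any two servers disagree on the count.
	distinct := map[uint64]bool{}
	for server, c := range counts {
		fmt.Printf("%s: %d messages\n", server, c)
		distinct[c] = true
	}
	if len(distinct) > 1 {
		fmt.Println("DIVERGENCE: replicas report different message counts")
	} else {
		fmt.Println("message counts match across servers")
	}
}
```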
Avoid SIGKILL in production. Always use SIGTERM for server shutdowns to allow clean filestore flushes. In Kubernetes, ensure terminationGracePeriodSeconds is sufficient for the server to complete pending writes.
Keep server versions consistent. During rolling upgrades, minimize the window where replicas run different server versions. Upgrade followers first, then step down the leader and upgrade it last.
Replica lag (JETSTREAM_001) means a follower is behind in applying Raft operations — it knows it’s behind and is catching up. Replica message count divergence (this check) means replicas believe they’re synchronized but their actual data doesn’t match. Lag is a transient operational condition; divergence is a corruption indicator.
No. R1 streams have a single copy — there’s no other replica to diverge from. This check only applies to replicated streams (R3, R5).
No. New messages are replicated correctly through Raft, but the existing divergence persists. If replica A has 1,000 messages and replica B has 990, after 100 new messages they’ll have 1,100 and 1,090. The gap doesn’t close because the missing messages are from the past and won’t be re-replicated.
Compare message counts with any external source of truth — application logs, upstream publish counters, or consumer delivery records. If no external reference exists, treat the replica with the highest message count as most likely correct (it retained data others lost). Back up from that replica before rebuilding.
Yes. This check only fires when all replicas report “current” with zero lag. If any replica reports lag, the difference in message counts is expected and is covered by JETSTREAM_001 instead. The simultaneous “current + different counts” condition is what makes this check a corruption signal rather than a normal replication delay.