A stream’s unique subject count should never exceed its total message count. Every unique subject requires at least one message, so num_subjects > num_messages is a mathematical impossibility under correct operation. When this check fires, the stream’s internal filestore accounting is corrupted — the metadata that tracks subjects has diverged from the actual message data.
This is not a performance issue or a transient condition. It is an invariant violation that indicates the stream’s filestore metadata is inconsistent with its message data. The corruption is persistent — it survives server restarts and will not self-heal.
The practical consequences depend on the stream’s configuration. For streams using subject-based retention or subject-based limits (max_msgs_per_subject), the corrupted subject count means the server’s decisions about which messages to retain and which to discard are based on incorrect data. Messages may be prematurely purged because the server believes a subject has more messages than it actually does, or messages may be retained indefinitely because the server thinks a subject hasn’t reached its limit.
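The failure mode can be sketched with a toy model (a hypothetical simplification, not the server's actual filestore code): per-subject limit enforcement consults the *tracked* count for a subject, so an inflated count triggers evictions that should never happen.

```go
package main

import "fmt"

// mustEvict is a hypothetical sketch of max_msgs_per_subject enforcement,
// NOT the real filestore logic: the server consults its tracked per-subject
// count to decide whether storing a new message must evict an old one.
func mustEvict(trackedCount, maxPerSubject int) bool {
	return trackedCount >= maxPerSubject
}

func main() {
	const limit = 5

	// Healthy accounting: 3 messages tracked for the subject, no eviction.
	fmt.Println(mustEvict(3, limit)) // false

	// Corrupted accounting: the index claims 9 messages even though only
	// 3 exist on disk, so the next publish wrongly evicts a real message.
	fmt.Println(mustEvict(9, limit)) // true
}
```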
For any stream, the corrupted subject count affects the output of nats stream info and monitoring endpoints. Any tooling, alerting, or operational dashboards that rely on the subject count for cardinality tracking or capacity planning will report incorrect values. More critically, the underlying corruption that caused the subject/message divergence may indicate broader filestore issues — if the subject index is wrong, other internal indices may be unreliable as well.
In replicated streams (R3, R5), the corruption may be isolated to a single replica. If the corrupted replica is the leader, all clients see the incorrect state. If it’s a follower, a future leader election could promote the corrupted replica, propagating the inconsistency to clients.
Unclean server shutdown during filestore compaction. The NATS server periodically compacts filestore metadata. If the server is killed (SIGKILL, OOM, power loss) during compaction, partially written index files can leave the subject count inconsistent with the message data.
Filesystem corruption or disk errors. Bad sectors, filesystem journal corruption, or storage subsystem failures can corrupt the filestore index without affecting the message blocks (or vice versa). ZFS or ext4 journal recovery after a crash may restore data blocks but miss metadata updates.
Bug in filestore accounting during message deletion. Certain sequences of message deletions — particularly under high concurrency with max_msgs_per_subject policies — have historically triggered edge cases in the subject tracking logic. Patches for known cases exist in newer server versions.
Manual tampering with filestore data directory. Directly modifying, moving, or restoring individual files within the stream’s data directory bypasses the server’s internal consistency checks. Restoring a partial backup (e.g., message blocks without the corresponding index files) is a common trigger.
Raft snapshot application failure. During catchup, a follower applies a snapshot from the leader. If the snapshot application partially fails — due to disk space, permissions, or I/O errors — the resulting filestore state may be internally inconsistent.
```shell
nats stream info STREAM_NAME --json | jq '{messages: .state.messages, subjects: .state.num_subjects}'
```

If num_subjects is greater than messages, the invariant is violated. For example:
```json
{
  "messages": 1247,
  "subjects": 3891
}
```

This stream claims 3,891 unique subjects but only 1,247 total messages — clearly impossible.
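This comparison is easy to script. The sketch below (assuming `jq` is installed) prints an alert when the invariant is violated; the `echo`'d sample mirrors the corrupted stream above, and in practice you would pipe `nats stream info STREAM_NAME --json` instead:

```shell
# jq -e sets a non-zero exit status when the expression evaluates to false,
# so the alert fires only for a corrupted stream.
echo '{"state": {"messages": 1247, "num_subjects": 3891}}' |
  jq -e '.state.num_subjects <= .state.messages' > /dev/null ||
  echo "CRITICAL: subject count exceeds message count"
```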
For replicated streams, compare each replica:
```shell
nats stream info STREAM_NAME --all
```

Look at each replica’s state section. If only one replica shows the inconsistency, the corruption is isolated and recovery is simpler.
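When scripting the comparison, the replica summary in the standard `--json` output can be reduced with jq. This is a sketch assuming `jq` is installed; the sample object with hypothetical peers n1–n3 stands in for real output. Note that `current` and `lag` flag replication health rather than corruption directly, but a non-current or lagging replica is the first place to look:

```shell
# Summarize each replica's name, currency, and lag from the cluster section.
echo '{"cluster": {"leader": "n1", "replicas": [
        {"name": "n2", "current": true,  "lag": 0},
        {"name": "n3", "current": false, "lag": 1042}]}}' |
  jq -r '.cluster.replicas[] | "\(.name) current=\(.current) lag=\(.lag)"'
```

In practice, pipe `nats stream info STREAM_NAME --json` instead of the `echo`.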
```shell
# Look for filestore-related warnings around the time of last restart
nats server request log --filter "filestore" --last 24h
```

Common log patterns include:
```text
[ERR] JetStream filestore error recovering subject state
[WRN] Stream STREAM_NAME subject count mismatch after recovery
```

```shell
# Find the stream's data directory
nats stream info STREAM_NAME --json | jq -r '.config.name'

# On the server, check for zero-byte or truncated index files
ls -la /path/to/jetstream/$ACCOUNT/streams/STREAM_NAME/
```

A small Go program can scan every stream for the invariant violation:

```go
package main

import (
	"fmt"
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatalf("jetstream: %v", err)
	}

	for name := range js.StreamNames() {
		info, err := js.StreamInfo(name)
		if err != nil {
			log.Printf("error getting info for %s: %v", name, err)
			continue
		}
		if info.State.NumSubjects > info.State.Msgs {
			fmt.Printf("CRITICAL: stream %s has %d subjects but only %d messages\n",
				name, info.State.NumSubjects, info.State.Msgs)
		}
	}
}
```

Step the leader away from the corrupted replica. If you’ve identified which replica has the corruption and it’s currently the leader, force a leader election to move leadership to a healthy replica:
```shell
nats stream cluster step-down STREAM_NAME
```

Disable auto-recovery temporarily. If the stream is R3 and one replica is corrupted, the cluster still has quorum with two healthy replicas. Do not remove the corrupted replica yet — that would reduce you to R1, losing fault tolerance.
Option 1: Force replica rebuild (replicated streams). If the corruption is on one replica and others are healthy, remove the corrupted replica’s peer and let the cluster rebuild it from the leader:
```shell
# Identify the corrupted peer
nats stream info STREAM_NAME --all

# Remove the corrupted peer — NATS will automatically add a replacement
nats stream cluster peer-remove STREAM_NAME PEER_NAME
```

The cluster will provision a new replica on an available server and replicate the leader’s (healthy) data.
Option 2: Backup and recreate (single replica or all-corrupt). If all replicas show the inconsistency or the stream is R1:
```shell
# Back up the stream's messages (this reads from the stream, not the corrupted index)
nats stream backup STREAM_NAME /tmp/stream-backup/
```
```shell
# Record the stream configuration (the info output nests it under .config)
nats stream info STREAM_NAME --json | jq '.config' > /tmp/stream-config.json
```
```shell
# Delete the corrupted stream
nats stream delete STREAM_NAME -f

# Restore — the backup includes the stream configuration, so this
# recreates the stream and reloads its messages
nats stream restore STREAM_NAME /tmp/stream-backup/
```

The configuration saved earlier serves as a reference in case the stream ever needs to be recreated by hand. After the restore completes, verify the invariant:
```shell
nats stream info STREAM_NAME --json | jq '{messages: .state.messages, subjects: .state.num_subjects}'
```

Upgrade the NATS server. Filestore accounting bugs have been fixed across multiple releases. Run the latest stable version to benefit from all known fixes.
Use graceful shutdowns. Always use SIGTERM to stop NATS servers, giving the filestore time to flush pending writes and complete compaction. Never use SIGKILL in production unless the process is truly hung.
Monitor disk health. Use SMART monitoring, filesystem checksums (ZFS, btrfs), or ECC memory to detect hardware-level corruption before it manifests as filestore inconsistency.
Never manually modify filestore directories. All stream operations — backup, restore, purge, delete — should go through the NATS API or CLI. Direct filesystem manipulation bypasses internal consistency guarantees.
No. The corrupted state is persisted on disk and survives server restarts. The server recovers the filestore from the on-disk state, so if the written state is inconsistent, the recovered state will be too. Manual intervention is required.
Purging removes all messages and resets the subject count to zero, which eliminates the inconsistency. However, this means losing all data in the stream. If the data is valuable, use the backup/restore approach instead — reading messages out through the API uses a different code path than the subject index and should retrieve the actual messages.
If the message data itself is intact and only the subject index is corrupted, consumers can still read and acknowledge messages normally. The subject count is metadata — consumers operate on sequence numbers and individual message reads. However, if the underlying corruption also affected message blocks, consumers may encounter errors on specific sequences.
After recovery, monitor the stream’s subject and message counts. If the invariant violation recurs, suspect a repeatable trigger — likely a software bug or ongoing hardware issue. Check the server version, review release notes for known filestore fixes, and run disk diagnostics.
JETSTREAM_013 checks for an invariant violation within a single stream — subjects exceeding messages is always wrong. JETSTREAM_014 checks for divergence between replicas — different message counts across replicas that should be identical. Both indicate consistency problems, but JETSTREAM_013 is a more definitive indicator of corruption because the condition is mathematically impossible under correct operation.