
NATS Raft Apply Lag: What It Means and How to Fix It

Severity: Warning
Category: Performance
Applies to: System Improvement
Check ID: OPT_SYS_007
Detection threshold: Committed minus applied Raft entries exceeds 100

Raft apply lag occurs when a NATS server has committed Raft log entries (agreed upon by a quorum of peers) but has not yet applied them to the local state machine. A gap between committed and applied entries means the server’s local state is stale — it has acknowledged entries as durable but hasn’t finished processing them into the actual stream or consumer state.

Why this matters

In NATS JetStream, every stream and consumer replica is backed by a Raft group. The Raft protocol ensures that entries (published messages, consumer acks, metadata changes) are committed to a majority of replicas before being considered durable. Once committed, each replica independently applies the entry to its local state machine — writing the message to its storage engine, updating consumer delivery state, or modifying configuration.

Apply lag means the “apply” step is falling behind the “commit” step. The entry is durable (a majority of replicas have it in their Raft log), but the local server hasn’t finished processing it into usable state. During this lag, the server’s view of the stream or consumer is behind reality. Reads from this server may return stale data. Consumer deliveries from this server may re-deliver messages that have already been acknowledged on the leader. In the worst case, if this server needs to become the leader (because the current leader fails), it must first apply all pending entries before it can serve requests — extending failover time from milliseconds to seconds or more.

The impact scales with the number of Raft groups on a server. A server hosting 500 streams and 2,000 consumers has 2,500 Raft groups, each generating apply operations. Every published message, every consumer ack, every heartbeat produces entries that need to be applied. When the server’s disk I/O or CPU can’t keep up with the aggregate apply rate across all groups, lag accumulates across multiple groups simultaneously, and the server’s overall JetStream performance degrades.

Common causes

  • Disk I/O bottleneck. Applying Raft entries requires writing to the storage engine, which involves fsync calls to ensure durability. Slow disks — network-attached storage, heavily contended shared volumes, or spinning disks — create a bottleneck at the apply step. Each fsync blocks the apply pipeline for that Raft group, and when thousands of groups are fsyncing simultaneously, disk queue depth spikes.

  • Too many Raft groups on a single server. Each stream replica and consumer replica is a separate Raft group with its own apply queue. A server with 1,000+ Raft groups generates massive aggregate I/O even if each individual group’s apply rate is modest. This is the most common cause in large JetStream deployments and is flagged by the High HA Assets check (CLUSTER_003).

  • CPU contention. Applying entries involves deserialization, state machine updates, index maintenance, and acknowledgment processing. On CPU-bound servers — especially those also handling high client message rates — the apply goroutines may not get enough CPU time to keep up.

  • Large message payloads. Applying entries for streams with large messages (>100KB) is more expensive than small messages. The storage engine must write more bytes per entry, and any compression or checksumming adds proportional CPU cost. A stream receiving 1MB messages at 100 msg/s generates 100 MB/s of apply I/O for that single Raft group.

  • Burst of write activity. A sudden spike in publish rate (batch imports, backfills, traffic surges) can cause a temporary apply lag spike. The commit pipeline handles the burst through Raft replication, but the apply pipeline — constrained by local disk I/O — falls behind until the burst subsides.

  • Storage engine compaction or garbage collection. Periodic maintenance operations in the storage engine (compacting LSM levels, removing expired messages, reclaiming disk space) compete with apply operations for disk I/O. During compaction, apply throughput drops and lag increases.

How to diagnose

Check Raft group health

View the meta group (cluster-wide) apply state:

nats server report jetstream

Look for the Applied and Commit values in the Raft group information. Any server where Commit significantly exceeds Applied has apply lag.

Check specific stream replica lag

For a specific stream, inspect the per-replica state:

nats stream info <stream_name>

The replica section shows each peer’s state. Replicas with apply lag will show entries that are committed but not yet reflected in the stream’s sequence numbers.

Monitor disk I/O on the affected server

Apply lag is almost always related to disk performance. Check I/O metrics:

# Linux — check disk I/O wait and queue depth
iostat -xz 1 5
# macOS (BSD iostat: -w sets the interval, -c the count)
iostat -d -w 1 -c 5

Key metrics:

  • await — Average I/O wait time (reported as r_await/w_await on newer sysstat versions). Above 10ms on SSDs indicates contention.
  • %util — Disk utilization. Sustained 90%+ means the disk is the bottleneck.
  • avgqu-sz — Average queue depth (aqu-sz on newer sysstat versions). High values mean I/O requests are waiting.

Count Raft groups per server

Determine whether the server is overloaded with Raft groups:

nats server report jetstream --json | jq '.[] | {name: .name, streams: .streams, consumers: .consumers, ha_assets: .ha_assets}'

Servers with over 1,000 HA assets (streams + consumers with replicas) are at elevated risk for apply lag, especially on anything other than high-performance NVMe storage.
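The jq output above can be triaged programmatically. This sketch mirrors that query's fields (treat the field names and sample values as illustrative) and applies the 1,000 HA-asset warning threshold mentioned above:

```python
# Rough triage of per-server Raft group load, mirroring the jq query above.
# Field names follow the jq example; the sample data is illustrative.

HA_ASSET_WARNING = 1_000  # threshold flagged by the CLUSTER_003 check

def at_risk(servers: list[dict]) -> list[str]:
    """Names of servers whose HA asset count suggests apply-lag risk."""
    return [s["name"] for s in servers if s["ha_assets"] > HA_ASSET_WARNING]

servers = [
    {"name": "nats-1", "streams": 120, "consumers": 480, "ha_assets": 600},
    {"name": "nats-2", "streams": 500, "consumers": 2_000, "ha_assets": 2_500},
]
print(at_risk(servers))  # → ['nats-2']
```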

Check for I/O-intensive operations

Look for signs of storage compaction or stream purge operations that may be competing with applies:

# Check server logs for compaction events
journalctl -u nats-server --since "1 hour ago" | grep -i "compact\|purge\|snapshot"

How to fix it

Immediate: reduce apply pressure

Identify and address the hottest Raft groups. If one or a few streams are responsible for the majority of apply I/O (high publish rate, large messages), they’re the priority:

# Find streams with the highest message rates
nats stream report

If a single stream is responsible for the majority of writes, consider moving it to a dedicated set of servers using placement tags.

Reduce concurrent fsync pressure. If the server hosts many low-throughput Raft groups that collectively saturate disk I/O, the solution is spreading groups across more servers, not tuning individual groups.

Short-term: improve disk I/O capacity

Use NVMe SSDs. Raft apply performance is dominated by fsync latency. NVMe drives with consistent sub-100μs write latency handle thousands of concurrent Raft group applies without bottlenecking. Network-attached storage (EBS gp3, Azure Standard SSD) typically adds 1-5ms per fsync, which becomes the limiting factor at scale.
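Because fsync latency dominates apply performance, it is worth measuring it directly on a candidate volume before committing to it. This is a quick probe, not a benchmark; run it with a path on the volume that will hold the JetStream store, and compare against the rough guides above (sub-100μs for local NVMe, 1-5ms for network-attached storage):

```python
# Quick fsync latency probe for a candidate JetStream volume.
# Pass a directory on the volume under test; defaults to the current dir.
import os
import statistics
import tempfile
import time

def fsync_latency_ms(path: str = ".", iterations: int = 100) -> float:
    """Median time, in milliseconds, for a small write followed by fsync."""
    samples = []
    with tempfile.NamedTemporaryFile(dir=path) as f:
        for _ in range(iterations):
            f.write(b"x" * 4096)   # 4 KiB, roughly one small Raft entry
            f.flush()
            start = time.perf_counter()
            os.fsync(f.fileno())   # the call that dominates apply cost
            samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

print(f"median fsync: {fsync_latency_ms():.3f} ms")
```

Medians consistently above a millisecond on the JetStream volume are a strong hint that storage, not NATS, is the bottleneck behind apply lag.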

Separate JetStream storage from the OS disk. Ensure JetStream data is on a dedicated volume, not sharing I/O bandwidth with OS operations, logging, or other applications:

# Server configuration — dedicated storage path
jetstream {
  store_dir: "/data/jetstream"   # Dedicated NVMe volume
  max_memory_store: 1G
  max_file_store: 100G
}

Monitor and tune the filesystem. For Linux, ensure the filesystem is mounted with appropriate options for write-heavy workloads:

# Check current mount options
mount | grep /data/jetstream
# Recommended: ext4 or xfs with noatime
# Example fstab entry:
# /dev/nvme1n1 /data/jetstream ext4 defaults,noatime 0 2

Long-term: distribute the Raft group load

Reduce the number of Raft groups per server. This is the most impactful long-term fix. Use placement tags to spread streams and consumers across more servers:

// Go — create stream with placement constraints
_, err := js.AddStream(&nats.StreamConfig{
    Name:     "ORDERS",
    Subjects: []string{"ORDERS.>"},
    Replicas: 3,
    Placement: &nats.Placement{
        Tags: []string{"jetstream-dedicated"},
    },
})
# Python — stream with placement
from nats.js.api import StreamConfig, Placement

await js.add_stream(StreamConfig(
    name="ORDERS",
    subjects=["ORDERS.>"],
    num_replicas=3,
    placement=Placement(tags=["jetstream-dedicated"]),
))

Consolidate small streams. If you have hundreds of single-subject streams that could share a multi-subject stream, consolidation reduces the total Raft group count. Ten streams with one subject each create 10 Raft groups; one stream with 10 subjects creates 1.
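The savings from consolidation are easy to estimate. Each replicated stream is one Raft group and each durable consumer is another, so consolidating streams removes stream groups, while any filtered consumers carried over to the combined stream keep their own groups. A back-of-envelope sketch (sample numbers are illustrative):

```python
# Back-of-envelope Raft group arithmetic for stream consolidation.
# One Raft group per replicated stream, plus one per durable consumer.

def raft_groups(streams: int, consumers: int) -> int:
    return streams + consumers

before = raft_groups(streams=10, consumers=20)  # 10 single-subject streams
after = raft_groups(streams=1, consumers=20)    # 1 ten-subject stream
print(before, after)  # → 30 21
```

The consumer count is the larger lever in most deployments, so consolidation pays off most when it also lets you merge or filter consumers.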

Use dedicated JetStream servers. Separate servers that handle JetStream (Raft groups, storage I/O) from servers that handle core NATS routing (client connections, subject routing). This prevents client traffic from competing with Raft apply operations for CPU and I/O:

# Dedicated JetStream server
jetstream {
  store_dir: "/data/jetstream"
  max_file_store: 500G
}

# Tag for placement
server_tags: ["jetstream-dedicated"]

Frequently asked questions

What is the difference between Raft apply lag and stream replica lag?

Stream replica lag (JETSTREAM_001) measures how far a replica’s last sequence number is behind the leader — it reflects replication delay between servers. Raft apply lag is local to a single server — it measures the gap between entries the server has committed to its Raft log and entries it has applied to the state machine. A server can have zero replica lag (it received all entries via Raft replication) but high apply lag (it hasn’t finished processing those entries into the storage engine).

How many Raft groups can a single NATS server handle?

It depends entirely on disk I/O capacity and aggregate write rate. On NVMe storage with low write rates per group, a server can handle 2,000-5,000 Raft groups. On network-attached storage (cloud provider standard SSDs), 500-1,000 groups is a practical ceiling before apply lag starts appearing. The CLUSTER_003 check flags servers with more than 1,000 HA assets as a warning.

Does apply lag cause message loss?

No. Apply lag means entries are committed (durable across a quorum) but not yet processed locally. The data is safe. However, if the lagging server needs to become leader during a failover, it must apply all pending entries first, which extends the failover time. During apply lag, reads from the lagging server may also return slightly stale data.

Can I tune Raft apply behavior in NATS?

NATS does not expose direct tuning parameters for Raft apply rates. The apply pipeline is bounded by local disk I/O and CPU. The effective tuning levers are: faster storage (NVMe), fewer Raft groups per server (redistribution), and reducing per-group write rates (consolidating streams, reducing publish rates). These infrastructure and architectural changes have far more impact than any configuration parameter.

Why does apply lag spike during JetStream stream purges?

Purging a stream generates Raft entries that must be committed and applied like any other operation. A large purge (millions of messages) creates a burst of apply operations that compete with normal message applies for disk I/O. Additionally, the storage engine must delete the purged data, which generates its own I/O. Schedule large purges during low-traffic periods and monitor apply lag during the operation.

Proactive monitoring for NATS raft apply lag with Synadia Insights

With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.

Start a 14-day Insights trial