
NATS Raft IPQ Backpressure: What It Means and How to Fix It

Severity: Warning
Category: Performance
Applies to: System Improvement
Check ID: OPT_SYS_010
Detection threshold: any internal proposal queue (proposals, entries, responses, applies) exceeds the configured threshold (default: 1,000)

Raft IPQ backpressure occurs when internal queue lengths for a NATS JetStream Raft group exceed their depth threshold, indicating a processing backlog. The Raft leader or follower cannot process proposals, entries, responses, or apply operations fast enough to keep up with the incoming workload.

Why this matters

Every JetStream stream and consumer replica in NATS is backed by a Raft consensus group. Within each Raft group, several internal queues (collectively called IPQs — internal proposal queues) buffer work between stages of the consensus pipeline. Proposals queue client write requests waiting for the leader to batch and replicate. Entries queue replicated log entries waiting to be written. Responses queue peer acknowledgments waiting to be processed. Apply queues hold committed entries waiting to be applied to the local state machine.

When any of these queues grows beyond the threshold, the Raft group is signaling that one stage of its pipeline is a bottleneck. Unlike Raft apply lag (OPT_SYS_007), which specifically measures the gap between committed and applied entries, IPQ backpressure covers the entire Raft pipeline — from initial proposal to final application. A backed-up proposal queue means the leader can’t batch and replicate fast enough. A backed-up entry queue means replicated entries are arriving faster than the server can write them. Each queue type points to a different bottleneck.

The operational consequences are immediate: publish acknowledgments slow down because proposals wait in queue, consumer ack state updates lag behind reality, and leader elections during failover take longer because the server must drain its queues before it can participate in a new election. In multi-stream deployments, IPQ backpressure on one Raft group can cascade — all Raft groups on a server share the same disk I/O and CPU, and one group’s I/O storm raises latency for every other group on that server.

Common causes

  • Disk I/O saturation. Raft requires durable writes (fsync) at multiple pipeline stages. Slow disks — network-attached storage, contended shared volumes, or drives without power-loss protection — bottleneck the entry write and apply stages, causing entries and applies to queue up while waiting for I/O.

  • Excessive Raft groups per server. Each stream replica and consumer replica is a separate Raft group with its own set of IPQs. A server hosting thousands of Raft groups generates aggregate I/O and CPU load far beyond what any single group produces, and that combined load saturates the server’s disk and CPU. This is the single most common cause and correlates directly with the High HA Assets check (CLUSTER_003).

  • High sustained write rate. Streams receiving thousands of messages per second generate a continuous flow of Raft proposals. If the publish rate exceeds what the leader can batch, replicate, and commit, the proposal queue grows. This is especially acute when multiple high-throughput streams share a leader.

  • Uneven leader distribution. If one server holds a disproportionate number of Raft group leaderships, it bears the full proposal and replication load for all of them. Other servers in the cluster may sit nearly idle while the leader’s IPQs overflow. The Uneven Leader Distribution check (OPT_BALANCE_001) detects this condition.

  • Network latency between cluster peers. The Raft protocol requires round-trips between leader and followers for replication. High RTT between peers extends the time entries sit in queues waiting for acknowledgment, increasing effective queue depth even when local I/O is fast.

  • CPU contention. Raft proposal batching, log entry serialization, and response processing all consume CPU. When the server is CPU-bound — handling high client connection counts, subject routing, or TLS termination — Raft processing gets less CPU time and queues accumulate.

How to diagnose

Check for IPQ warnings in server logs

The NATS server logs warnings when IPQ depths exceed internal thresholds:

Terminal window
journalctl -u nats-server --since "1 hour ago" | grep -i "ipq\|proposal\|backpressure"

Inspect Raft group status

Check the JetStream meta group and per-stream Raft group health:

Terminal window
nats server report jetstream

Look for servers with elevated lag, pending operations, or stalled Raft groups. Servers with IPQ backpressure typically show symptoms across multiple groups simultaneously.

Check per-stream Raft state

For specific streams showing degraded performance:

Terminal window
nats stream info <stream_name>

Examine the cluster section — replicas showing lag or “not current” state may have IPQ backpressure on their servers.

Measure disk I/O performance

Since disk I/O is the most common bottleneck:

Terminal window
# Linux — check disk latency and utilization
iostat -xz 1 5
# Look for: await > 5ms on SSDs, %util > 85%

Count Raft groups per server

Determine if the server is overloaded:

Terminal window
nats server report jetstream --json | jq '.[] | {name: .name, ha_assets: .ha_assets, streams: .streams, consumers: .consumers}'

Servers with 1,000+ HA assets on non-NVMe storage are prime candidates for IPQ backpressure.

Check leader distribution

Verify that leadership isn’t concentrated:

Terminal window
nats server report jetstream

Compare the leader count across servers. If one server has 2x+ the leaders of its peers, proposal queue pressure on that server is amplified.

How to fix it

Immediate: relieve pressure on the affected server

Step down leaderships from the overloaded server. Redistributing leaders reduces proposal queue pressure on the hot server:

Terminal window
# Step down leaders for specific streams
nats stream cluster step-down <stream_name>

Reduce publish rate if possible. If a specific high-throughput stream is the primary source of proposals, temporarily reducing publish rate gives the queues time to drain. This is a band-aid, not a fix.
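
If the publisher uses the nats.go client, one way to apply that throttle automatically is to bound the async publish window. The sketch below is illustrative rather than prescriptive: the server URL, subject name, loop, and window size of 256 are assumptions; PublishAsyncMaxPending caps how many publishes may be awaiting acks, so a leader that commits slowly holds back the producer instead of letting the proposal queue grow.

package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Cap in-flight (unacknowledged) async publishes at 256. When the window
	// is full, new publishes stall (and may return an error if the stall
	// persists), so the producer slows toward the rate the stream leader can
	// batch, replicate, and commit.
	js, err := nc.JetStream(nats.PublishAsyncMaxPending(256))
	if err != nil {
		log.Fatal(err)
	}

	for i := 0; i < 10000; i++ {
		if _, err := js.PublishAsync("EVENTS.orders", []byte("payload")); err != nil {
			log.Printf("publish: %v", err)
		}
	}

	// Drain outstanding acks before exiting.
	select {
	case <-js.PublishAsyncComplete():
	case <-time.After(10 * time.Second):
		log.Println("timed out waiting for publish acks")
	}
}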

Short-term: address the I/O bottleneck

Upgrade to NVMe storage. Raft’s fsync requirements make storage latency the dominant factor in IPQ throughput. NVMe drives with sub-100μs write latency handle thousands of concurrent Raft groups; network-attached storage adds 1-5ms per operation:

# Server configuration — dedicated fast storage
jetstream {
  store_dir: "/data/jetstream" # NVMe-backed volume
  max_file_store: 200G
}

Reduce Raft groups per server. Move stream replicas to other servers using placement tags:

// Go — place stream on servers with capacity
_, err := js.AddStream(&nats.StreamConfig{
	Name:     "EVENTS",
	Subjects: []string{"EVENTS.>"},
	Replicas: 3,
	Placement: &nats.Placement{
		Tags: []string{"low-raft-density"},
	},
})
if err != nil {
	// handle the error
}
// TypeScript (nats.js)
import { connect } from "nats";

const nc = await connect({ servers: "nats://localhost:4222" });
const jsm = await nc.jetstreamManager();

await jsm.streams.add({
  name: "EVENTS",
  subjects: ["EVENTS.>"],
  num_replicas: 3,
  placement: {
    tags: ["low-raft-density"],
  },
});

Long-term: architect for sustainable Raft performance

Separate JetStream workloads from core NATS routing. Dedicated JetStream servers avoid CPU and I/O contention between client routing and Raft operations:

# Dedicated JetStream server — no client-facing traffic
listen: "0.0.0.0:4222"
jetstream {
  store_dir: "/data/jetstream"
  max_file_store: 500G
  max_mem_store: 2G
}
server_tags: ["jetstream-dedicated"]

Consolidate small streams. Hundreds of single-subject streams create hundreds of Raft groups. Consolidating related subjects into fewer multi-subject streams dramatically reduces the total Raft group count and aggregate IPQ depth:

Terminal window
# Instead of: ORDERS.us, ORDERS.eu, ORDERS.asia (3 streams, 3 Raft groups)
# Use: ORDERS with subjects ["ORDERS.>"] (1 stream, 1 Raft group)
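
As a rough sketch (the stream name, subjects, and replica count are illustrative, and js is an existing JetStreamContext as in the placement example above), the consolidated stream could be created like this:

// Go — one stream (one Raft group per replica) instead of three
_, err := js.AddStream(&nats.StreamConfig{
	Name:     "ORDERS",
	Subjects: []string{"ORDERS.>"}, // covers ORDERS.us, ORDERS.eu, ORDERS.asia
	Replicas: 3,
})
if err != nil {
	// handle the error
}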

Implement capacity planning for Raft groups. Establish a per-server Raft group budget based on storage performance. As a guideline: NVMe can sustain 2,000-5,000 Raft groups, standard cloud SSDs 500-1,000, and network-attached HDD should not be used for JetStream in production. Monitor IPQ depth as part of regular capacity reviews.

Frequently asked questions

What is the difference between IPQ backpressure and Raft apply lag?

Raft apply lag (OPT_SYS_007) specifically measures the gap between committed and applied entries — the final stage of the Raft pipeline. IPQ backpressure covers all four internal queues: proposals, entries, responses, and applies. A server can have IPQ backpressure on the proposal queue (leader can’t batch fast enough) with zero apply lag (once committed, entries apply quickly). IPQ backpressure is a broader signal; apply lag is a specific subset of it.

Which IPQ queue matters most?

The apply queue is most critical — it means the upper layer (JetStream) cannot consume committed entries fast enough. When the apply queue backs up, stream state falls behind committed Raft state, causing visible lag. Beyond the apply queue, it depends on the server’s role. On leaders, the proposal queue is critical — it directly affects publish latency. On followers, the entry queue matters — it affects replication lag and catch-up time. The four queue types are: proposals, append entries, apply, and responses. In practice, if one queue is backed up, others tend to follow because they share the same disk and CPU resources. The check fires when any single queue exceeds the threshold.

Can IPQ backpressure cause message loss?

No. Raft’s consensus protocol ensures that committed entries are durable across a quorum of replicas. IPQ backpressure causes latency — proposals wait longer, acks take longer, apply state falls behind — but no committed data is lost. However, if publish clients have tight timeouts and the proposal queue delay exceeds those timeouts, clients may see publish failures and retry, increasing the load further.
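
A hedged sketch of the client-side mitigation, assuming the nats.go client and the standard fmt, log, and time imports (the function name, ack wait, and retry counts are illustrative): give each publish a generous ack wait and back off between retries so a transient queue delay does not escalate into a retry storm.

// Go — retry with backoff instead of hammering a backed-up leader
func publishWithRetry(js nats.JetStreamContext, subj string, data []byte) error {
	backoff := 250 * time.Millisecond
	for attempt := 1; attempt <= 5; attempt++ {
		// AckWait sets how long this publish waits for the stream's ack.
		_, err := js.Publish(subj, data, nats.AckWait(10*time.Second))
		if err == nil {
			return nil
		}
		log.Printf("publish attempt %d failed: %v", attempt, err)
		time.Sleep(backoff)
		backoff *= 2 // exponential backoff gives the IPQs time to drain
	}
	return fmt.Errorf("publish to %s failed after retries", subj)
}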

How does IPQ backpressure relate to the meta cluster?

The meta cluster is itself a Raft group. If the meta cluster experiences IPQ backpressure (also flagged by Meta Pending High, META_008), JetStream API operations — stream creation, consumer creation, leader elections — slow down cluster-wide. This is separate from individual stream/consumer Raft groups, but the root causes (disk I/O, CPU) are the same.

What is a safe IPQ depth for production?

The default threshold of 1,000 is a reasonable starting point. In steady state, IPQ depths should be in the low hundreds or less. Depths consistently above 500 indicate the server is approaching its throughput ceiling. Brief spikes above 1,000 during traffic bursts are acceptable if they drain quickly (within seconds). Sustained depths above 1,000 require intervention — the server cannot keep up with its current workload.

Proactive monitoring for NATS Raft IPQ backpressure with Synadia Insights

With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.

Start a 14-day Insights trial