
NATS JetStream Storage Pressure: What It Means and How to Fix It

Severity: Warning
Category: Saturation
Applies to: Server
Check ID: SERVER_017
Detection threshold: JetStream storage usage >= 90% of reserved (js_reserved_storage)

A NATS server’s JetStream file-backed storage usage has reached 90% or more of its reserved allocation — stream data on disk is approaching the point where the server will reject new stream creation and may fail writes to existing streams if storage is fully exhausted.

Why this matters

JetStream reserves a fixed amount of disk storage per server, configured via max_storage (or max_store) in the JetStream configuration block. This reservation is the total budget shared across all file-backed streams on that server. When usage crosses 90%, the server is close to its limit. Once storage is fully consumed, the server rejects requests to create new streams, and existing streams that attempt to write beyond their allocated portion will encounter write failures.

The 90% threshold exists to provide a buffer. Disk storage doesn’t degrade gracefully — unlike memory, where the system can swap or compress, a full disk causes hard failures. Raft WAL writes fail, message appends fail, and consumer state updates fail. A server that runs out of JetStream storage doesn’t just slow down — it stops functioning for all JetStream workloads. The 10% remaining at the warning threshold is your window to act before that happens.

Storage pressure is often a slow-building problem. Streams accumulate data over days or weeks, and without retention limits, they grow indefinitely. A stream that was fine at 100GB may quietly grow to 900GB over months. By the time this check fires, multiple streams may be contributing to the pressure, and the fix requires understanding which streams are consuming the most space and whether that usage is intentional.

Common causes

  • Streams without retention limits. Streams configured with no max_age, max_bytes, or max_msgs limit retain messages forever. Over time, they consume unbounded storage. This is the most common cause of storage pressure.

  • Retention limits too generous for the volume. A stream with max_age: 30d receiving 1GB/day will consume 30GB. If the server’s total reservation is 100GB and multiple streams have similarly generous limits, the sum exceeds the budget.

  • Burst ingestion without corresponding limits. A batch import, backfill, or spike in publish rate fills a stream faster than normal. The stream’s steady-state growth was fine, but the burst pushed storage usage over the threshold.

  • Consumer acknowledgments not keeping up. In work-queue streams (retention: workqueue), messages are deleted after acknowledgment. If consumers stop acknowledging — because they crashed, fell behind, or were misconfigured — messages accumulate instead of being cleaned up.

  • Large message payloads. Streams receiving messages with large payloads (images, documents, serialized datasets) consume storage disproportionately. A stream at 1,000 msg/s with 100KB payloads uses 100MB/s — 8.6TB/day.

  • Uncompressed streams. Streams without compression enabled (compression: s2) use significantly more disk space than necessary. Enabling compression can reduce storage usage by 50-80% for typical workloads.

How to diagnose

Check server-level storage usage

Terminal window
nats server report jetstream

This shows each server’s JetStream storage reservation and current usage. The STORAGE column shows reserved capacity, and USED shows actual consumption. Look for servers where usage is near the reservation.
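
The same numbers can also be pulled programmatically from the server's HTTP monitoring endpoint. Below is a minimal Go sketch, assuming monitoring is enabled on the default port 8222; the storage, reserved_storage, and config.max_storage fields are read from the /jsz response, and the 90% comparison mirrors this check's threshold. Adjust the URL for your environment.

// Minimal sketch: read JetStream storage usage from a server's /jsz
// monitoring endpoint (assumes monitoring is enabled on port 8222).
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

type jszInfo struct {
    Config struct {
        MaxStorage int64 `json:"max_storage"`
    } `json:"config"`
    Storage         uint64 `json:"storage"`          // bytes used by file-backed streams
    ReservedStorage uint64 `json:"reserved_storage"` // bytes reserved by streams
}

func main() {
    resp, err := http.Get("http://localhost:8222/jsz")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var jsz jszInfo
    if err := json.NewDecoder(resp.Body).Decode(&jsz); err != nil {
        panic(err)
    }

    fmt.Printf("configured max_storage: %d bytes\n", jsz.Config.MaxStorage)
    if jsz.ReservedStorage > 0 {
        pct := float64(jsz.Storage) / float64(jsz.ReservedStorage) * 100
        fmt.Printf("used: %d bytes (%.1f%% of reserved)\n", jsz.Storage, pct)
        if pct >= 90 {
            fmt.Println("WARNING: storage usage is at or above 90% of reserved")
        }
    }
}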

Identify the largest streams

Terminal window
nats stream report

This lists all streams sorted by storage usage. Focus on the top consumers — in most deployments, a handful of streams account for the majority of storage. Note which streams have no retention limits.

Check individual stream configuration and state

Terminal window
nats stream info <stream_name>

Look at:

  • Storage: Total bytes on disk
  • Messages: Total message count
  • First/Last sequence: Age of oldest and newest message
  • Limits: max_age, max_bytes, max_msgs — any set to unlimited?
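
The same fields are available from the client API. Here is a small Go sketch using the legacy nats.go JetStream API (the same API as the Go example later in this article); the connection URL and the ORDERS stream name are placeholders, and a zero or negative limit is treated as unlimited.

// Minimal sketch: inspect one stream's state and retention limits.
// "ORDERS" is an illustrative stream name.
nc, _ := nats.Connect(nats.DefaultURL)
defer nc.Drain()
js, _ := nc.JetStream()

info, err := js.StreamInfo("ORDERS")
if err != nil {
    log.Fatal(err)
}

log.Printf("storage bytes: %d", info.State.Bytes)
log.Printf("messages: %d (seq %d..%d)", info.State.Msgs, info.State.FirstSeq, info.State.LastSeq)
log.Printf("oldest: %s, newest: %s", info.State.FirstTime, info.State.LastTime)

// Flag limits that are effectively unlimited.
if info.Config.MaxAge == 0 {
    log.Println("max_age is unlimited")
}
if info.Config.MaxBytes <= 0 {
    log.Println("max_bytes is unlimited")
}
if info.Config.MaxMsgs <= 0 {
    log.Println("max_msgs is unlimited")
}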

Check stream growth rate

Terminal window
# Compare stream sizes over time
nats stream report --json | jq '.[] | {name: .name, bytes: .state.bytes}'

Run this at two points in time to calculate the growth rate per stream. This helps prioritize which streams need limits most urgently.
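
The same comparison can be scripted against the client API. Below is a rough Go sketch along the same lines; the one-hour sampling window is an illustrative assumption, and snapshots taken hours or days apart give a more reliable rate.

// Rough sketch: sample stream sizes twice and estimate growth per day.
// The one-hour window is illustrative; longer windows are more reliable.
nc, _ := nats.Connect(nats.DefaultURL)
defer nc.Drain()
js, _ := nc.JetStream()

sample := func() map[string]uint64 {
    sizes := make(map[string]uint64)
    for name := range js.StreamNames() {
        if info, err := js.StreamInfo(name); err == nil {
            sizes[name] = info.State.Bytes
        }
    }
    return sizes
}

before := sample()
time.Sleep(1 * time.Hour)
after := sample()

for name, cur := range after {
    prev, ok := before[name]
    if !ok || cur <= prev {
        continue // new stream, or no growth in the window
    }
    growthPerDay := float64(cur-prev) * 24 // scale the one-hour delta to a day
    log.Printf("%s: ~%.0f bytes/day growth, %d bytes total", name, growthPerDay, cur)
}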

How to fix it

Immediate: free up space

Purge stale or unnecessary data. If a stream contains data that is no longer needed (test data, completed work-queue items, expired events):

Terminal window
# Purge all messages from a stream
nats stream purge <stream_name>
# Purge messages older than a specific sequence
nats stream purge <stream_name> --seq <sequence_number>
# Purge messages on a specific subject within the stream
nats stream purge <stream_name> --subject "events.old.>"

Delete unused streams. Streams that are no longer consumed or serve no purpose should be removed:

Terminal window
nats stream list
nats stream rm <unused_stream_name>

Short-term: add retention limits

Set max_bytes on the largest streams. This caps storage per stream, preventing any single stream from consuming the entire budget:

Terminal window
nats stream edit <stream_name> --max-bytes 50GB

Set max_age for event-type streams. Streams holding time-series events, logs, or telemetry data rarely need to retain messages indefinitely:

Terminal window
nats stream edit <stream_name> --max-age 7d

Enable compression on large streams. S2 compression significantly reduces on-disk storage with minimal CPU overhead:

Terminal window
nats stream edit <stream_name> --compression s2
// Go - check storage usage and apply limits
nc, _ := nats.Connect(url)
js, _ := nc.JetStream()

// List streams and check storage
for name := range js.StreamNames() {
    info, _ := js.StreamInfo(name)
    usedGB := float64(info.State.Bytes) / (1024 * 1024 * 1024)
    if usedGB > 10 {
        log.Printf("Stream %s using %.1f GB", name, usedGB)
    }
}

// Update a stream to add retention limits
cfg := &nats.StreamConfig{
    Name:        "ORDERS",
    MaxBytes:    50 * 1024 * 1024 * 1024, // 50 GB
    MaxAge:      7 * 24 * time.Hour,      // 7 days
    Compression: nats.S2Compression,
}
js.UpdateStream(cfg)

Long-term: prevent recurrence

Establish a stream provisioning policy. Require all new streams to specify at least one retention limit (max_bytes, max_age, or max_msgs). Use account-level limits to enforce a storage budget per tenant:

# nats-server.conf - account-level JetStream limits
accounts: {
  PRODUCTION: {
    jetstream: {
      max_store: 200GB
      max_streams: 50
    }
  }
}
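
To back such a policy with automation, a periodic audit can flag streams that define no retention limit at all. The following is a minimal Go sketch along the lines of the earlier example; connection details are placeholders, and zero or negative values are treated as unlimited.

// Minimal audit sketch: list streams with no retention limits at all.
nc, _ := nats.Connect(nats.DefaultURL)
defer nc.Drain()
js, _ := nc.JetStream()

for name := range js.StreamNames() {
    info, err := js.StreamInfo(name)
    if err != nil {
        log.Printf("stream %s: %v", name, err)
        continue
    }
    cfg := info.Config
    if cfg.MaxAge == 0 && cfg.MaxBytes <= 0 && cfg.MaxMsgs <= 0 {
        log.Printf("stream %s has no max_age, max_bytes, or max_msgs limit", name)
    }
}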

Implement capacity planning. Calculate expected storage growth based on message rates, payload sizes, and retention windows. Alert when projected usage (current usage plus growth rate × time until the next capacity review) will exceed 80% of the reservation.

Right-size the server’s storage reservation. If the server’s disk has capacity, increasing max_store is appropriate. But if streams are growing without limits, increasing the reservation just delays the same problem:

nats-server.conf
jetstream: {
  max_store: 2TB
}

Use Synadia Insights for continuous monitoring. Insights evaluates storage pressure every collection epoch, tracking trends over time and alerting before you reach critical levels. Combined with the Streams Without Limits check (OPT_SYS_001), it identifies which streams need retention policies before they cause storage pressure.

Frequently asked questions

What is the difference between SERVER_017 and SERVER_019?

SERVER_017 (this check) fires when storage usage reaches 90% of js_reserved_storage — the sum of storage allocations reserved by streams. SERVER_019 fires when actual disk usage approaches max_store — the hard filesystem limit configured for JetStream. Think of SERVER_017 as “your stream reservations are nearly full” and SERVER_019 as “the actual disk is nearly full.” SERVER_019 is more dangerous because exceeding max_store causes Raft WAL write failures.

Will JetStream stop accepting writes at 90%?

No. The 90% threshold is a warning, not a hard limit. JetStream continues accepting writes until storage is fully consumed. At 100% of the reservation, behavior depends on whether individual streams have per-stream max_bytes limits. Streams with limits will enforce them (oldest messages purged). Streams without limits will fail writes once no space remains.

Can I increase the storage reservation without a restart?

No. Changes to max_store in the server configuration require a server restart to take effect. A configuration reload (nats-server --signal reload) does not change JetStream resource reservations. Plan capacity changes during maintenance windows.

How does replication affect storage calculations?

Each replica of a stream consumes storage on the server it runs on. An R3 stream with 100GB of data uses 100GB on three different servers — 300GB total across the cluster. The storage pressure check evaluates per-server usage, so it correctly accounts for the local replica’s consumption without double-counting replicas on other servers.

Should I set max_bytes or max_age on streams?

Both, ideally. max_age ensures messages expire after a time window regardless of volume. max_bytes caps the total size regardless of message rate. Together, they prevent both slow accumulation (addressed by max_age) and burst-driven spikes (addressed by max_bytes). For work-queue streams, max_bytes is most important since messages are deleted after acknowledgment and max_age may never trigger.

Proactive monitoring for NATS JetStream storage pressure with Synadia Insights

With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.

Start a 14-day Insights trial