NATS Wasted JetStream Storage Reservation: Reclaiming Unused Capacity

Severity: Info
Category: Consistency
Applies to: JetStream
Check ID: OPT_COST_005
Detection threshold: JetStream storage usage below 20% of reserved capacity on a server

A wasted JetStream storage reservation alert fires when a server’s actual JetStream storage usage is below 20% of its reserved (configured) capacity. The server has been allocated disk resources — either through explicit max_storage configuration or infrastructure provisioning — that it isn’t using. Those resources cost money (disk, IOPS, cloud storage tiers) while delivering no value.

Why this matters

You’re paying for what you don’t use. In cloud deployments, storage is billed by the GB. A server with 1 TB reserved but only 100 GB used is wasting 900 GB of billable storage. Across a multi-server cluster, this compounds quickly. Even in on-premises deployments, over-provisioned storage represents capital locked up in disks that could serve other workloads.

Over-reservation masks capacity planning problems. When servers appear to have plenty of headroom, teams stop thinking about storage efficiency. Streams accumulate without limits, old data persists indefinitely, and nobody notices because the utilization dashboard stays green. The waste becomes structural — baked into every new server provisioned using the same template.

It distorts placement decisions. JetStream’s placement engine considers available storage when deciding where to place new stream replicas. A server with 900 GB “free” (but reserved) looks attractive for placement, even though the reserved capacity may have been intended for future growth. This can lead to uneven actual utilization — some servers over-committed, others sitting nearly empty.

Right-sizing creates room for real optimization. Reclaiming wasted reservations lets you either reduce infrastructure costs (smaller disks, fewer servers) or reallocate capacity to servers that actually need it. Either way, you’re aligning cost with value.

Common causes

  • Copy-paste server configuration. All servers in the cluster use the same max_storage value, regardless of their actual workload. A server originally provisioned for a high-write workload keeps its large reservation even after that workload moves elsewhere.

  • Streams that were removed but reservation wasn’t adjusted. A large stream was deleted or migrated to another server, freeing up significant storage, but nobody reduced the server’s max_storage to reflect the new reality.

  • Generous initial provisioning. Servers were provisioned with large reservations “just in case” during initial deployment. The expected growth never materialized, but the reservation was never revisited.

  • Inactive or nearly-empty streams. Streams exist on the server but contain very little data — either because they’re rarely published to, or because aggressive TTLs keep the stored volume low. The reservation was sized for peak expected volume that never arrived.

  • Cloud storage tier mismatch. High-performance (and expensive) storage was provisioned for a workload that turned out to need much less capacity. The storage tier is overkill for the actual I/O pattern.

How to diagnose

Check per-server storage utilization

Terminal window
# Report JetStream resource usage per server
nats server report jetstream

Look for servers where the Storage used column is a small fraction of the Storage reserved column. Any server below 20% utilization triggers this check.

Get detailed storage breakdown

Terminal window
# List streams on a specific server with their sizes
nats stream report
# Check a specific server's JetStream configuration
nats server info server-2 --json | jq '.jetstream'

Calculate waste across the cluster

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/nats-io/nats.go"
    "github.com/nats-io/nats.go/jetstream"
)

func main() {
    // Connect to the cluster and open a JetStream handle.
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := jetstream.New(nc)
    if err != nil {
        log.Fatal(err)
    }

    info, err := js.AccountInfo(context.Background())
    if err != nil {
        log.Fatal(err)
    }

    used := info.Store            // bytes currently stored
    limit := info.Limits.MaxStore // configured storage limit in bytes
    if limit <= 0 {
        log.Fatal("no storage limit configured (MaxStore is unlimited)")
    }

    pct := float64(used) / float64(limit) * 100
    wasted := limit - int64(used)

    fmt.Printf("Storage Used: %d MB\n", used/1024/1024)
    fmt.Printf("Storage Reserved: %d MB\n", limit/1024/1024)
    fmt.Printf("Utilization: %.1f%%\n", pct)
    fmt.Printf("Wasted: %d MB\n", wasted/1024/1024)
}
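
To see which streams account for the stored bytes, you can walk every stream in the account and print its size. A minimal sketch using the same jetstream package (it connects to nats.DefaultURL; adjust the connection for your deployment):

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/nats-io/nats.go"
    "github.com/nats-io/nats.go/jetstream"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := jetstream.New(nc)
    if err != nil {
        log.Fatal(err)
    }

    // List every stream in the account along with its stored bytes.
    lister := js.ListStreams(context.Background())
    for info := range lister.Info() {
        fmt.Printf("%-20s %8d MB\n", info.Config.Name, info.State.Bytes/1024/1024)
    }
    if err := lister.Err(); err != nil {
        log.Fatal(err)
    }
}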

Review the monitoring endpoint

Terminal window
# Check JetStream stats via the monitoring endpoint
curl -s http://localhost:8222/jsz | jq '{
  storage_used: .storage,
  reserved_storage: .reserved_storage,
  utilization_pct: ((.storage / .reserved_storage) * 100)
}'

How to fix it

Immediate: identify and quantify the waste

Map streams to servers. Understand which streams live on each server and how much storage they consume. This tells you whether the waste is because streams are small or because the server has few streams:

Terminal window
# Show streams with their placement and sizes
nats stream report

Calculate the right reservation. A good heuristic: reserve 1.5–2x the current peak usage to allow for growth, plus headroom for replication traffic and compaction overhead. If a server’s peak usage over the past 90 days was 200 GB, a 400 GB reservation is reasonable. A 2 TB reservation is waste.
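
The heuristic is easy to encode if you want it in a capacity report. A sketch, assuming you supply the observed 90-day peak and a headroom multiplier yourself:

package main

import "fmt"

// recommendedReservation applies the 1.5-2x headroom heuristic to peak usage.
func recommendedReservation(peakBytes int64, headroom float64) int64 {
    return int64(float64(peakBytes) * headroom)
}

func main() {
    peak := int64(200) * 1024 * 1024 * 1024 // 200 GB observed peak over 90 days
    rec := recommendedReservation(peak, 2.0)
    fmt.Printf("Suggested reservation: %d GB\n", rec/1024/1024/1024)
}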

Short-term: right-size the reservation

Reduce the storage limit in the server configuration. Update the JetStream storage limit (set via max_file_store in nats-server.conf, reported as max_storage in monitoring output) to match actual needs plus reasonable growth headroom:

nats-server.conf
jetstream {
    store_dir: /data/jetstream
    max_file_store: 500GB  # was 2TB, actual usage is 150GB
}

Reload the configuration:

Terminal window
nats server config reload <server-id>
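
Then confirm the new limit is in effect with the same inspection command used during diagnosis:

Terminal window
# Verify the updated JetStream storage limit
nats server info server-2 --json | jq '.jetstream'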

Downsize the underlying storage. In cloud environments, this is where the actual cost savings materialize. Note that many volume types (for example, AWS EBS) can only be grown in place, so shrinking usually means provisioning a smaller volume, copying the JetStream store directory across, and swapping mounts:

Terminal window
# Example: provision a smaller gp3 volume to replace the over-sized one (AWS CLI)
aws ec2 create-volume --size 500 --availability-zone us-east-1a --volume-type gp3
aws ec2 attach-volume --volume-id vol-xxx --instance-id i-xxx --device /dev/sdf

Short-term: consolidate underutilized servers

Migrate streams from underutilized servers. If a server has very few streams and low utilization, consider moving those streams to other servers and decommissioning the empty one:

Terminal window
# Move a stream to a different set of servers using placement tags
nats stream edit ORDERS --tag region=us-east --tag role=storage

After migrating all streams off a server, remove it from the cluster to eliminate the wasted reservation entirely.
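
Before removing the server, re-run the per-server report to confirm it no longer hosts any stream assets:

Terminal window
# The drained server should show zero streams and near-zero storage used
nats server report jetstream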

Long-term: implement capacity planning

Use tiered reservation templates. Instead of one-size-fits-all server configs, define templates for different workload profiles — small (100 GB), medium (500 GB), large (2 TB) — and assign servers to the appropriate tier based on their actual workload.
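
As a sketch, a "medium" tier template might look like the following (the values are illustrative; derive your tiers from your actual workload distribution):

nats-server.conf
jetstream {
    store_dir: /data/jetstream
    max_file_store: 500GB  # medium tier
}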

Review reservations quarterly. Add a recurring review of JetStream storage utilization to your operational calendar. Compare reserved vs. actual usage across all servers and right-size as needed.
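
A quick sweep over each server's monitoring endpoint is a reasonable starting point for that review (the hostnames and the 8222 monitor port below are assumptions; adjust for your topology):

Terminal window
# Print used vs. reserved JetStream storage for each server
for host in nats-1 nats-2 nats-3; do
  curl -s "http://$host:8222/jsz" | jq -r --arg h "$host" \
    '"\($h): \(.storage / 1048576 | floor) MB used of \(.reserved_storage / 1048576 | floor) MB reserved"'
done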

Monitor with Synadia Insights. Insights flags servers where utilization drops below the threshold, making it easy to spot waste as it develops rather than discovering it during periodic reviews.

Frequently asked questions

Is some over-provisioning expected and healthy?

Yes. You should always have headroom for traffic spikes, stream growth, and operational overhead (compaction, snapshots). The 20% utilization threshold for this check means the server is using less than a fifth of its reserved capacity — that’s well beyond reasonable headroom. A healthy target is 40–70% utilization, leaving 30–60% for growth and operations.

Will reducing the reservation cause problems if usage grows?

Only if you reduce it too aggressively. Size the reservation based on projected growth, not just current usage. If a stream is growing at 10 GB/month and currently uses 100 GB, a 200 GB reservation gives you ~10 months of runway. Set calendar reminders to re-evaluate before you hit 80% utilization.

Can I redistribute storage across servers instead of reducing it?

Yes. If some servers in your cluster are over-provisioned while others are near capacity, you can migrate streams from saturated servers to underutilized ones. This balances utilization without changing total cluster capacity. Use placement tags and nats stream edit to control where streams land.

Does this check account for replication overhead?

The check compares actual stored bytes (including replicas hosted on the server) against the server’s max_storage limit. Replication overhead is already reflected in the “used” number. If a server hosts R3 stream replicas, each replica’s storage counts toward usage.

What about storage reserved for future streams that haven’t been created yet?

If you have concrete plans for new streams, factor their expected storage into your reservation sizing. But “we might need it someday” is not a good reason to maintain a 5x over-provision. Provision for known workloads with reasonable growth buffers, and expand when the need actually materializes. Cloud storage can be resized in minutes.

Proactive monitoring for NATS wasted JetStream storage reservations with Synadia Insights

With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.

Start a 14-day Insights trial