Why Is My JetStream Leader Using So Much RAM?

A common community question is why a NATS JetStream node, especially a stream or consumer leader, can show high RAM usage even after clients stop consuming or publishing.

The short answer: JetStream uses memory for more than active client connections. Some memory is used for stream and consumer state, some for temporary file-store caching, some for duplicate-message tracking, and some can be related to the node’s leadership role in the cluster. If the usage looks unexpectedly high, the most useful next step is usually to capture a memory profile while the condition is happening.

Common JetStream RAM Consumers

JetStream memory usage depends on configuration and workload. Common contributors include the following.

Message Deduplication State

JetStream supports publish deduplication using the Nats-Msg-Id header. To do that, it keeps duplicate-tracking state for the configured duplicate window.

The default duplicate window of two minutes suits most workloads, but it can be changed or disabled on a per-stream basis. Reducing or disabling the window reduces memory usage only if you are actually publishing with Nats-Msg-Id values and relying on JetStream deduplication.

If your publishers do not use Nats-Msg-Id, JetStream is not maintaining message-ID deduplication entries for those publishes, so disabling deduplication is unlikely to change memory use meaningfully.

The tradeoff is important: if you disable or shrink the duplicate window, you reduce the server-side protection against duplicate publishes during retries or reconnects.
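
To make the memory cost concrete, here is a toy model of windowed deduplication in Python. It is not JetStream's implementation: the DedupWindow class and its methods are hypothetical names invented for this illustration. It only shows that tracked state grows with the number of distinct IDs published inside the window, that publishes without an ID add nothing, and that entries age out once the window has passed.

```python
from collections import OrderedDict


class DedupWindow:
    """Toy model of publish deduplication over a time window.

    Hypothetical illustration only; this is not how the server stores IDs.
    Callers must pass non-decreasing `now` values (seconds).
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # msg_id -> time first seen, kept oldest-first by insertion order.
        self._seen = OrderedDict()

    def _expire(self, now: float) -> None:
        # Drop entries whose age has reached the window length.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def publish(self, msg_id, now: float) -> bool:
        """Return True if accepted, False if rejected as a duplicate."""
        self._expire(now)
        if msg_id is None:
            # No ID supplied: nothing is tracked and nothing is deduplicated.
            return True
        if msg_id in self._seen:
            return False
        self._seen[msg_id] = now
        return True

    def tracked(self) -> int:
        """Number of IDs currently held in memory."""
        return len(self._seen)


window = DedupWindow(window_seconds=120)
window.publish("order-1", now=0)    # accepted and tracked
window.publish("order-1", now=30)   # rejected: same ID inside the window
window.publish(None, now=31)        # accepted, never tracked
window.publish("order-1", now=121)  # accepted again: the old entry expired
```

In this model, shrinking window_seconds lowers the peak value of tracked(), and it also shortens the period during which a retried publish is caught.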

File-Store Caches

Disk-backed streams still use memory. JetStream’s file store can hold caches in memory for some time before releasing them. That means memory may not immediately drop just because publishers or consumers have stopped.

Cache pressure is more likely when workloads repeatedly scan large ranges of stream data, create many short-lived consumers, or otherwise access historical messages in patterns that cause file-store data to be read and cached.

Per-Subject Tracking

The number of messages in a stream is not always the main memory signal, especially for file-backed streams. The number of unique subjects can also matter because JetStream tracks per-subject information in memory.

A stream with hundreds of millions of messages on a single subject is different from a stream with hundreds of millions of messages spread across a very large subject space. If memory usage is surprising, check the subject cardinality as well as the message count.
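
The difference between message count and subject cardinality can be sketched with a few lines of Python. The function name subject_stats and the sample subject names are made up for this example, and how you obtain a sample of subjects (publisher logs, application metrics, and so on) is left open. The point is only that any per-subject table holds one entry per unique subject, regardless of how many messages there are.

```python
from collections import Counter


def subject_stats(subjects):
    """Return (message_count, unique_subject_count) for an iterable of subjects."""
    per_subject = Counter(subjects)  # one entry per unique subject
    return sum(per_subject.values()), len(per_subject)


# Same number of messages, very different subject cardinality.
single_subject = ["orders.new"] * 1000
many_subjects = [f"orders.{customer_id}.new" for customer_id in range(1000)]

print(subject_stats(single_subject))
print(subject_stats(many_subjects))
```

Both samples contain 1000 messages, but the first needs a single per-subject entry while the second needs 1000.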

Stream, Consumer, and Cluster Leadership State

In a clustered deployment, leaders do more work. A node that is the leader for streams or consumers may use more memory than followers for the same assets.

The cluster metadata leader also handles cluster-wide stream and consumer metadata. If one node is the metadata leader and is using substantially more memory than its peers, note that during the investigation, but do not treat it as a complete explanation: it does not by itself account for every high-memory case.

Why Memory May Stay High After Clients Stop

Stopping clients removes current client activity, but it does not necessarily remove all memory pressure immediately. For example:

file-store caches may remain resident for a period of time;
deduplication entries remain for the duplicate window;
stream, consumer, subject, and cluster metadata remain loaded;
the Go runtime may hold freed memory for a while before returning it to the operating system, so process RSS can stay high even after the heap has shrunk.

If you are using GOMEMLIMIT, remember that it sets a soft memory limit for the Go runtime. It influences how hard the garbage collector works as the process nears the limit, but it does not explain what is holding the memory. Seeing a process approach a configured limit is a reason to inspect a memory profile, not a reason to assume the largest streams are responsible.

Message Count Alone Is Often Not Enough

For file-backed streams, the number of retained messages can be less important than the shape of the workload.

Useful questions include:

Are the streams memory-backed or file-backed?
How many streams and consumers exist on the node and in the cluster?
Which streams and consumers is this node leading?
Is the node also the metadata leader?
How many unique subjects are present in the large streams?
Are clients scanning streams from the beginning, or mostly consuming current work?
Are consumers long-lived, or are many short-lived consumers being created?
Are publishers using Nats-Msg-Id?
What duplicate window is configured on each stream?

For example, a file-backed WorkQueue-style stream with one long-lived consumer and one subject has a very different memory profile from a stream with many unique subjects and repeated historical scans.

Should You Disable Deduplication to Save RAM?

Maybe, but only in a specific case.

If you have an external deduplication layer and you do not need JetStream’s publish deduplication, you can consider disabling or reducing the duplicate window per stream. This is most relevant when publishers set Nats-Msg-Id and the stream therefore maintains duplicate-tracking state.

If publishers are not using Nats-Msg-Id, JetStream is not storing duplicate IDs for those messages, so changing the duplicate window is unlikely to address high memory usage.

Before changing it, confirm the application-level behavior you need during retries. JetStream deduplication is often useful precisely when publishers retry after timeouts, reconnects, or uncertain publish acknowledgements.
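
A rough way to judge whether the duplicate window is worth tuning is to estimate how many IDs it could hold at steady state. The sketch below is simple arithmetic, not a server API: the function name and its parameters are hypothetical, and it deliberately stops at an entry count because the per-entry memory cost depends on your ID lengths and server version and is better read from a memory profile.

```python
def estimate_dedup_entries(publish_rate_per_sec: float,
                           window_seconds: float,
                           fraction_with_msg_id: float = 1.0) -> int:
    """Estimate the number of message IDs tracked at steady state.

    entries ~= publish rate * window length * share of publishes carrying an ID
    """
    if not 0.0 <= fraction_with_msg_id <= 1.0:
        raise ValueError("fraction_with_msg_id must be between 0 and 1")
    return round(publish_rate_per_sec * window_seconds * fraction_with_msg_id)


# 5,000 publishes per second, all carrying Nats-Msg-Id, default 2-minute window.
default_window = estimate_dedup_entries(5000, 120)

# The same workload with a 10-second window.
short_window = estimate_dedup_entries(5000, 10)

# Publishers that never set Nats-Msg-Id: nothing to track, whatever the window.
no_ids = estimate_dedup_entries(5000, 120, fraction_with_msg_id=0.0)
```

If the estimate is small, or zero because no publisher sets Nats-Msg-Id, changing the window is unlikely to be the fix.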

When to Capture a Memory Profile

If a node is using much more RAM than its peers and the cause is not obvious, capture a memory profile while the usage is high.

The NATS documentation describes how to gather profiling data in its guide, Profiling the NATS server.

A profile is especially useful when the environment looks simple on the surface, for example:

only a few streams and consumers;
file-backed streams rather than memory streams;
one subject per stream;
long-lived consumers rather than repeated full scans;
memory concentrated on a particular leader.

In those cases, guessing from stream size alone can be misleading. An allocation or heap profile can show whether the memory is associated with deduplication, cache behavior, subject tracking, consumer state, or something else.
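
As a minimal sketch of the capture step, the Python below downloads a profile over HTTP and saves it to a file. It assumes the server was started with its profiling port enabled (the prof_port setting) and that the standard Go pprof paths are served there; the host, the port number 65432, and the helper names profile_url and save_profile are placeholders chosen for this example. Check the profiling guide for the exact setup on your server version.

```python
import urllib.request


def profile_url(host: str, port: int, kind: str = "allocs") -> str:
    """Build the URL of a Go pprof endpoint, e.g. kind='allocs' or kind='heap'."""
    return f"http://{host}:{port}/debug/pprof/{kind}"


def save_profile(url: str, dest_path: str, timeout: float = 30.0) -> int:
    """Download a profile and write it to dest_path. Returns bytes written."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    with open(dest_path, "wb") as out:
        out.write(data)
    return len(data)


# Example call, to be run while memory usage is high (placeholder host and port):
#   save_profile(profile_url("localhost", 65432), "mem.prof")
```

The saved file can then be inspected with go tool pprof mem.prof, or attached when asking for help.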

Practical Investigation Checklist

When you see unexpected JetStream RAM usage, collect the following before making tuning changes:

Node role: stream leaders, consumer leaders, and whether the node is the metadata leader.
Stream inventory: stream count, storage type, retention policy, message count, and configured limits.
Consumer inventory: consumer count, durability, acknowledgement policy, and whether consumers are long-lived or short-lived.
Subject cardinality: approximate number of unique subjects per large stream.
Deduplication usage: whether publishers set Nats-Msg-Id and each stream’s duplicate window.
Access pattern: current-only consumption versus historical scans or replay-style reads.
Runtime evidence: a memory profile captured while the process is using the unexpected amount of RAM.

This data helps distinguish expected cache or metadata use from a workload pattern or configuration that should be adjusted.
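
Part of this checklist can be collected from the server's HTTP monitoring endpoints. The sketch below assumes you have already fetched the JSON from /varz and /jsz on each node and parsed it into dictionaries. The field names it reads (server_name, mem, streams, consumers, messages, meta_cluster.leader) are stated from memory and should be verified against your server version; the function name summarize_node and the sample values are invented for illustration.

```python
def summarize_node(varz: dict, jsz: dict) -> dict:
    """Pull a few checklist items out of parsed /varz and /jsz responses.

    Missing fields come back as None rather than raising, because the
    field names used here are assumptions that may not match every version.
    """
    meta = jsz.get("meta_cluster") or {}
    name = varz.get("server_name")
    return {
        "server_name": name,
        "rss_bytes": varz.get("mem"),          # process memory as reported by the server
        "streams": jsz.get("streams"),
        "consumers": jsz.get("consumers"),
        "messages": jsz.get("messages"),
        "is_meta_leader": name is not None and meta.get("leader") == name,
    }


# Made-up sample responses, trimmed to the fields used above.
sample_varz = {"server_name": "n1", "mem": 2_147_483_648}
sample_jsz = {
    "streams": 3,
    "consumers": 3,
    "messages": 250_000_000,
    "meta_cluster": {"leader": "n1", "cluster_size": 3},
}

summary = summarize_node(sample_varz, sample_jsz)
```

Running the same summary against every node makes asymmetry easier to see, for example one node that is both the metadata leader and the only one with unusually high RSS.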

Key Takeaways

JetStream RAM usage can come from deduplication, file-store caching, per-subject tracking, stream and consumer state, and cluster leadership responsibilities. Disk-backed streams still use memory, and process memory may not immediately fall when clients stop.

If you are using Nats-Msg-Id, the duplicate window is a real tuning knob, but reducing it trades away server-side duplicate protection. If you are not using Nats-Msg-Id, deduplication state is probably not the source of high RAM usage.

For unusually high or asymmetric memory use, capture a NATS server memory profile while the issue is occurring. It is the fastest way to move from speculation to evidence.
