NATS Key-Value buckets are backed by JetStream streams. When you delete a key, NATS doesn’t remove the underlying message — it writes a tombstone marker. Without a max_age policy, these tombstones accumulate forever. Over weeks and months, the delete map grows into a silent problem: memory spikes during server restarts, slow replica catch-up, and degraded cluster recovery times.
Every KV delete operation writes a tombstone message to the backing stream. The server maintains an in-memory index of these tombstones — the “delete map” — to correctly handle key lookups and watch operations. Without max_age, this map only grows.
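To make the mechanism concrete, here is a minimal Go sketch — illustrative only, assuming an existing connection `nc` and using a hypothetical bucket name `SESSIONS` — that deletes a key and then inspects the backing stream's counters:

```go
// Illustrative only: a KV delete appends a tombstone marker to the backing
// stream rather than erasing data. Error handling omitted for brevity.
js, _ := nc.JetStream()
kv, _ := js.CreateKeyValue(&nats.KeyValueConfig{Bucket: "SESSIONS"})

kv.Put("user-123", []byte("active"))
kv.Delete("user-123") // writes a tombstone; the key's history is not simply removed

info, _ := js.StreamInfo("KV_SESSIONS")
fmt.Printf("messages: %d, interior deletes: %d\n", info.State.Msgs, info.State.NumDeleted)
```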
The cost isn’t obvious during normal operation. The server loads the delete map into memory and uses it efficiently for reads. The problem surfaces during restarts: the server must replay the entire stream to rebuild the delete map. A bucket with 10 million tombstones can add minutes to server startup and consume gigabytes of memory during reconstruction. In a cluster, this means a recovering node takes far longer to rejoin, extending the window where you’re running with reduced redundancy.
The cascading failure pattern is particularly dangerous. A node restarts, begins replaying the WAL to rebuild state, and the delete map reconstruction exhausts available memory. The node OOMs, restarts again, and hits the same wall. Meanwhile, the remaining cluster members are handling increased load. If a second node has the same issue, you lose quorum.
For KV buckets used as caches or session stores — where keys are frequently created and deleted — the tombstone accumulation rate can be surprisingly fast. A session store processing 1,000 deletions per minute accumulates roughly 1.4 million tombstones per day. Within a week, you're carrying over 10 million dead entries that serve no purpose.
Default bucket creation without expiration. The NATS KV API creates buckets with no max_age by default. Developers creating a bucket with nats kv add or programmatically often skip this setting, not realizing the long-term implications for tombstone growth.
Cache-pattern usage without TTL. Applications using KV as a cache perform frequent put/delete cycles. Without max_age, the cache’s “evicted” entries persist as tombstones in the backing stream indefinitely, contradicting the ephemeral nature of cache data.
Session stores and temporary state. KV buckets storing user sessions, locks, or temporary coordination data see constant key churn. Each expired or released entry becomes a permanent tombstone without max_age.
Migrated data from other key-value stores. Teams moving from Redis or etcd to NATS KV may not realize that NATS doesn’t garbage-collect deletions the same way. In Redis, deleted keys are simply gone. In NATS KV, they leave tombstones.
Lack of visibility into interior deletes. The tombstone count isn’t surfaced in basic nats kv status output. Operators often don’t know the delete map is growing until a restart takes unexpectedly long or a node runs out of memory.
List all KV buckets and their configurations:
```bash
nats kv ls -v
```

Look for buckets where Max Age shows unlimited or is not set. These are the candidates.
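The same audit can be done from code. A minimal Go sketch, assuming an existing JetStream context `js` (as in the examples later in this article), that reports KV buckets with no age limit configured:

```go
// Walk all streams and flag KV buckets (streams named KV_*) without MaxAge.
// Assumes an existing nats.JetStreamContext `js`; errors omitted for brevity.
for info := range js.Streams() {
	if !strings.HasPrefix(info.Config.Name, "KV_") {
		continue
	}
	if info.Config.MaxAge == 0 {
		fmt.Printf("%s: no max_age configured (unlimited)\n", info.Config.Name)
	}
}
```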
For a specific bucket:
```bash
nats kv status MY_BUCKET
```

KV buckets are backed by streams prefixed with KV_. Check the interior delete count on the backing stream:
```bash
nats stream info KV_MY_BUCKET
```

Look at the Deleted Msgs or Num Deleted field. A high number relative to the current message count indicates significant tombstone accumulation.
For a cluster-wide view of streams with high interior deletes:
```bash
nats stream list --json | jq '.[] | select(.state.num_deleted > 10000) | {name: .config.name, deleted: .state.num_deleted, messages: .state.messages}'
```

Each tombstone entry in the delete map consumes approximately 24–48 bytes of memory (sequence number + metadata). A bucket with 10 million tombstones uses 240–480 MB of memory just for the delete map during reconstruction.
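If you would rather run this check programmatically, a rough Go sketch along the same lines — assuming an existing JetStream context `js`; the 48-byte figure and the 10,000 threshold are the estimates from above, not exact values:

```go
// Flag KV backing streams with a large delete map and estimate its memory cost.
// Assumes an existing nats.JetStreamContext `js`; errors omitted for brevity.
const bytesPerTombstone = 48 // upper end of the rough 24-48 byte estimate
for info := range js.Streams() {
	if !strings.HasPrefix(info.Config.Name, "KV_") || info.State.NumDeleted < 10_000 {
		continue
	}
	estMB := float64(info.State.NumDeleted) * bytesPerTombstone / (1024 * 1024)
	fmt.Printf("%s: %d interior deletes (~%.0f MB of delete map on rebuild), %d live messages\n",
		info.Config.Name, info.State.NumDeleted, estMB, info.State.Msgs)
}
```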
```bash
# Quick estimate: check deleted count across all KV streams
nats stream list --json | jq '[.[] | select(.config.name | startswith("KV_")) | .state.num_deleted] | add'
```

If server restarts are taking longer than expected, check the server logs for stream recovery duration:
grep -i "recovered" /var/log/nats/nats-server.log | grep "KV_"Update existing buckets to add a max_age that matches your data retention requirements:
```bash
# Set a 7-day expiration on a KV bucket
nats kv edit MY_BUCKET --ttl 7d
```
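The same change can be made from code by updating the bucket's backing stream directly. A sketch, assuming an existing JetStream context `js` and the bucket name from the CLI example above:

```go
// Add a 7-day max_age to an existing bucket by editing its backing stream.
// Assumes an existing nats.JetStreamContext `js`.
info, err := js.StreamInfo("KV_MY_BUCKET")
if err != nil {
	log.Fatal(err)
}
cfg := info.Config
cfg.MaxAge = 7 * 24 * time.Hour // equivalent to --ttl 7d
if _, err := js.UpdateStream(&cfg); err != nil {
	log.Fatal(err)
}
```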
Choose a max_age based on the data pattern:

- Caches: match the expected entry lifetime (for example, 24h or 1h)
- Session stores: match the session timeout (1h to 24h)
- Configuration or reference data: longer retention (30d to 90d), but still set a limit
- Locks and temporary coordination state: short (1h)

Always specify max_age when creating KV buckets:
```bash
nats kv add MY_CACHE --ttl 24h --history 1
```

In Go:
```go
js, _ := nc.JetStream()
kv, err := js.CreateKeyValue(&nats.KeyValueConfig{
	Bucket:  "MY_CACHE",
	TTL:     24 * time.Hour, // max_age for the bucket
	History: 1,
})
```

In Python:
```python
import nats
from nats.js.api import KeyValueConfig
from datetime import timedelta

nc = await nats.connect()
js = nc.jetstream()

kv = await js.create_key_value(
    config=KeyValueConfig(
        bucket="MY_CACHE",
        ttl=timedelta(hours=24).total_seconds(),  # maps to the stream's max_age, in seconds
        history=1,
    )
)
```

After setting max_age, existing tombstones older than the new limit are removed automatically as the server enforces the age limit. To force immediate cleanup:
```bash
# Purge the backing stream (see caution below)
nats stream purge KV_MY_BUCKET --keep 0 --subject '$KV.MY_BUCKET.>'
```

Caution: Only purge if you understand the implications. With --keep 0, this removes every message matching the subject filter, including current key values, not just tombstones and old revisions. Treat it as a full reset of the bucket's contents and be prepared to repopulate it.
Establish organizational standards for KV bucket creation:
```go
// Wrapper function that enforces max_age
func createKVBucket(js nats.JetStreamContext, name string, ttl time.Duration) (nats.KeyValue, error) {
	if ttl == 0 {
		return nil, fmt.Errorf("max_age (TTL) is required for all KV buckets")
	}
	return js.CreateKeyValue(&nats.KeyValueConfig{
		Bucket:  name,
		TTL:     ttl,
		History: 1,
	})
}
```

```python
async def create_kv_bucket(js, name: str, max_age_seconds: float) -> None:
    """Create a KV bucket with mandatory max_age."""
    if max_age_seconds <= 0:
        raise ValueError("max_age is required for all KV buckets")
    await js.create_key_value(
        config=KeyValueConfig(
            bucket=name,
            ttl=max_age_seconds,  # maps to the stream's max_age, in seconds
            history=1,
        )
    )
```

Setting max_age applies to all messages in the backing stream, including current key values — not just tombstones. If a key's value was written 30 days ago and you set max_age=7d, that key will be expired. Plan your max_age to be longer than your longest-lived key's expected lifetime, or re-write current values after setting the policy.
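If you need existing values to survive a newly applied max_age, one option is to re-write them so their timestamps reset. A sketch, assuming an existing nats.KeyValue handle `kv` for the bucket:

```go
// Refresh every current value so it is not expired by a newly applied max_age.
// Assumes an existing nats.KeyValue handle `kv`.
keys, err := kv.Keys()
if err != nil {
	log.Fatal(err)
}
for _, k := range keys {
	entry, err := kv.Get(k)
	if err != nil {
		continue // key may have been deleted since Keys() was called
	}
	// Re-putting the value writes a new revision with a fresh timestamp.
	if _, err := kv.Put(k, entry.Value()); err != nil {
		log.Printf("refresh %s: %v", k, err)
	}
}
```

Note that with history greater than 1, each re-put adds a revision and may roll older revisions off.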
It depends on your access pattern. For caches and session stores, match the TTL to your expected key lifetime (hours to a day). For configuration or reference data, 30–90 days is reasonable — long enough to retain data, short enough to bound tombstone growth. The goal is preventing unbounded accumulation, so even a generous max_age of 90 days is dramatically better than no limit.
max_age applies to all revisions in the history. If you have history=5 and max_age=24h, you’ll retain up to 5 revisions per key but only if they’re less than 24 hours old. For most use cases, history=1 combined with a reasonable max_age is the recommended configuration.
Not directly, but indirectly yes. If the delete map causes memory exhaustion during restart, the server may fail to recover. In a replicated setup, if enough nodes can’t restart, you lose quorum and the stream becomes unavailable. Setting max_age bounds the delete map size and keeps recovery predictable.
Yes. Synadia Insights flags KV buckets (streams with the KV_ prefix) that have no max_age configured and have accumulated a significant number of interior deletes. The check triggers before the tombstone count reaches levels that would impact restart performance, giving you time to set appropriate expiration policies.
With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.