NATS Key-Value buckets are backed by JetStream streams. When you delete a key, NATS doesn’t remove the underlying message — it writes a tombstone marker. Without a max_age policy, these tombstones accumulate forever. Over weeks and months, the delete map grows into a silent problem: memory spikes during server restarts, slow replica catch-up, and degraded cluster recovery times.
Every KV delete operation writes a tombstone message to the backing stream. The server maintains an in-memory index of these tombstones — the “delete map” — to correctly handle key lookups and watch operations. Without max_age, this map only grows.
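To make the mechanism concrete, here is a minimal Go sketch — illustrative only, assuming an existing connection `nc` and using a hypothetical bucket name `SESSIONS` — that deletes a key and then inspects the backing stream's counters:

```go
// Illustrative only: a KV delete appends a tombstone marker to the backing
// stream rather than erasing data. Error handling omitted for brevity.
js, _ := nc.JetStream()
kv, _ := js.CreateKeyValue(&nats.KeyValueConfig{Bucket: "SESSIONS"})

kv.Put("user-123", []byte("active"))
kv.Delete("user-123") // writes a tombstone; the key's history is not simply removed

info, _ := js.StreamInfo("KV_SESSIONS")
fmt.Printf("messages: %d, interior deletes: %d\n", info.State.Msgs, info.State.NumDeleted)
```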
The cost isn’t obvious during normal operation. The server loads the delete map into memory and uses it efficiently for reads. The problem surfaces during restarts: the server must replay the entire stream to rebuild the delete map. A bucket with 10 million tombstones can add minutes to server startup and consume gigabytes of memory during reconstruction. In a cluster, this means a recovering node takes far longer to rejoin, extending the window where you’re running with reduced redundancy.
The cascading failure pattern is particularly dangerous. A node restarts, begins replaying the WAL to rebuild state, and the delete map reconstruction exhausts available memory. The node OOMs, restarts again, and hits the same wall. Meanwhile, the remaining cluster members are handling increased load. If a second node has the same issue, you lose quorum.
For KV buckets used as caches or session stores — where keys are frequently created and deleted — the tombstone accumulation rate can be surprisingly fast. A session store processing 1,000 deletions per minute accumulates roughly 1.4 million tombstones per day. Within a week, you're carrying over 10 million dead entries that serve no purpose.
Default bucket creation without expiration. The NATS KV API creates buckets with no max_age by default. Developers creating a bucket with nats kv add or programmatically often skip this setting, not realizing the long-term implications for tombstone growth.
Cache-pattern usage without TTL. Applications using KV as a cache perform frequent put/delete cycles. Without max_age, the cache’s “evicted” entries persist as tombstones in the backing stream indefinitely, contradicting the ephemeral nature of cache data.
Session stores and temporary state. KV buckets storing user sessions, locks, or temporary coordination data see constant key churn. Each expired or released entry becomes a permanent tombstone without max_age.
Migrated data from other key-value stores. Teams moving from Redis or etcd to NATS KV may not realize that NATS doesn’t garbage-collect deletions the same way. In Redis, deleted keys are simply gone. In NATS KV, they leave tombstones.
Lack of visibility into interior deletes. The tombstone count isn’t surfaced in basic nats kv status output. Operators often don’t know the delete map is growing until a restart takes unexpectedly long or a node runs out of memory.
List all KV buckets and their configurations:
```bash
nats kv ls -v
```

Look for buckets where Max Age shows unlimited or is not set. These are the candidates.
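The same audit can be done from code. A minimal Go sketch, assuming an existing JetStream context `js` (as in the examples later in this article), that reports KV buckets with no age limit configured:

```go
// Walk all streams and flag KV buckets (streams named KV_*) without MaxAge.
// Assumes an existing nats.JetStreamContext `js`; errors omitted for brevity.
for info := range js.Streams() {
	if !strings.HasPrefix(info.Config.Name, "KV_") {
		continue
	}
	if info.Config.MaxAge == 0 {
		fmt.Printf("%s: no max_age configured (unlimited)\n", info.Config.Name)
	}
}
```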
For a specific bucket:
```bash
nats kv status MY_BUCKET
```

KV buckets are backed by streams prefixed with KV_. Check the interior delete count on the backing stream:
```bash
nats stream info KV_MY_BUCKET
```

Look at the Deleted Msgs or Num Deleted field. A high number relative to the current message count indicates significant tombstone accumulation.
For a cluster-wide view of streams with high interior deletes:
```bash
nats stream list --json | jq '.[] | select(.state.num_deleted > 10000) | {name: .config.name, deleted: .state.num_deleted, messages: .state.messages}'
```

Each tombstone entry in the delete map consumes approximately 24–48 bytes of memory (sequence number + metadata). A bucket with 10 million tombstones uses 240–480 MB of memory just for the delete map during reconstruction.
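If you would rather run this check programmatically, a rough Go sketch along the same lines — assuming an existing JetStream context `js`; the 48-byte figure and the 10,000 threshold are the estimates from above, not exact values:

```go
// Flag KV backing streams with a large delete map and estimate its memory cost.
// Assumes an existing nats.JetStreamContext `js`; errors omitted for brevity.
const bytesPerTombstone = 48 // upper end of the rough 24-48 byte estimate
for info := range js.Streams() {
	if !strings.HasPrefix(info.Config.Name, "KV_") || info.State.NumDeleted < 10_000 {
		continue
	}
	estMB := float64(info.State.NumDeleted) * bytesPerTombstone / (1024 * 1024)
	fmt.Printf("%s: %d interior deletes (~%.0f MB of delete map on rebuild), %d live messages\n",
		info.Config.Name, info.State.NumDeleted, estMB, info.State.Msgs)
}
```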
```bash
# Quick estimate: check deleted count across all KV streams
nats stream list --json | jq '[.[] | select(.config.name | startswith("KV_")) | .state.num_deleted] | add'
```

If server restarts are taking longer than expected, check the server logs for stream recovery duration:
grep -i "recovered" /var/log/nats/nats-server.log | grep "KV_"Update existing buckets to add a max_age that matches your data retention requirements:
```bash
# Set a 7-day expiration on a KV bucket
nats kv edit MY_BUCKET --ttl 7d
```
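The same change can be made from code by updating the bucket's backing stream directly. A sketch, assuming an existing JetStream context `js` and the bucket name from the CLI example above:

```go
// Add a 7-day max_age to an existing bucket by editing its backing stream.
// Assumes an existing nats.JetStreamContext `js`.
info, err := js.StreamInfo("KV_MY_BUCKET")
if err != nil {
	log.Fatal(err)
}
cfg := info.Config
cfg.MaxAge = 7 * 24 * time.Hour // equivalent to --ttl 7d
if _, err := js.UpdateStream(&cfg); err != nil {
	log.Fatal(err)
}
```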
Choose a max_age based on the data pattern:

- Caches: match the expected entry lifetime (for example, 24h or 1h)
- Session stores: match the session timeout (1h to 24h)
- Configuration or reference data: longer retention (30d to 90d), but still set a limit
- Locks and temporary coordination state: short (1h)

Always specify max_age when creating KV buckets:
```bash
nats kv add MY_CACHE --ttl 24h --history 1
```

In Go:
```go
js, _ := nc.JetStream()
kv, err := js.CreateKeyValue(&nats.KeyValueConfig{
	Bucket:  "MY_CACHE",
	TTL:     24 * time.Hour, // max_age for the bucket
	History: 1,
})
```

In Python:
```python
import nats
from nats.js.api import KeyValueConfig
from datetime import timedelta

nc = await nats.connect()
js = nc.jetstream()

kv = await js.create_key_value(
    config=KeyValueConfig(
        bucket="MY_CACHE",
        ttl=timedelta(hours=24).total_seconds(),  # maps to the stream's max_age, in seconds
        history=1,
    )
)
```

After setting max_age, existing tombstones older than the new limit are removed automatically as the server enforces the age limit. To force immediate cleanup:
```bash
# Purge the backing stream (see caution below)
nats stream purge KV_MY_BUCKET --keep 0 --subject '$KV.MY_BUCKET.>'
```

Caution: Only purge if you understand the implications. With --keep 0, this removes every message matching the subject filter, including current key values, not just tombstones and old revisions. Treat it as a full reset of the bucket's contents and be prepared to repopulate it.
Establish organizational standards for KV bucket creation:
```go
// Wrapper function that enforces max_age
func createKVBucket(js nats.JetStreamContext, name string, ttl time.Duration) (nats.KeyValue, error) {
	if ttl == 0 {
		return nil, fmt.Errorf("max_age (TTL) is required for all KV buckets")
	}
	return js.CreateKeyValue(&nats.KeyValueConfig{
		Bucket:  name,
		TTL:     ttl,
		History: 1,
	})
}
```

```python
async def create_kv_bucket(js, name: str, max_age_seconds: float) -> None:
    """Create a KV bucket with mandatory max_age."""
    if max_age_seconds <= 0:
        raise ValueError("max_age is required for all KV buckets")
    await js.create_key_value(
        config=KeyValueConfig(
            bucket=name,
            ttl=max_age_seconds,  # maps to the stream's max_age, in seconds
            history=1,
        )
    )
```

Setting max_age applies to all messages in the backing stream, including current key values — not just tombstones. If a key's value was written 30 days ago and you set max_age=7d, that key will be expired. Plan your max_age to be longer than your longest-lived key's expected lifetime, or re-write current values after setting the policy.
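If you need existing values to survive a newly applied max_age, one option is to re-write them so their timestamps reset. A sketch, assuming an existing nats.KeyValue handle `kv` for the bucket:

```go
// Refresh every current value so it is not expired by a newly applied max_age.
// Assumes an existing nats.KeyValue handle `kv`.
keys, err := kv.Keys()
if err != nil {
	log.Fatal(err)
}
for _, k := range keys {
	entry, err := kv.Get(k)
	if err != nil {
		continue // key may have been deleted since Keys() was called
	}
	// Re-putting the value writes a new revision with a fresh timestamp.
	if _, err := kv.Put(k, entry.Value()); err != nil {
		log.Printf("refresh %s: %v", k, err)
	}
}
```

Note that with history greater than 1, each re-put adds a revision and may roll older revisions off.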
It depends on your access pattern. For caches and session stores, match the TTL to your expected key lifetime (hours to a day). For configuration or reference data, 30–90 days is reasonable — long enough to retain data, short enough to bound tombstone growth. The goal is preventing unbounded accumulation, so even a generous max_age of 90 days is dramatically better than no limit.
max_age applies to all revisions in the history. If you have history=5 and max_age=24h, you’ll retain up to 5 revisions per key but only if they’re less than 24 hours old. For most use cases, history=1 combined with a reasonable max_age is the recommended configuration.
Not directly, but indirectly yes. If the delete map causes memory exhaustion during restart, the server may fail to recover. In a replicated setup, if enough nodes can’t restart, you lose quorum and the stream becomes unavailable. Setting max_age bounds the delete map size and keeps recovery predictable.
Yes. Synadia Insights flags KV buckets (streams with the KV_ prefix) that have no max_age configured and have accumulated a significant number of interior deletes. The check triggers before the tombstone count reaches levels that would impact restart performance, giving you time to set appropriate expiration policies.
With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.