An R1 stream stores data on a single server with no replicas. In a multi-node NATS cluster, this means the stream is a single point of failure. If the hosting node goes down — planned maintenance, hardware failure, network partition — that stream is completely offline until the node recovers. No reads, no writes, no consumer delivery. For critical data, this is an availability gap that your cluster topology was designed to prevent.
The entire point of running a multi-node NATS cluster is fault tolerance. With R3 (three replicas), a stream survives the loss of any single node. The remaining two replicas maintain quorum, and reads and writes continue without interruption. R1 bypasses this safety net entirely.
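The availability math here is Raft majority quorum: a stream with R replicas stays writable only while a majority of replicas are up, so it tolerates (R − 1) / 2 (rounded down) simultaneous node failures. A minimal sketch of that arithmetic:

```python
# Raft majority quorum: a stream accepts writes only while a majority
# of its replicas are reachable.
def quorum(replicas: int) -> int:
    """Minimum replicas that must be up for the stream to stay writable."""
    return replicas // 2 + 1

def failures_tolerated(replicas: int) -> int:
    """Node failures the stream survives while remaining available."""
    return replicas - quorum(replicas)

for r in (1, 3, 5):
    print(f"R{r}: quorum={quorum(r)}, survives {failures_tolerated(r)} node failure(s)")
# R1: quorum=1, survives 0 node failure(s)
# R3: quorum=2, survives 1 node failure(s)
# R5: quorum=3, survives 2 node failure(s)
```

This is why R1 fails hard: with a quorum of one, losing the single replica means losing the stream.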
The failure mode is binary and immediate. When an R3 stream loses a node, it degrades gracefully — one replica is down, but two maintain quorum and the stream remains fully operational. When an R1 stream loses its node, it goes from fully operational to completely unavailable in an instant. There’s no degraded state, no reduced throughput, just an outage.
During the outage window, publishers to the R1 stream receive errors. Consumers stop receiving messages. If the stream is a WorkQueue, processing halts. If it’s a KV bucket backing configuration data, dependent services lose access to their configuration. If it’s an event stream that other services depend on, the downstream cascade begins.
Recovery depends entirely on the failed node coming back. If the node suffered a disk failure, recovery requires restoring from backup (if one exists) or accepting data loss. Even for a clean restart — say, a rolling upgrade — the stream is offline for the full duration of the node’s restart cycle. In large clusters with many streams and consumers, restart can take minutes as the node replays WAL files and rebuilds state.
The risk is amplified by the fact that R1 streams are often created unintentionally. The default replica count in many NATS client libraries and CLI tools is 1. Developers creating streams in development (where R1 is fine) carry that configuration into production without adjusting the replica count.
- **Default stream configuration.** `nats stream add` defaults to R1 if `--replicas` is not specified. Programmatic stream creation often uses the library default, which is also R1 in most SDKs.
- **Development configuration carried to production.** Streams created and tested in a single-node dev environment are deployed to production multi-node clusters without updating the replica count.
- **Cost-conscious overuse of R1.** Teams intentionally use R1 to reduce storage costs (R3 triples disk usage). This is a valid trade-off for ephemeral or reproducible data, but it’s often applied too broadly, including to streams carrying critical business data.
- **Ephemeral streams that became permanent.** A stream was created as a temporary staging area or test stream with R1. Over time, it became load-bearing as other services started depending on it, but the replica count was never updated.
- **KV buckets defaulting to R1.** NATS KV buckets are backed by streams, and `nats kv add` defaults to R1. KV buckets used for configuration, feature flags, or service discovery may carry critical data on a single replica.
```shell
# Check cluster size first
nats server list
```
```shell
# List streams with their replica count
nats stream list --json | jq '.[] | select(.config.num_replicas == 1) | {name: .config.name, replicas: .config.num_replicas, subjects: .config.subjects}'
```

Or check a specific stream:

```shell
nats stream info MY_STREAM
```

Look at `Replicas` in the output. If it shows 1 and your cluster has 3+ nodes, this stream has no redundancy.
KV buckets are backed by streams with a `KV_` name prefix, so the same check applies:

```shell
nats stream list --json | jq '.[] | select(.config.name | startswith("KV_")) | select(.config.num_replicas == 1) | .config.name'
```

Not all R1 streams need to be upgraded. Evaluate each one:
```shell
# Check message rate and consumer count
nats stream info MY_STREAM --json | jq '{
  name: .config.name,
  messages: .state.messages,
  bytes: .state.bytes,
  consumers: .state.consumer_count,
  subjects: .config.subjects
}'
```

Streams with active consumers, high message counts, or subjects that other services depend on are candidates for R3 upgrade.
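As a sketch of that triage in Python, the filter below runs over the same JSON shape that `nats stream info --json` produces. The sample data and the `MIN_MESSAGES` threshold are illustrative assumptions, not part of the NATS tooling:

```python
import json

# Illustrative threshold -- tune for your environment.
MIN_MESSAGES = 1000

def upgrade_candidate(info: dict) -> bool:
    """True if an R1 stream looks load-bearing enough to warrant R3."""
    state = info["state"]
    return state["consumer_count"] > 0 or state["messages"] >= MIN_MESSAGES

# Sample record mimicking `nats stream info --json` output (hypothetical values).
sample = json.loads("""
{
  "config": {"name": "ORDERS", "num_replicas": 1, "subjects": ["orders.>"]},
  "state": {"messages": 52310, "bytes": 10485760, "consumer_count": 3}
}
""")

if sample["config"]["num_replicas"] == 1 and upgrade_candidate(sample):
    print(f"{sample['config']['name']}: upgrade to R3")
```

Any such script is only a first pass; a stream with few messages can still be critical if other services depend on its subjects.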
You can update the replica count on an existing stream without downtime:
```shell
nats stream edit MY_STREAM --replicas 3
```

The server will begin replicating existing data to two additional nodes. During replication, the stream remains fully available. Monitor the replica catch-up:
```shell
nats stream info MY_STREAM
```

Watch the `Replicas` section — new replicas will show as catching up until they’re fully synchronized.
Always specify replicas explicitly when creating streams in production:
```shell
nats stream add ORDERS --subjects "orders.>" --replicas 3 --retention limits --max-age 7d
```

In Go:
```go
js, _ := nc.JetStream()
_, err := js.AddStream(&nats.StreamConfig{
	Name:     "ORDERS",
	Subjects: []string{"orders.>"},
	Replicas: 3,
	MaxAge:   7 * 24 * time.Hour,
})
if err != nil {
	log.Fatal(err)
}
```

In Python:
```python
import nats
from nats.js.api import StreamConfig

nc = await nats.connect()
js = nc.jetstream()

await js.add_stream(
    StreamConfig(
        name="ORDERS",
        subjects=["orders.>"],
        num_replicas=3,
        max_age=7 * 24 * 3600,  # 7 days in seconds
    )
)
```

For KV buckets, update the replica count of the backing stream:

```shell
nats kv update MY_CONFIG --replicas 3
```

R1 is a valid choice for specific use cases. Don’t blindly upgrade everything:
Document R1 decisions explicitly so future operators understand the trade-off:
```shell
nats stream edit TELEMETRY_RAW --description "R1 intentional: ephemeral telemetry, source is R3 TELEMETRY_PROCESSED"
```

Create a CI check or operational script that flags R1 streams in production clusters:
```shell
#!/bin/bash
# Flag R1 streams in clusters with 3+ nodes
NODES=$(nats server list --json | jq length)
if [ "$NODES" -ge 3 ]; then
  R1_STREAMS=$(nats stream list --json | jq '[.[] | select(.config.num_replicas == 1) | .config.name] | length')
  if [ "$R1_STREAMS" -gt 0 ]; then
    echo "WARNING: $R1_STREAMS R1 stream(s) in a $NODES-node cluster"
    nats stream list --json | jq '.[] | select(.config.num_replicas == 1) | .config.name'
  fi
fi
```

Increasing replicas from R1 to R3 is an online operation. The existing leader continues serving reads and writes while the new replicas catch up. Depending on stream size, catch-up may take seconds to hours. Monitor with `nats stream info` — the new replicas transition from catching up to current when synchronized.
R3 does store three copies of every message across three nodes. For disk-heavy workloads, this can be significant. Mitigate by setting appropriate `max_age`, `max_bytes`, or `max_msgs` limits on streams, and use compression (`--compression s2`) for large streams. The storage cost of R3 is the price of availability — evaluate it against the cost of the outage R1 exposes you to.
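To put rough numbers on that trade-off, here is a back-of-the-envelope estimate. The ingest rate and retention period are illustrative assumptions, and the formula ignores compression and replication metadata:

```python
def stream_disk_bytes(ingest_bytes_per_day: float, retention_days: float, replicas: int) -> float:
    """Total cluster-wide disk footprint of a stream, ignoring compression
    and metadata overhead: daily ingest * retention * replica count."""
    return ingest_bytes_per_day * retention_days * replicas

GIB = 1024 ** 3

# Example: 5 GiB/day ingest with a 7-day max_age (hypothetical workload).
for r in (1, 3):
    total = stream_disk_bytes(5 * GIB, 7, r)
    print(f"R{r}: {total / GIB:.0f} GiB cluster-wide")
# R1: 35 GiB cluster-wide
# R3: 105 GiB cluster-wide
```

The extra 70 GiB in this example is what you are paying to survive a node failure without an outage.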
NATS supports R5 (five replicas) for deployments requiring survival of two simultaneous node failures. R5 is rarely needed — it increases storage and Raft overhead. R3 is the standard recommendation for production high availability. Use R5 only if your failure domain analysis specifically requires it.
Unacknowledged messages are unavailable until the node recovers. For WorkQueue retention streams, this means processing halts. For limits-retention streams, consumers can’t read any messages. Publishers receive errors (no responders or timeout). With R3, the remaining two replicas elect a new leader within seconds and processing continues.
Insights flags R1 streams specifically in multi-node clusters (3+ nodes) where redundancy is available but not being used. In single-node deployments, R1 is the only option and is not flagged. The check helps you identify streams that could benefit from the redundancy your cluster topology already provides.
With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.