A low route count means a NATS server has fewer active cluster route connections than expected for the cluster size. In a fully meshed N-node cluster, each server should have exactly N-1 routes. A missing route means at least one cluster peer is unreachable, breaking the full mesh and potentially isolating clients, fragmenting subscription interest, and disrupting JetStream Raft consensus.
NATS clusters form a full mesh topology — every server maintains a direct route connection to every other server in the cluster. These routes carry subscription interest propagation, message forwarding between servers, and JetStream Raft replication traffic. When a route is missing, the affected servers cannot communicate directly.
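As a minimal sketch of the N-1 rule described above, the following Python compares observed route counts against the full-mesh expectation. The server names and counts are hypothetical sample data, not output from a real cluster, and the helper names are illustrative.

```python
def expected_routes(cluster_size: int) -> int:
    """Routes each server should hold in a full mesh of cluster_size servers."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size - 1


def servers_with_low_routes(route_counts: dict[str, int]) -> dict[str, int]:
    """Return {server: number_of_missing_routes} for servers below N-1."""
    expected = expected_routes(len(route_counts))
    return {
        name: expected - count
        for name, count in route_counts.items()
        if count < expected
    }


# Hypothetical 3-node cluster in which the route between n2 and n3 is down.
observed = {"n1": 2, "n2": 1, "n3": 1}
low = servers_with_low_routes(observed)
print(low)
```

Both ends of a broken route report a reduced count, which is why two servers appear in the result for a single missing link.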
The impact depends on which route is missing and what traffic flows through it. At minimum, clients connected to one server cannot reach subscribers connected to the disconnected peer — messages published on one side don’t reach subscriptions on the other. Subscription interest propagation stops, so new subscriptions created on one server aren’t visible to the disconnected peer. For core NATS, this means silent message loss: publishers succeed (NATS is fire-and-forget), but the messages never reach subscribers on the unreachable server.
For JetStream, the consequences are more severe. Raft groups require a majority of members to communicate for leader election and log replication. In a 3-node cluster, losing one route connection can prevent Raft groups from achieving quorum if the disconnected server holds a replica. Streams may lose their leader, become read-only, or stall entirely. The meta cluster — which manages all JetStream asset metadata — is also a Raft group and is equally affected. Without meta quorum, no new streams or consumers can be created cluster-wide.
A low route count is often the first signal of a network partition, a crashed server, or a misconfigured firewall rule. Detecting it quickly — before operators notice the downstream symptoms of message loss or JetStream stalls — is critical for maintaining cluster health.
Network partition between cluster peers. A switch failure, VLAN misconfiguration, or routing change isolates one or more servers from their peers. Route connections drop, but both sides remain running and serving their local clients — unaware that the cluster is fragmented.
Firewall blocking the route port. The default route port is 6222. A firewall rule change, security group update, or iptables modification that blocks this port prevents route connections from establishing. This commonly happens after infrastructure changes that don’t account for NATS’s separate ports for client (4222), route (6222), and gateway (7222) traffic.
Server crash or process failure. If a cluster peer crashes, its route connections drop. The remaining servers see a reduced route count until the crashed server restarts. This is often accompanied by SERVER_008 (Server Restarted) or SERVER_001 (Server Health) alerts.
DNS resolution failure. If route URLs use hostnames and DNS resolution fails for one peer, the server cannot establish or re-establish the route. This is common in Kubernetes environments where pod DNS depends on CoreDNS availability.
Missing or incorrect route URL in configuration. A server’s configuration is missing the route URL for one or more peers. This can happen when a new server is added to the cluster but not all existing servers are updated to include the new peer’s route URL in their config.
TLS handshake failure. If routes are configured with TLS and certificates have expired, been rotated inconsistently, or have mismatched CA chains, the TLS handshake fails and the route cannot be established. The server logs will show TLS errors, but the visible symptom is a missing route.
    nats server list

Look at the Routes column. In an N-node cluster, every server should show N-1 routes. Any server showing fewer routes has a connectivity problem.
Query the route details from the server with the low count:
    curl -s http://<server-host>:8222/routez | jq '.routes[] | {remote_id, ip, port}'

Compare the list of connected route peers against the expected cluster membership. The missing entry identifies which peer is unreachable.
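The comparison against expected membership can be sketched in Python as a set difference. This assumes the /routez response has already been parsed into a dictionary with a `routes` list carrying `remote_id`, as in the jq filter above; the server IDs shown are hypothetical placeholders.

```python
import json


def missing_route_peers(routez: dict, expected_peer_ids: set[str]) -> set[str]:
    """Return the expected peer IDs that have no entry in the routez output."""
    connected = {route["remote_id"] for route in routez.get("routes", [])}
    return expected_peer_ids - connected


# Hypothetical /routez payload from a server in a 3-node cluster.
sample = json.loads(
    '{"routes": [{"remote_id": "NAAA", "ip": "10.0.1.11", "port": 6222}]}'
)
missing = missing_route_peers(sample, {"NAAA", "NBBB"})
print(sorted(missing))
```

The expected set should contain the IDs of every peer except the server being queried.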
Server logs will show route connection failures with details:
    # Search for route-related errors
    journalctl -u nats-server --since "1 hour ago" | grep -i "route"

Common log patterns:
Error trying to connect to route: an active connection attempt is failing
Route connection closed: an established route dropped
TLS handshake error: certificate issue on the route connection

    # Test route port connectivity from one server to another
    nc -zv <peer-host> 6222

    # Check if the route port is listening on the target server
    ss -tlnp | grep 6222

    # Check firewall rules for the cluster port
    iptables -L -n | grep 6222

    # Verify DNS resolution for route hostnames
    dig <peer-hostname>

If the port check fails, the issue is network-level (firewall, routing, DNS resolution, or the server isn’t listening). Expected route count is N-1 for a full-mesh cluster of N servers.
If routes are connected but unstable, check the latency:
    curl -s http://<server-host>:8222/routez | jq '.routes[] | {remote_id, rtt}'

High RTT on a route connection can cause timeouts and intermittent disconnections.
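To flag slow routes automatically, the RTT values can be parsed and compared with a threshold. The sketch below assumes `rtt` is reported as a Go-style duration string (for example "1.285ms"); the 50 ms threshold and the sample payload are illustrative assumptions, not recommended values.

```python
import re

# Seconds per unit for Go-style duration strings such as "1.285ms" or "1m2s".
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "μs": 1e-6,
    "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0,
}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|μs|ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Convert a Go-style duration string to seconds."""
    total, pos = 0.0, 0
    for part in _PART.finditer(text):
        if part.start() != pos:  # unparsed characters between components
            break
        total += float(part.group(1)) * _UNIT_SECONDS[part.group(2)]
        pos = part.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"unrecognized duration: {text!r}")
    return total


def slow_routes(routez: dict, threshold_s: float = 0.05) -> list[str]:
    """Return remote_ids whose reported RTT exceeds threshold_s seconds."""
    return [
        route["remote_id"]
        for route in routez.get("routes", [])
        if "rtt" in route and parse_duration(route["rtt"]) > threshold_s
    ]


# Hypothetical parsed /routez payload.
sample = {"routes": [
    {"remote_id": "NAAA", "rtt": "850µs"},
    {"remote_id": "NBBB", "rtt": "120ms"},
]}
print(slow_routes(sample))
```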
If a server has crashed, restart it:
    systemctl restart nats-server

If the issue is a firewall rule, open the route port:
    # Example: allow route port in iptables
    iptables -A INPUT -p tcp --dport 6222 -j ACCEPT

    # Or in cloud security groups, ensure port 6222 is open
    # between all cluster member IPs

If DNS is failing, verify resolution and consider using IP addresses as a temporary workaround:
    cluster {
      name: "C1"
      routes = [
        "nats-route://10.0.1.10:6222"
        "nats-route://10.0.1.11:6222"
        "nats-route://10.0.1.12:6222"
      ]
    }

Ensure all servers have matching cluster names and complete route configuration listing every cluster peer. A full mesh requires each server to list at least one other server’s route URL (gossip handles the rest, but listing all is recommended for resilience):
    cluster {
      name: "C1"
      listen: "0.0.0.0:6222"
      routes = [
        "nats-route://s1.example.com:6222"
        "nats-route://s2.example.com:6222"
        "nats-route://s3.example.com:6222"
      ]
    }

It’s safe (and recommended) for a server to include its own address in the routes list; it will simply skip connecting to itself.
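One way to audit for matching cluster names and complete route lists is sketched below. It assumes each server's cluster settings have already been extracted into a dictionary keyed by that server's own route URL; the extraction step and the data layout are hypothetical, and the hostnames reuse the example above.

```python
def find_config_gaps(configs: dict[str, dict]) -> list[str]:
    """Report cluster-name mismatches and peers missing from route lists.

    configs maps each server's own route URL to {"name": ..., "routes": [...]}.
    """
    problems = []
    all_urls = set(configs)
    names = {cfg["name"] for cfg in configs.values()}
    if len(names) > 1:
        problems.append(f"cluster names differ: {sorted(names)}")
    for url, cfg in sorted(configs.items()):
        # A server may list itself, so only the other peers are required.
        missing = all_urls - set(cfg["routes"]) - {url}
        for peer in sorted(missing):
            problems.append(f"{url} does not list {peer}")
    return problems


S1 = "nats-route://s1.example.com:6222"
S2 = "nats-route://s2.example.com:6222"
S3 = "nats-route://s3.example.com:6222"

# Hypothetical audit input: s3 was deployed with an incomplete route list.
configs = {
    S1: {"name": "C1", "routes": [S1, S2, S3]},
    S2: {"name": "C1", "routes": [S1, S2, S3]},
    S3: {"name": "C1", "routes": [S1]},
}
problems = find_config_gaps(configs)
print(problems)
```

Because gossip can fill in missing routes at runtime, a gap reported here is a resilience risk rather than a guaranteed failure.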
If TLS certificates have expired, rotate them and reload:
    # After updating certificates
    nats-server --signal reload=<pid>

TLS configuration changes on routes take effect on reload without a restart; new route connections will use the updated certificates.
Use configuration management for route URLs. In dynamic environments (Kubernetes, auto-scaling groups), use DNS-based route discovery or configuration management to ensure route URLs stay current:
    # Kubernetes: NATS Helm chart handles route discovery automatically
    # For manual deployments, use a shared config template
    cluster {
      name: "C1"
      routes = [
        {% for server in nats_servers %}
        "nats-route://{{ server.hostname }}:6222"
        {% endfor %}
      ]
    }

Monitor route counts continuously. Export the route count from /routez to your monitoring stack.
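A minimal sketch of such an export follows, formatting the observed and expected route counts as Prometheus-style text lines. The metric names are hypothetical, and the code assumes the parsed /routez payload exposes `server_id` and `num_routes`, falling back to the length of `routes` if `num_routes` is absent.

```python
def route_count_metric(routez: dict, expected_routes: int) -> str:
    """Format observed and expected route counts as Prometheus text lines."""
    observed = routez.get("num_routes", len(routez.get("routes", [])))
    server = routez.get("server_id", "unknown")
    return (
        f'nats_route_count{{server_id="{server}"}} {observed}\n'
        f'nats_route_count_expected{{server_id="{server}"}} {expected_routes}'
    )


# Hypothetical payload from one server in a 3-node cluster.
sample = {"server_id": "NAAA", "num_routes": 1, "routes": [{"remote_id": "NBBB"}]}
metric = route_count_metric(sample, expected_routes=2)
print(metric)
```

Publishing both values lets an alert rule fire whenever the observed count falls below the expected count, without hard-coding the cluster size in the rule.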
Set up certificate rotation automation. If routes use TLS, automate certificate renewal with cert-manager (Kubernetes) or certbot, and configure NATS to reload on certificate changes. Expired certificates are a preventable cause of route failures.
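As a rough illustration of an expiry check that could run alongside renewal automation, the sketch below computes the days remaining from the `notAfter=` line printed by `openssl x509 -noout -enddate`. The function name and the fixed timestamps are illustrative; in real use the current time would come from `time.time()`.

```python
import ssl


def days_until_expiry(enddate_line: str, now_ts: float) -> float:
    """Days between now_ts and the notAfter time in an openssl enddate line."""
    not_after = enddate_line.strip().removeprefix("notAfter=")
    expires_ts = ssl.cert_time_to_seconds(not_after)
    return (expires_ts - now_ts) / 86400


# Sample enddate line; the "current time" is fixed ten days earlier
# so that the example is deterministic.
line = "notAfter=Jan  5 09:34:43 2018 GMT"
now = ssl.cert_time_to_seconds("Dec 26 09:34:43 2017 GMT")
remaining = days_until_expiry(line, now)
print(remaining)
```

A negative result means the certificate has already expired; a scheduled job could alert when the value drops below a chosen renewal window.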
Test route connectivity in CI/CD. Before deploying configuration changes that affect networking (firewall rules, security groups, route URLs), validate that all cluster members can reach each other on the route port.
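A pre-deployment connectivity check could look like the following sketch, which attempts a TCP connection to each peer on the route port. It only verifies that the port accepts connections from the machine running the check, not that a NATS route handshake succeeds; the function names and default timeout are illustrative.

```python
import socket


def can_connect(host: str, port: int = 6222, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or DNS failure
        return False


def unreachable_peers(peers: list[str], port: int = 6222,
                      timeout: float = 3.0) -> list[str]:
    """Return the peers that do not accept connections on the route port."""
    return [host for host in peers if not can_connect(host, port, timeout)]
```

In a pipeline, this would be run from each cluster member against its peers, for example `unreachable_peers(["10.0.1.11", "10.0.1.12"])`, failing the job if the returned list is non-empty.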
NATS servers detect a dropped route connection almost immediately through TCP keepalives and the internal ping/pong mechanism. Once detected, the server begins attempting to re-establish the route. Reconnection attempts use exponential backoff, so a transiently unavailable peer will reconnect within seconds. If the peer is down or unreachable, the server continues retrying indefinitely.
For core NATS, the cluster functions in a degraded state — servers that can still reach each other continue forwarding messages, but clients on the disconnected server are isolated. For JetStream, it depends on the cluster size: in a 3-node cluster, losing one route may break Raft quorum for groups that include the disconnected server. In a 5-node cluster, losing one server still leaves a majority for quorum.
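The majority arithmetic behind these cases can be worked through in a few lines of Python. This is a sketch of the general Raft majority rule, not of NATS internals, and `reachable` counts the members that can still talk to each other, including the one doing the counting.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def has_quorum(members: int, reachable: int) -> bool:
    """True if `reachable` mutually connected members form a majority."""
    return reachable >= quorum_size(members)


print(quorum_size(3), quorum_size(5))
print(has_quorum(3, 2))  # two of three members still connected
print(has_quorum(3, 1))  # a single isolated member
print(has_quorum(5, 4))  # one of five members lost
```

An isolated member of a 3-member group is a minority of one and cannot elect a leader or commit entries, while the remaining side keeps quorum only as long as its members can still reach each other.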
Whether every existing server must be updated when you add a new server depends on how routes are configured. If you list all route URLs explicitly, yes: every existing server needs the new server’s route URL added. However, NATS supports route gossip: once a new server connects to any existing server, the existing servers learn about the new peer and establish routes automatically. You only need one existing server’s URL in the new server’s config. That said, listing all URLs is the best practice for resilience, because it ensures the new server can join even if its initial contact server is down.
Routes connect servers within the same cluster (intra-cluster). Gateways connect servers in different clusters (inter-cluster). A missing route (CLUSTER_005) means a cluster peer is unreachable. A gateway disconnection (CLUSTER_007) means an entire remote cluster is unreachable. Both are connectivity issues but at different scopes, and they use different ports (route: 6222, gateway: 7222 by default).
A cluster cannot run with a partial mesh. NATS requires a full mesh between all cluster members: every server must have a route to every other server. There is no partial mesh or hub-and-spoke topology for cluster routes. If you need to connect servers across regions without full mesh, use gateways (for cluster-to-cluster) or leafnodes (for leaf-to-hub).