High leaf RTT means the round-trip time between a leafnode server and its hub cluster exceeds the configured threshold (default: 100ms). Leafnodes bridge messages between edge locations and the central cluster. When the link is slow, every message crossing the leafnode boundary inherits that latency — request-reply calls time out, JetStream publish acknowledgments lag, and clients connected to the leaf experience degraded performance for any operation that touches the hub.
Leafnodes are designed for extending NATS to remote locations — branch offices, edge computing sites, IoT gateways, regional deployments. Some latency is expected and acceptable. But when RTT exceeds the threshold, the latency moves from “acceptable overhead” to “operational impact.”
The most immediate impact is on request-reply patterns. A service request from a leaf-connected client to a hub-connected responder requires at least two leafnode traversals: request out, reply back. At 100ms RTT, that’s a minimum 200ms added to every request — before the responder even processes it. If the client’s timeout is set to 1 second (a common default), a single RTT spike to 300ms consumes 600ms of the timeout budget just on network transit. Under load, these requests start timing out, and the application sees intermittent failures that correlate with nothing visible at the application layer.
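To make the budget concrete, here is a minimal Go sketch of a request from a leaf-connected client; the URL and subject are illustrative placeholders, not names from this check:

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect through the leafnode (placeholder URL).
	nc, err := nats.Connect("nats://leaf-server:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// With ~100ms leaf RTT, roughly 200ms of this 1s budget is
	// consumed by network transit alone (request out, reply back),
	// leaving the hub-side responder well under 800ms to do its work.
	msg, err := nc.Request("svc.lookup", []byte("query"), 1*time.Second)
	if err != nil {
		log.Printf("request failed (check leaf RTT): %v", err)
		return
	}
	log.Printf("reply: %s", msg.Data)
}
```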
JetStream operations through leafnodes are even more sensitive. A publish with acknowledgment (js.Publish) requires the message to travel from the leaf to the hub, be committed by the stream’s Raft group, and the ack to travel back to the leaf. At high RTT, publish throughput drops proportionally because each publish-ack cycle includes the leafnode round trip. Batch publishing helps (see the sketch below), but the fundamental constraint is the speed-of-light delay plus any network overhead on the path.
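One way to amortize the round trip is to keep many publishes in flight and wait for the acknowledgments once, rather than paying one RTT per message. A minimal Go sketch using the client’s async publish API; the stream, subject, and payload are assumptions for illustration:

```go
// Assumes nc is a connected *nats.Conn and a stream captures "events.order".
js, err := nc.JetStream(nats.PublishAsyncMaxPending(256))
if err != nil {
	log.Fatal(err)
}

// Queue publishes without waiting for each ack individually;
// up to 256 acks may be outstanding at once.
for i := 0; i < 1000; i++ {
	if _, err := js.PublishAsync("events.order", []byte("payload")); err != nil {
		log.Fatal(err)
	}
}

// Wait once for all outstanding acks instead of once per message.
select {
case <-js.PublishAsyncComplete():
	// All acks received.
case <-time.After(10 * time.Second):
	log.Fatal("timed out waiting for publish acks")
}
```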
Geographic distance. The leafnode is physically far from the hub cluster. A leafnode in Singapore connecting to a hub in US-East inherently has 200ms+ RTT. This is physics, not a bug — but the check flags it so you can make architectural decisions accordingly.
Network congestion on the leafnode link. The network path between the leaf and hub is saturated. This is common when the leafnode shares an internet connection with other traffic, or when the link is a low-bandwidth WAN connection. RTT increases as packets queue behind other traffic.
VPN or tunnel overhead. Many leafnode deployments use VPN tunnels (WireGuard, IPsec, OpenVPN) for security. Each layer of encapsulation adds processing time, and VPN concentrators can become bottlenecks under load. Double-encapsulation (e.g., NATS TLS inside a VPN tunnel) compounds the overhead.
ISP routing inefficiency. The network path between leaf and hub may not be the shortest route. ISP peering arrangements can route traffic through distant exchange points, adding tens of milliseconds to what should be a short path. This is especially common on consumer-grade internet connections.
Overloaded leafnode server. If the leafnode server itself is under CPU or memory pressure, it responds slowly to NATS PING/PONG measurements, inflating the reported RTT. The network may be fine, but the server can’t process the ping fast enough.
Query the hub server’s leafnode connections:
```
curl -s "http://localhost:8222/leafz" | jq '.leafs[] | {name: .name, rtt: .rtt, account: .account}'
```

This shows the measured RTT for each leafnode connection. Compare against the threshold (default 100ms).
On the leafnode itself:
```
nats rtt
```

This measures the round-trip time from the leafnode’s perspective. Compare this with the hub-side measurement. If they differ significantly, there may be asymmetric routing or measurement issues.
Use standard network tools to isolate whether the latency is in the network or the NATS server:
```
# Basic ping
ping -c 20 <hub_server_ip>

# Traceroute to identify where latency accumulates
traceroute <hub_server_ip>

# MTR for continuous path analysis
mtr --report <hub_server_ip>
```

If ICMP ping shows similar latency to NATS RTT, the issue is network-level. If NATS RTT is significantly higher than ICMP ping, the NATS server is adding processing delay (check CPU/load on both sides).
A stable RTT (even if high) is less problematic than a highly variable one. Sustained high RTT is predictable; spiky RTT causes intermittent timeouts that are harder to diagnose:
```
# Monitor RTT over time
watch -n 5 'curl -s "http://localhost:8222/leafz" | jq ".leafs[] | {name: .name, rtt: .rtt}"'
```

If the leafnode is under resource pressure, RTT measurements will be inflated:

```
nats server info
```

Check CPU usage and connection count. A leafnode handling many client connections while running on limited hardware may not have capacity to respond to pings promptly.
Adjust client timeouts to account for leafnode RTT. Clients connected to a leafnode should have longer timeouts than clients connected directly to the hub:
```go
// Go — adjust timeouts for leaf-connected clients
nc, err := nats.Connect(leafURL,
    nats.Timeout(5*time.Second),       // Connection timeout
    nats.PingInterval(30*time.Second), // Less aggressive ping
    nats.MaxPingsOutstanding(3),       // More tolerance for missed pings
)
if err != nil {
    log.Fatal(err)
}

// For JetStream, increase publish ack timeout
js, _ := nc.JetStream(nats.PublishAsyncMaxPending(256))
_, err = js.Publish("events.order", data, nats.AckWait(5*time.Second))
```

```python
# Python — leaf-aware timeouts
import nats

nc = await nats.connect(
    "nats://leaf-server:4222",
    connect_timeout=5,
    ping_interval=30,
    max_outstanding_pings=3,
)

js = nc.jetstream()
ack = await js.publish("events.order", data, timeout=5.0)
```

Enable leafnode compression. If not already enabled, compression reduces the data volume on the leafnode link, which can reduce RTT by decreasing queuing delay on congested links:
```
# Leafnode configuration
leafnodes {
  remotes [{
    url: "nats://hub:7422"
    compression: s2_auto
  }]
}
```

Check VPN configuration. If using a VPN tunnel, ensure the MTU is set correctly to avoid fragmentation. Fragmented packets require reassembly, adding latency:

```
# Find the optimal MTU
ping -M do -s 1400 <hub_server_ip>
# Reduce size until pings succeed without fragmentation
```

Use a dedicated network link for the leafnode if it shares bandwidth with other traffic. Quality-of-service (QoS) rules can prioritize NATS traffic on the link, as in the sketch below.
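For example, on a Linux router you might place leafnode traffic in a high-priority band. A minimal sketch, assuming the default leafnode port 7422 and an interface named eth0 (both placeholders to adjust for your environment):

```
# Sketch: prioritize outbound leafnode traffic with Linux tc.
# Assumes interface eth0 and the default leafnode port 7422.
tc qdisc add dev eth0 root handle 1: prio
tc filter add dev eth0 parent 1: protocol ip u32 \
  match ip dport 7422 0xffff flowid 1:1
```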
Use gateways instead of leafnodes for cross-region connectivity. If two locations both have full NATS clusters, gateways provide cluster-to-cluster connectivity with better throughput characteristics than leafnodes. Leafnodes are ideal for edge locations with a single server; gateways are better for region-to-region.
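For reference, a gateway connection is configured on each cluster rather than on a single edge server. A minimal sketch, with cluster names, host, and port as assumptions:

```
# On each server of the "us-east" cluster (placeholder names and URLs)
gateway {
  name: "us-east"
  port: 7222
  gateways: [
    {name: "eu-west", url: "nats://eu-west-gw.example.com:7222"}
  ]
}
```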
Deploy local JetStream streams at the leaf. Instead of routing all JetStream operations through the hub, create streams on the leafnode for data that’s produced and consumed locally. Use NATS subject mapping or mirrors/sources to replicate only the subset of data that needs to reach the hub:
```
# Create a local stream on the leafnode for edge data
nats stream add local-events \
  --subjects "events.local.>" \
  --storage file \
  --retention limits \
  --max-age 24h
```

Implement store-and-forward patterns. For data flows that must cross the leafnode boundary, publish to a local stream and use a source or mirror to replicate to the hub asynchronously. This decouples the client’s publish latency from the leafnode RTT:

```
# On the hub, create a stream that sources from the leaf's stream.
# No --subjects: messages arrive only via the source, which avoids
# capturing the same messages twice across the leafnode link.
nats stream add hub-events \
  --source local-events
```

What counts as a good RTT depends on geography and network topology. Same-datacenter leafnodes should be under 5ms. Same-region (e.g., US-East to US-East) should be under 20ms. Cross-continent (US to Europe) is typically 80-120ms. The default threshold of 100ms is set to flag connections where latency is high enough to impact request-reply patterns and JetStream operations. Adjust the threshold based on your expected deployment topology.
High RTT can contribute to slow consumers, but only indirectly. It means the leafnode’s TCP connection to the hub drains more slowly. If the hub is sending high-volume traffic to subjects that cross the leafnode boundary, the hub-side buffer for the leafnode connection fills faster. In extreme cases, the leafnode connection itself can be flagged as a slow consumer and disconnected. This is rare, but possible at very high message rates combined with high RTT.
Leafnodes are designed for extending a single logical NATS system to remote locations, typically where you have one server at the edge. They’re lightweight and simple to configure. Gateways connect separate NATS clusters with independent identity, providing full cluster-to-cluster routing. If the remote location has a full cluster (3+ servers), gateways are typically better. If it’s a single server at an edge location, leafnodes are the right choice. High RTT affects both, but gateways handle it more gracefully because they have their own local cluster for client operations.
The threshold is configurable in Synadia Insights. If your architecture expects cross-region leafnodes with inherent latency, increase the threshold so the check doesn’t fire for expected conditions. If all your leafnodes are same-region and you want tighter monitoring, decrease the threshold. The goal is to alert on unexpected latency, not on expected geographic distance.