A common community question is how to expose NATS gateways from Kubernetes or OpenShift when building a supercluster across regions, especially when JetStream readiness reports errors such as:
    Healthcheck failed: "JetStream has not established contact with a meta leader"

The short version: keep local cluster formation independent from your external load balancers, and use load balancers only where Kubernetes networking requires them for cross-region gateway reachability.
A NATS supercluster is made from multiple NATS clusters connected with gateways. Within a single Kubernetes cluster or region, the NATS servers should still form their local NATS cluster using the normal cluster routes.
Do not put an external load balancer in the critical path for the route connections between pods in the same local NATS cluster. In Kubernetes, local clustering is typically handled with internal service discovery, such as the service patterns used by the NATS Helm chart.
Use the gateway port for traffic between NATS clusters in different regions or networks.
A practical target architecture is:
For Kubernetes and OpenShift deployments, an external LoadBalancer service is often a convenient way to make the gateway port reachable from another region.
In most cases, start with one load balancer in front of the NATS servers in a region for gateway traffic, rather than one load balancer per pod. A per-pod load balancer design can work in some environments, but it is usually more expensive and operationally more complex, and it can make bootstrap and health-check behavior harder to reason about.
NATS itself does not require load balancers for gateways. The load balancer is a Kubernetes or cloud networking mechanism that provides an externally reachable address.
If a NATS server is behind a load balancer, remote clusters need an address they can actually reach. That usually means configuring the gateway advertise address to the external DNS name and port exposed by the load balancer.
A simplified gateway configuration looks like this:
    gateway {
      name: us-east
      port: 7222
      advertise: nats-gw-us-east.example.com:7222

      gateways: [
        {
          name: eu-west
          urls: [nats://nats-gw-eu-west.example.com:7222]
        }
      ]
    }

Adapt this to your actual deployment, including TLS, authentication, account configuration, and the names of your regions. The important point is that the advertise value and the urls used for remote gateways must be reachable from the other NATS clusters.
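One way to check the reachability requirement before changing any NATS settings is a plain TCP connection test, run from a pod or host in the remote region. The following is a minimal Python sketch, not part of NATS tooling: the helper names parse_endpoint and can_connect are hypothetical, and the default port 7222 simply follows the example configuration above. A successful TCP connect only shows that something accepts connections on that address and port; it does not prove that a NATS gateway handshake, TLS, or authentication will succeed.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def parse_endpoint(endpoint: str, default_port: int = 7222) -> Tuple[str, int]:
    """Split 'nats://host:port' or 'host:port' into (host, port)."""
    # urlsplit only fills in hostname/port when a scheme and '//' are present.
    if "://" not in endpoint:
        endpoint = "nats://" + endpoint
    parts = urlsplit(endpoint)
    if not parts.hostname:
        raise ValueError(f"no host in endpoint: {endpoint!r}")
    return parts.hostname, parts.port or default_port


def can_connect(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint opens within timeout."""
    host, port = parse_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, calling can_connect("nats-gw-eu-west.example.com:7222") from the us-east region should return True once the eu-west gateway endpoint is reachable from there.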
Whether a cold restart of one region can stall on cross-region connectivity depends on how JetStream is partitioned across the supercluster.
By default, every JetStream-enabled server in a supercluster shares a single JetStream domain and forms one JetStream metadata group across all clusters. In that model, metadata leadership and recovery depend on gateway connectivity between regions. If a region restarts and cannot reach the other regions over gateways, its servers can keep reporting that JetStream has not established contact with a meta leader.
Alternatively, you can give each region its own JetStream domain. Each region then runs an independent metadata group that elects a leader using only its local servers, so a regional restart does not depend on cross-region gateway health. The tradeoff is that JetStream assets no longer span regions transparently: a stream lives in a single domain, and moving or replicating data between domains requires explicit cross-domain configuration, such as sourcing or mirroring through a domain-qualified JetStream API.
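Neither option is automatically correct: a single domain lets JetStream assets span regions but ties each region's startup to gateway connectivity, while per-region domains isolate startup at the cost of explicit cross-domain configuration.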
This is an architectural decision rather than a load balancer setting. The rest of this post applies either way, but the single-domain model is where readiness and health-check mistakes most easily turn into the deadlock described next.
The error:
    JetStream has not established contact with a meta leader

means the server has not yet established contact with the JetStream metadata leader. During startup, that may be temporary. But in Kubernetes, a readiness or load balancer health check can accidentally turn that temporary state into a deadlock.
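A common failure mode is circular: the health check fails until the server has contact with a meta leader, the readiness gate or load balancer withholds traffic from servers that fail the check, and in a single-domain supercluster the meta leader may not be reachable until gateway traffic can flow.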
If disabling readiness makes the cluster start, that is a strong signal that readiness gating or load balancer health checks are involved. It does not mean readiness should simply be removed in production. It means the health checks need to match the bootstrap behavior you want.
The NATS monitoring port is commonly exposed on 8222, and health checks often use the /healthz endpoint.
For example, deployments may use a path like:
    /healthz?js-server-only=true

The js-server-only=true form is intended to report on the JetStream subsystem of the individual server rather than on metadata leadership. With that parameter, a server can report healthy even when it has not yet established contact with a JetStream metadata leader; this behavior was confirmed for NATS Server 2.11.4. A plain /healthz with no parameters, by contrast, does check JetStream metadata health, so it reports unhealthy while there is no meta leader. If your probe is reporting the meta leader error, that is a hint the probe may not be using the parameters you expect.
Because the exact semantics of /healthz parameters can change between NATS Server versions, verify the behavior for the version you run. Also verify what is actually probed: Kubernetes service annotations, cloud provider load balancer behavior, Helm chart settings, and OpenShift configuration can all change the effective probe.
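To verify which behavior your server version actually has, one option is to query both forms of the endpoint directly on a server and compare the HTTP status codes. The sketch below is a hedged illustration in Python using only the standard library. The helper names healthz_status and compare_probes are hypothetical, the port 8222 follows the common monitoring port mentioned above, and the code looks only at the HTTP status code because the response body may differ between versions.

```python
import http.client
from typing import Dict


def healthz_status(host: str, port: int = 8222, query: str = "",
                   timeout: float = 3.0) -> int:
    """Return the HTTP status code of GET /healthz[?query] on host:port."""
    path = "/healthz" + (f"?{query}" if query else "")
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def compare_probes(host: str, port: int = 8222) -> Dict[str, int]:
    """Query the plain and js-server-only forms and return both status codes."""
    return {
        "/healthz": healthz_status(host, port),
        "/healthz?js-server-only=true": healthz_status(
            host, port, "js-server-only=true"
        ),
    }
```

If the plain form returns a non-200 status while the js-server-only=true form returns 200, the server itself is up and the remaining failure is JetStream metadata contact. Connection errors are raised rather than caught, so an unreachable monitoring port is easy to tell apart from an unhealthy response.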
When troubleshooting, confirm all of the following:
Do not assume that adding annotations or service options changed the cloud load balancer behavior. Check the effective load balancer configuration in the cloud or OpenShift control plane.
Before debugging the supercluster, confirm that each regional cluster starts cleanly without gateways enabled.
A useful sequence is:
A rolling update can hide bootstrap dependencies because at least part of the system remains available while each pod restarts. A full regional restart is a better test of whether readiness checks, load balancer health checks, and gateway advertise settings are safe during cold start.
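A cold-start test is easier to compare across runs if it has an explicit deadline. The following Python sketch polls a caller-supplied check until it passes or a timeout expires, and reports how long the region took to become ready. The function name wait_until_ready, the 300-second timeout, and the 5-second interval are illustrative assumptions, not values recommended by NATS. The check callable stands in for whatever probe you trust, such as an HTTP request against each server's monitoring endpoint.

```python
import time
from typing import Callable, Optional


def wait_until_ready(check: Callable[[], bool],
                     timeout: float = 300.0,
                     interval: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic
                     ) -> Optional[float]:
    """Poll check() until it returns True.

    Returns the elapsed seconds on success, or None if the timeout
    expires before check() succeeds.
    """
    start = clock()
    while True:
        if check():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

A region that returns None after a full restart but passes during rolling updates is a candidate for the kind of bootstrap dependency described above. The sleep and clock parameters exist only so the loop can be exercised without real waiting.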
When a Kubernetes-hosted NATS supercluster hangs with JetStream metadata readiness errors, check these items first:
gateway.advertise points to an address reachable by other regions.

For NATS superclusters on Kubernetes, keep the local cluster simple and healthy first. Use Kubernetes load balancers to expose gateway traffic between regions, usually with a single regional gateway endpoint. Make sure gateway.advertise uses a reachable external address, and verify that readiness and load balancer health checks do not create a circular dependency on JetStream metadata leadership during startup. Decide deliberately whether JetStream should span all regions as one domain or run as an independent domain per region, because that choice determines whether cross-region gateway health affects local JetStream startup.