
NATS Account Resolution Failure: What It Means and How to Fix It

Severity: Warning
Category: Consistency
Applies to: Server
Check ID: SERVER_016
Detection threshold: ACCOUNT-type healthz error reported by server

A NATS server cannot resolve one or more accounts required for JetStream operations. The healthz endpoint reports an ACCOUNT-type error, which means the account’s configuration or JWT could not be loaded and the JetStream assets associated with that account are in a failed state.

Why this matters

NATS multi-tenancy relies on accounts to isolate users, subjects, and JetStream resources. Each JetStream stream and consumer belongs to an account. When the server starts, it must resolve every account that owns JetStream assets so it can apply the correct limits, permissions, and resource reservations. If an account cannot be resolved, all streams and consumers belonging to that account fail to load.

This is not a partial failure — every JetStream asset in the unresolved account becomes unavailable on this server. If the account owns 50 streams, all 50 are down on this node. For replicated assets, the other replicas on servers that can resolve the account continue operating, but the cluster loses a replica for every affected stream. For R1 assets in the unresolved account, this is a complete outage.

Account resolution failures are particularly dangerous because they often affect many assets simultaneously. A single misconfigured JWT or an unreachable account resolver can take out an entire tenant’s JetStream infrastructure in one failure. Unlike stream-level issues where you troubleshoot one asset at a time, account resolution failures require fixing the account layer before any of its assets can recover.

Common causes

  • Account JWT expired or revoked. In deployments using the NATS JWT-based security model, account JWTs have expiration times. An expired JWT cannot be used to resolve the account. Similarly, if the account JWT was explicitly revoked via the operator, resolution fails.

  • Account resolver unreachable. When using a URL-based account resolver (pointing to an external NATS account server or resolver endpoint), network connectivity issues between the NATS server and the resolver prevent JWT fetching. DNS failures, firewall changes, or resolver downtime all cause this.

  • Missing or corrupt account JWT in the resolver directory. For file-based or directory-based resolvers, the JWT file for the account may be missing, empty, or contain invalid data. Manual cleanup of the resolver directory or filesystem corruption can cause this.

  • Operator or account key mismatch. The account JWT was signed by an operator key that the server does not trust. This happens when operator keys are rotated without updating the server’s trusted operator configuration, or when an account JWT from a different operator is accidentally placed in the resolver.

  • Memory resolver with stale configuration. The server’s accounts block in the configuration file references an account that has been removed or renamed. On restart, the server cannot find a matching account definition.

  • NATS account server version incompatibility. An upgraded NATS account server may return JWTs in a format that an older NATS server cannot parse, or vice versa.

How to diagnose

Check the healthz endpoint

nats server request healthz --name <server_name>

The response will identify the account that failed resolution:

status: error
error: "account resolution failed: ABCDEF123456..."

The identifier is the account’s public key (NKey). Map this to a human-readable account name using your operator configuration.
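If this check is scripted, the healthz response can be filtered for ACCOUNT-type entries. The Go sketch below assumes the server returns a JSON body containing an errors array whose entries carry type, account, and error fields; that shape and the names healthzEntry, healthzReport, and accountFailures are illustrative assumptions, so verify the field names against your server version.

```go
package main

import "encoding/json"

// healthzEntry mirrors one element of the assumed "errors" array.
type healthzEntry struct {
	Type    string `json:"type"`
	Account string `json:"account"`
	Error   string `json:"error"`
}

// healthzReport is the assumed top-level healthz body.
type healthzReport struct {
	Status string         `json:"status"`
	Errors []healthzEntry `json:"errors"`
}

// accountFailures returns only the ACCOUNT-type entries from a healthz body.
func accountFailures(body []byte) ([]healthzEntry, error) {
	var report healthzReport
	if err := json.Unmarshal(body, &report); err != nil {
		return nil, err
	}
	var failures []healthzEntry
	for _, e := range report.Errors {
		if e.Type == "ACCOUNT" {
			failures = append(failures, e)
		}
	}
	return failures, nil
}
```

Each returned entry carries the account public key, which can then be mapped to an account name as described above.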

Check which accounts exist and their status

nats server report accounts

This lists all known accounts on the server. The failed account may be missing from this list entirely or may show zero connections and zero JetStream assets despite expecting them.

Verify the account JWT

If using JWT-based auth, check whether the account JWT is valid:

# Decode and verify the JWT
nsc describe account <account_name>
# Check expiration
nsc describe account <account_name> | grep -i "expires"

If the JWT is expired, it needs to be reissued and pushed to the resolver.
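When nsc is not available on the host, the expiration can also be read straight from the JWT text. A minimal Go sketch, assuming the token is a standard three-segment JWT whose claims carry exp as seconds since the Unix epoch; accountJWTExpiry is a hypothetical helper name, and the code does not verify the signature, so it only answers "when does this token expire", not "is this token trusted".

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"strings"
	"time"
)

// accountJWTExpiry decodes the claims segment of a JWT and returns its "exp" time.
// The bool is false when the token carries no expiry. The signature is NOT verified.
func accountJWTExpiry(token string) (time.Time, bool, error) {
	parts := strings.Split(strings.TrimSpace(token), ".")
	if len(parts) != 3 {
		return time.Time{}, false, errors.New("not a three-segment JWT")
	}
	raw, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return time.Time{}, false, err
	}
	var claims struct {
		Exp int64 `json:"exp"`
	}
	if err := json.Unmarshal(raw, &claims); err != nil {
		return time.Time{}, false, err
	}
	if claims.Exp == 0 {
		return time.Time{}, false, nil // no expiry set
	}
	return time.Unix(claims.Exp, 0), true, nil
}
```

Comparing the returned time against the current time shows whether the token has already expired or is close to expiring.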

Check resolver connectivity

For URL-based resolvers, verify the server can reach the resolver endpoint:

# Check if the resolver URL is accessible from the server
curl -s http://<resolver_host>:<resolver_port>/jwt/v1/accounts/<account_public_key>

A 200 response with a valid JWT means the resolver is working. A connection timeout, 404, or 500 indicates the resolver is the problem.
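The same probe can run from a small program, for example inside a periodic health job. The Go sketch below reuses the URL path from the curl command and the status interpretation from the paragraph above; the function names, the 5-second timeout, and the result strings are illustrative assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// accountJWTURL builds the lookup URL used in the curl example.
func accountJWTURL(base, accountKey string) string {
	return strings.TrimRight(base, "/") + "/jwt/v1/accounts/" + accountKey
}

// describeStatus maps an HTTP status code to a coarse diagnosis.
func describeStatus(code int) string {
	switch {
	case code == http.StatusOK:
		return "resolver reachable, JWT returned"
	case code == http.StatusNotFound:
		return "resolver reachable, account JWT missing"
	case code >= 500:
		return "resolver failing"
	default:
		return fmt.Sprintf("unexpected status %d", code)
	}
}

// probeResolver performs the GET with a short timeout so a dead resolver fails fast.
func probeResolver(base, accountKey string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(accountJWTURL(base, accountKey))
	if err != nil {
		return "", fmt.Errorf("resolver unreachable: %w", err)
	}
	defer resp.Body.Close()
	return describeStatus(resp.StatusCode), nil
}
```

Run the probe from the NATS server host itself, since the point is to test the network path the server uses.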

Check server logs

# Server logs will detail the resolution failure
# Look for entries like:
# [ERR] Unable to resolve account "ABCDEF123456": fetch timeout
# [ERR] Account JWT verification failed: expired
# [ERR] Account resolution for JetStream assets failed

The log message distinguishes between network errors (timeout, connection refused), authentication errors (bad signature, expired), and missing account errors.
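For triage across many servers, the three categories can be approximated with simple keyword matching. This Go sketch is only a heuristic: the keywords are taken from the illustrative log lines above rather than from an authoritative list of server messages, and classifyResolutionLog is a hypothetical helper.

```go
package main

import "strings"

// classifyResolutionLog sorts a log line into a coarse failure category
// using keyword matching. Lines that match nothing are reported as "unknown".
func classifyResolutionLog(line string) string {
	l := strings.ToLower(line)
	switch {
	case strings.Contains(l, "timeout"), strings.Contains(l, "connection refused"):
		return "network"
	case strings.Contains(l, "expired"), strings.Contains(l, "signature"),
		strings.Contains(l, "verification failed"):
		return "authentication"
	case strings.Contains(l, "not found"), strings.Contains(l, "unable to resolve"):
		return "missing-account"
	default:
		return "unknown"
	}
}
```

Network keywords are checked first, so a line such as "Unable to resolve account ...: fetch timeout" is treated as a connectivity problem rather than a missing account.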

How to fix it

Immediate: identify the scope

List all JetStream assets in the affected account. Determine how many streams and consumers are impacted:

# The account is selected via the NATS context (or credentials) in use
nats --context <account_context> stream list
nats --context <account_context> consumer list <stream_name>

If these commands fail (because the account is unresolved), check other servers in the cluster where the account may still be resolved.

Short-term: restore account resolution

Reissue an expired JWT. If the account JWT has expired, reissue it with a new expiration and push it to the resolver:

# Reissue the account JWT with a new expiry (90d is an example value)
nsc edit account --name <account_name> --expiry 90d
# Push the updated JWT to the resolver
nsc push --account <account_name>

Restart the resolver or fix connectivity. If the URL resolver is unreachable, restart the NATS account server or fix the network path. Then trigger a reload on the NATS server:

nats-server --signal reload

A config reload causes the server to re-resolve accounts, picking up the now-available resolver.

Restore a missing JWT file. For file-based resolvers, place the correct JWT back in the resolver directory:

# Copy the JWT file from another server, or re-push it from the nsc store
nsc push --account <account_name> --account-jwt-server-url nats://<server>
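
After restoring resolution, verify from a client that the account’s JetStream API responds:

// Go - verify account is accessible
package main

import (
	"github.com/nats-io/nats.go"
	"log"
)

func main() {
	// nats.DefaultURL is a placeholder; use your server URL
	nc, err := nats.Connect(nats.DefaultURL, nats.UserCredentials("user.creds"))
	if err != nil {
		log.Fatalf("Connect failed: %v", err)
	}
	defer nc.Close()
	js, err := nc.JetStream()
	if err != nil {
		log.Fatalf("JetStream context failed: %v", err)
	}
	info, err := js.AccountInfo()
	if err != nil {
		// Account may not be resolved on this server
		log.Fatalf("Account JetStream access failed: %v", err)
	}
	log.Printf("Account OK: %d streams, %d consumers", info.Streams, info.Consumers)
}
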
// TypeScript (nats.js) - verify account access
import { connect } from "nats";

const serverUrl = "nats://localhost:4222"; // placeholder; use your server URL
const nc = await connect({ servers: serverUrl, user: "user", pass: "pass" });

try {
  const jsm = await nc.jetstreamManager();
  const info = await jsm.getAccountInfo();
  console.log(`Account OK: ${info.streams} streams`);
} catch (err) {
  console.error(`Account resolution issue: ${(err as Error).message}`);
} finally {
  // Close the connection whether or not the check succeeded
  await nc.close();
}

Long-term: prevent recurrence

Use long-lived account JWTs with automated renewal. Set account JWT expiration to a reasonable duration (90+ days) and implement automated renewal well before expiration. Monitor JWT expiration dates as part of your operational checks.

Deploy redundant account resolvers. If using a URL resolver, run multiple instances behind a load balancer. A single-point-of-failure resolver means one outage takes out account resolution for the entire deployment:

# nats-server.conf - full resolver with caching
resolver: {
  type: full
  dir: "/data/jwt"
  allow_delete: false
  interval: "2m"
}

The full resolver type stores account JWTs in a local directory and syncs them with the other servers in the cluster, so a server can still resolve accounts from its local copy when no external resolver is reachable.

Implement JWT expiration monitoring. Build or configure monitoring that alerts when account JWTs are within 30 days of expiration:

# Check all account JWT expirations
nsc list accounts
nsc describe account <each_account> | grep "Expires"
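Once the expiration times have been collected, the 30-day rule itself is a simple comparison. A minimal Go sketch, assuming expirations are available as Unix timestamps keyed by account name (how they are collected is left open); expiringWithin and the account names in the usage note are hypothetical.

```go
package main

import "sort"

// expiringWithin returns the names of accounts whose JWT expires within
// windowDays of nowUnix (already-expired accounts included), sorted by name.
// An expiry of 0 is treated as "never expires" and skipped.
func expiringWithin(nowUnix int64, expiries map[string]int64, windowDays int) []string {
	cutoff := nowUnix + int64(windowDays)*24*60*60
	var names []string
	for name, exp := range expiries {
		if exp != 0 && exp <= cutoff {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}
```

For example, with a 30-day window, an account named ORDERS that expires in 10 days is reported, while one named BILLING that expires in 90 days is not.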

Audit resolver directories after maintenance. After any filesystem maintenance, migration, or cleanup on servers running file-based resolvers, verify that all expected account JWTs are present and valid.

Pin operator keys and document rotation procedures. Key rotation is necessary for security but must be coordinated across all servers and account JWTs. Document the process and test it in staging before rotating production operator keys.

Frequently asked questions

Does this affect all users in the account or just JetStream?

JetStream assets (streams and consumers) belonging to the unresolved account are unavailable. Core NATS connections in the account may still work if the account was previously resolved and cached, but this behavior is not guaranteed. New connections authenticating to the unresolved account will fail.

Can I resolve the account without restarting the server?

Yes. A configuration reload (nats-server --signal reload or nats server config reload <server-id>) causes the server to re-resolve accounts. If the underlying issue (expired JWT, unreachable resolver) is fixed, the reload will resolve the account and recover its JetStream assets without a full server restart.

How do I find the account name from the NKey in the error?

The healthz error reports the account’s public NKey (starting with A). To map it to a name, use nsc list accounts, which shows both the name and public key. Alternatively, check your operator’s account configuration, or decode the JWT file itself with nsc describe jwt --file <path_to_jwt> and read its name and subject.

What happens to messages published while the account is unresolved?

For replicated streams where other replicas are on servers that resolved the account successfully, publishes continue normally through those servers. For R1 streams or streams where all replicas are on servers with the resolution failure, publishes fail with a timeout or account error. Messages are not silently dropped — the publisher receives an error.

Is this different from SERVER_014 JetStream Subsystem Unhealthy?

Yes. SERVER_014 flags when the entire JetStream subsystem on a server is unhealthy (no meta leader, not current, or recovering). SERVER_016 is more specific — one or more individual accounts failed to resolve, but the JetStream subsystem itself may otherwise be functional. A server can have a healthy JetStream subsystem overall but still fail to resolve a specific account.

