
NATS Account Connection Limit: What It Means and How to Fix It

Severity: Warning
Category: Saturation
Applies to: Account
Check ID: ACCOUNTS_001
Detection threshold: account connections >= 90% of configured connection limit

Account Connection Limit fires when an account’s active connection count reaches or exceeds 90% of its configured maximum. Once the limit itself is hit, every new connection attempt for that account is rejected: existing connections remain active, but no new clients can connect until slots are freed.

Why this matters

NATS accounts are the primary isolation boundary in multi-tenant deployments. Each account can have a configured connection limit that caps how many simultaneous client connections it can maintain. This limit exists to prevent one account from monopolizing server resources at the expense of others.

When an account approaches its connection limit, the operational risk is immediate. A service that scales horizontally by adding instances — a Kubernetes deployment scaling up, a new batch of worker processes starting — will fail to connect once the limit is reached. The scaling operation that was supposed to handle increased load instead produces connection errors, leaving the workload understaffed at exactly the moment it needs more capacity.

The problem compounds with connection churn. If clients are connecting and disconnecting rapidly (due to crashes, network instability, or misconfigured reconnect logic), each reconnection attempt consumes a connection slot. With 90% of connections used, even normal churn can cause transient limit violations where clients intermittently fail to connect. These failures appear as flaky connectivity — the hardest kind to debug because they’re not consistently reproducible.

Common causes

  • Horizontal scaling without limit adjustment. The account’s connection limit was set when the workload had 10 service instances. The team scaled to 50 instances without updating the limit. Each instance opens one or more NATS connections, and the aggregate exceeds the allocation.

  • Connection pooling not used. Each microservice instance opens multiple NATS connections — one for publishing, one for subscribing, one for JetStream — when a single connection can handle all three. Multiplying connections per instance by instance count quickly exhausts the limit.

  • Leaked connections. Application code creates NATS connections but doesn’t close them properly on shutdown, so orphaned connections accumulate over time. Container orchestrators that kill pods without graceful shutdown make this worse: the TCP connection lingers until the server’s ping-based stale connection detection closes it, which takes several minutes with default settings. A fix is sketched after this list.

  • Connection churn from unstable clients. Clients connecting and disconnecting rapidly (crash loops, authentication failures, network instability) consume connection slots during the reconnection window. Even though each individual connection is short-lived, the aggregate concurrent count during churn spikes can approach the limit.

  • Limit set too low for the workload. The connection limit was inherited from a template or set conservatively, and nobody revisited it as the workload grew. In JWT-based accounts, the limit is embedded in the account JWT and requires re-issuing the JWT to change.

  • WebSocket or MQTT connections. MQTT clients connecting through NATS’s MQTT adapter each consume a NATS connection. WebSocket clients similarly each use a connection. Workloads with many lightweight IoT or browser clients can exhaust connection limits faster than traditional service-to-service patterns.
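
For the leaked-connections cause above, the core fix is making cleanup unconditional. A minimal Go sketch (url stands in for your server address); using Drain rather than Close lets in-flight messages finish before the slot is released:

nc, err := nats.Connect(url)
if err != nil {
    log.Fatal(err)
}
// Drain unsubscribes, flushes pending messages, and closes the
// connection, freeing the slot immediately instead of waiting for
// the server's stale connection detection.
defer nc.Drain()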

How to diagnose

Check account connection usage

nats server report accounts

This shows per-account connection counts. Compare with the configured limit to identify accounts approaching capacity.

Inspect per-account details

nats account info

This shows the account’s configured limits and current usage, including connections, subscriptions, and JetStream quotas.

Identify the largest connection consumers within the account

nats server report connections --account <account_name> --sort in-msgs

This lists all connections for the account, sortable by activity. Look for:

  • Users with disproportionately many connections
  • Client names that appear many times (multiple instances of the same service)
  • Connections with zero messages (potentially leaked)

Check for connection leaks

nats server report connections --account <account_name> --sort idle

Connections idle for extended periods (>5 minutes with zero messages) may be leaked. Cross-reference with OPT_IDLE_007 (Idle Client Connections) for automated detection.

Check connection churn

# Look at connection count over time via the monitoring endpoint
curl http://localhost:8222/accstatz | jq '.account_statz[] | select(.account == "<account_name>") | {conns, total_conns}'

If total_conns (cumulative) is growing much faster than conns (current), clients are connecting and disconnecting rapidly.
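
To watch churn without eyeballing repeated manual samples, here is a minimal Go sketch that polls the endpoint twice and reports the delta. It assumes the accstatz shape used in the jq example above (field names can vary across server versions), the default monitor port 8222, and a hypothetical account named PRODUCTION:

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "time"
)

// accountStat mirrors the fields used in the jq example above.
type accountStat struct {
    Account    string `json:"account"` // some server versions name this field "acc"
    Conns      int    `json:"conns"`
    TotalConns int    `json:"total_conns"`
}

type accStatz struct {
    AccountStatz []accountStat `json:"account_statz"`
}

func sample(url, account string) (accountStat, error) {
    resp, err := http.Get(url)
    if err != nil {
        return accountStat{}, err
    }
    defer resp.Body.Close()
    var z accStatz
    if err := json.NewDecoder(resp.Body).Decode(&z); err != nil {
        return accountStat{}, err
    }
    for _, s := range z.AccountStatz {
        if s.Account == account {
            return s, nil
        }
    }
    return accountStat{}, fmt.Errorf("account %q not found", account)
}

func main() {
    const url = "http://localhost:8222/accstatz"
    const account = "PRODUCTION" // hypothetical account name

    before, err := sample(url, account)
    if err != nil {
        log.Fatal(err)
    }
    time.Sleep(time.Minute)

    after, err := sample(url, account)
    if err != nil {
        log.Fatal(err)
    }
    // total_conns is cumulative, so its delta over the interval
    // approximates the connection churn rate.
    fmt.Printf("current conns: %d, new connections in the last minute: %d\n",
        after.Conns, after.TotalConns-before.TotalConns)
}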

How to fix it

Immediate: free connection slots

Identify and close leaked connections. Connections with zero messages that have been idle for minutes are likely leaks:

# List idle connections
nats server report connections --account <account_name> --sort idle

If you confirm they’re leaks, the application should be fixed to close connections properly. In the interim, the server’s stale connection timeout will eventually reclaim them.
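
How long that reclamation takes is governed by the server’s ping settings. A sketch of the relevant server configuration, with what I understand to be the default values:

ping_interval: "2m"  # how often the server pings idle clients
ping_max: 2          # unanswered pings tolerated before closing the connection as stale

With these defaults, a silently dead client can hold its connection slot for several minutes.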

Close duplicate connections. If a single service has multiple connections when one would suffice, fix the application to reuse a single connection:

Go:
// One connection per application instance is sufficient
nc, err := nats.Connect(url,
    nats.Name("order-processor-1"),
    nats.MaxReconnects(-1),
)
// Use nc for publishing, subscribing, and JetStream
js, _ := nc.JetStream()

Python:
import nats

# One connection handles pub, sub, and JetStream
nc = await nats.connect(
    servers=["nats://server:4222"],
    name="order-processor-1",
    max_reconnect_attempts=-1,
)
js = nc.jetstream()

Short-term: increase the limit

Update the account connection limit. In server configuration:

accounts {
    PRODUCTION {
        users: [{user: svc, password: secret}]
        limits {
            max_connections: 500  # increased from 200
        }
    }
}

Reload the server configuration:

nats-server --signal reload

For JWT-based accounts, re-issue the account JWT:

# Update the account connection limit
nsc edit account PRODUCTION --conns 500
# Push the updated JWT
nsc push -a PRODUCTION

Long-term: design for connection efficiency

Use one NATS connection per application instance. A single connection multiplexes every publish, subscription, and JetStream operation over one TCP socket, and in the Go client a *nats.Conn is safe for concurrent use across goroutines. Opening multiple connections from the same process adds no throughput; it only consumes extra slots:

// Anti-pattern: multiple connections per instance
pubConn, _ := nats.Connect(url)
subConn, _ := nats.Connect(url)
jsConn, _ := nats.Connect(url)

// Correct: one connection for everything
nc, _ := nats.Connect(url)
js, _ := nc.JetStream()
sub, _ := nc.Subscribe("orders.>", handler)
nc.Publish("events.processed", data)

Implement graceful shutdown. Ensure applications close their NATS connections when stopping. This frees connection slots immediately instead of waiting for the stale connection timeout:

Go:
// Handle shutdown signals
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)

go func() {
    <-sigCh
    nc.Drain() // gracefully drain subscriptions and close
}()

Python:
import asyncio
import signal

async def shutdown():
    await nc.drain()

loop = asyncio.get_event_loop()
loop.add_signal_handler(signal.SIGTERM, lambda: asyncio.create_task(shutdown()))

Contain heavy consumers within the account. NATS enforces connection limits at the account level, not per user, so a single misbehaving service can consume the entire account’s allocation. To limit the blast radius, move high-connection services into accounts of their own, each with its own connection limit, or cap what individual users can do with per-user subscription and data limits:

# In JWT mode (connection limits exist only at the account level)
nsc edit user order-processor --subs 500
nsc edit user inventory-service --subs 500

Monitor and alert before the limit. Set up alerting at 80% to give more lead time.
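
If you need a stopgap before dedicated monitoring is in place, a hypothetical cron-able check is sketched below. It mirrors the accstatz shape used in the diagnosis section and assumes an account named PRODUCTION with a known 500-connection limit (the endpoint reports usage, not the limit itself):

conns=$(curl -s http://localhost:8222/accstatz | \
  jq '.account_statz[] | select(.account == "PRODUCTION") | .conns')
if [ "$conns" -ge 400 ]; then
  echo "PRODUCTION at ${conns}/500 NATS connections"  # wire this into your alerting channel
fi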

Synadia Insights evaluates account connection usage every epoch and fires this check at 90%, providing per-account attribution so you can identify exactly which accounts need attention.

Frequently asked questions

What happens when the account hits its connection limit?

New connection attempts for that account are rejected during the CONNECT handshake; existing connections are not affected. The error reported by the client typically indicates that the maximum number of account connections has been exceeded. Clients with auto-reconnect enabled will retry, but they’ll continue to be rejected until the account’s connection count drops below the limit.
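
On the client side, it helps to surface these rejections instead of letting reconnect loops churn silently. A minimal sketch using the Go client; the URL is a placeholder and the handler bodies are illustrative:

import (
    "log"
    "time"

    "github.com/nats-io/nats.go"
)

nc, err := nats.Connect("nats://server:4222",
    nats.RetryOnFailedConnect(true), // keep retrying if the initial CONNECT is rejected
    nats.MaxReconnects(-1),
    nats.ReconnectWait(2*time.Second),
    nats.DisconnectErrHandler(func(_ *nats.Conn, err error) {
        log.Printf("nats disconnected: %v", err) // logs why an established connection dropped
    }),
    nats.ReconnectHandler(func(c *nats.Conn) {
        log.Printf("nats reconnected to %s", c.ConnectedUrl())
    }),
)
if err != nil {
    log.Printf("connect failed: %v", err)
}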

Do WebSocket and MQTT connections count against the account limit?

Yes. Every client connection, regardless of protocol (NATS, WebSocket, MQTT), counts as one connection against the account’s limit. MQTT clients connecting through NATS’s MQTT adapter each consume one NATS connection. Plan account limits to include all protocol types.

Can I set different connection limits per server?

Account connection limits are global across the cluster, not per-server. If an account has a 100-connection limit, those 100 connections can be distributed across any servers in the cluster. The limit is enforced cluster-wide through account state synchronization. Use OPT_BALANCE_006 (Account Connection Concentration) to monitor whether connections are unevenly distributed across servers.

How does this interact with the server-level max_connections?

The server’s max_connections is a global cap across all accounts. Account-level connection limits subdivide that global cap. A connection is rejected if either limit is reached. For example, if the server allows 10,000 connections and the account allows 500, the account is limited to 500 even if the server has capacity. Conversely, if the server is at 10,000 total connections, new connections fail even for accounts below their individual limits.
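
A sketch of how the two limits sit together in server configuration (names and numbers are illustrative):

max_connections: 10000  # server-wide cap across all accounts

accounts {
    PRODUCTION {
        limits {
            max_connections: 500  # this account's slice of the server's capacity
        }
    }
}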

Should I set connection limits on every account?

Yes, for production deployments. Without connection limits, a single misconfigured or misbehaving application can open thousands of connections, consuming server resources and potentially affecting other accounts. Connection limits are a fundamental resource isolation mechanism. Even generous limits (e.g., 10,000 per account) prevent runaway scenarios.

Proactive monitoring for NATS account connection limit with Synadia Insights

With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.

Start a 14-day Insights trial