Excessive user connections means a single NATS user identity has more than 100 active connections. In most architectures, one connection per process is sufficient — a single user with hundreds of connections typically indicates a connection leak, misconfigured reconnection logic, or a deployment pattern that creates unnecessary connections.
Every NATS connection consumes server resources: memory for the connection state, CPU for subscription matching, and file descriptors at the OS level. A single user holding 100+ connections is using resources that could serve 100+ distinct clients. At scale, a few users with connection leaks can exhaust the server’s max_connections limit, blocking legitimate new connections from other users and services.
The problem is often invisible until it causes an outage. Connection leaks tend to be gradual — a process that opens a new connection on every request without closing the old one accumulates connections slowly. At 10 requests per minute, it takes just under two hours to reach 1,000 connections. By the time someone notices the server rejecting new connections, the leaking process has been running for hours and the connection table is full.
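The timeline above is simple arithmetic, and it can be worth sketching for your own request rates. A minimal back-of-the-envelope helper (the rates are illustrative, assuming one leaked connection per request):

```python
# Time for a process leaking one connection per request to accumulate
# connections at a steady request rate (illustrative, not a NATS default).

def minutes_to_reach(target_connections: int, leaks_per_minute: int) -> float:
    """Minutes until a leaking process reaches the target connection count."""
    return target_connections / leaks_per_minute

# 10 requests/minute, one leaked connection each:
print(minutes_to_reach(1_000, 10))  # 100.0 minutes — just under two hours
```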
Excessive connections from a single user also complicate debugging and monitoring. If one user credential is shared across hundreds of connections, per-user metrics (throughput, subscription interest, pending bytes) aggregate all those connections into a single identity. You can’t distinguish the healthy connections from the problematic ones without additional metadata like client name or IP address. This is why the NATS best practice is one connection per process with a descriptive client name.
Connection leak in application code. The application creates a new NATS connection for each request, batch job, or goroutine without closing the previous one. The old connections remain open on the server until they time out or the process exits. This is the most common cause.
Reconnect loop creating duplicate connections. A client with aggressive reconnect logic creates a new connection before the server has cleaned up the old one. Each reconnect cycle adds a connection, and the old connection lingers until the server’s ping/pong timeout evicts it.
Shared credentials across many processes. A single user credential (token, NKey, or JWT) is used by every instance of a service. When the service scales to many replicas — Kubernetes pods, Lambda functions, container instances — each replica opens its own connection under the same user identity.
Multiple connections per process. Some applications create separate NATS connections for different subsystems: one for publishing, one for subscribing, one for request-reply. While occasionally justified, this pattern multiplies the connection count per process by the number of subsystems.
Microservice scaling without per-instance credentials. Horizontal scaling with a single shared credential means the user connection count grows linearly with replica count. At 50 replicas with 2 connections each, a single user has 100 connections.
Query the server’s connection endpoint with authentication details:
```shell
curl -s 'http://localhost:8222/connz?auth=true&limit=500' | \
  jq '[.connections[] | .authorized_user] | group_by(.) | map({user: .[0], count: length}) | sort_by(-.count) | .[:10]'
```

This groups connections by user and shows the top 10 by connection count. Note the quotes around the URL — an unquoted `&` would background the curl command in most shells.
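If you would rather post-process connz output in code than in jq, the same grouping can be sketched in Python. The field names follow the jq example above; the sample below uses inline data rather than fetching from a live monitoring port:

```python
import json
from collections import Counter

# Sample /connz?auth=true output (trimmed); in practice, fetch the body of
# http://localhost:8222/connz?auth=true&limit=500 on a schedule.
connz = json.loads("""
{"connections": [
  {"cid": 1, "authorized_user": "order-service"},
  {"cid": 2, "authorized_user": "order-service"},
  {"cid": 3, "authorized_user": "billing"}
]}
""")

# Group connections by user and list the top users by connection count,
# mirroring the jq pipeline.
counts = Counter(c["authorized_user"] for c in connz["connections"])
top = [{"user": u, "count": n} for u, n in counts.most_common(10)]
print(top)  # [{'user': 'order-service', 'count': 2}, {'user': 'billing', 'count': 1}]
```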
Once you’ve identified the high-connection user, drill into their connections:
```shell
curl -s 'http://localhost:8222/connz?auth=true&user=<username>&limit=100' | \
  jq '.connections[] | {cid, name, ip, start, idle, pending_bytes, in_msgs, out_msgs}'
```

Look for patterns: many connections sharing the same client name or IP, start times that climb steadily (a leak adding connections over time), long idle periods, or zero message counts.
Connections that have been idle with zero lifetime messages are strong indicators of leaks:
```shell
curl -s 'http://localhost:8222/connz?auth=true&limit=500' | \
  jq '.connections[] | select(.in_msgs == 0 and .out_msgs == 0 and .idle != "") | {cid, name, ip, idle, start}'
```

Also check how close the account is to its limits:

```shell
nats server report accounts
```

If the user is approaching the account's connection limit, the impact extends beyond this single user to all users in the account.
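The leak filter above can also be expressed in Python for use in a monitor script. The selection criteria and field names mirror the jq filter; the sample data is inline and illustrative:

```python
import json

# Sample /connz output (trimmed); field names match the jq filter.
connz = json.loads("""
{"connections": [
  {"cid": 1, "name": "order-service", "ip": "10.0.0.5", "idle": "2h3m",
   "in_msgs": 0, "out_msgs": 0},
  {"cid": 2, "name": "order-service", "ip": "10.0.0.6", "idle": "1s",
   "in_msgs": 42, "out_msgs": 7}
]}
""")

# Connections with zero lifetime messages and a non-empty idle time
# are the strongest leak suspects.
suspects = [
    {"cid": c["cid"], "name": c["name"], "ip": c["ip"], "idle": c["idle"]}
    for c in connz["connections"]
    if c["in_msgs"] == 0 and c["out_msgs"] == 0 and c["idle"] != ""
]
print([s["cid"] for s in suspects])  # [1]
```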
Kick idle, zero-activity connections. If you’ve identified connections that are clearly leaked (zero messages, idle for extended periods), close them via the system account:
```shell
# Close a specific connection by CID (requires system account access)
nats server request kick <connection_id>
```

The /connz HTTP endpoint is read-only — connections cannot be terminated via HTTP. Use `nats server request kick` (or publish to `$SYS.REQ.SERVER.<id>.KICK`) instead.
Set account-level connection limits. NATS does not support per-user connection caps, but per-account limits constrain the total connections any user (and the account as a whole) can hold. This bounds the blast radius of a leaky service without changing per-user JWTs:
```shell
# JWT/nsc — limit total connections across all users in the account
nsc edit account Production --conns 100
```

For server-config-based auth, account-level limits are configured in the accounts {} block (see ACCOUNTS_001).
Ensure one connection per process, reused for the process lifetime. The NATS client connection should be created once at startup and shared across all goroutines, threads, or handlers:
```go
// Go — singleton connection pattern
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

var nc *nats.Conn

func main() {
	var err error
	nc, err = nats.Connect("nats://localhost:4222",
		nats.Name("order-service"),
		nats.MaxReconnects(-1),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Use nc throughout the application
	startHTTPServer(nc)
}
```

```python
# Python (nats.py) — single connection, reused
import nats

class App:
    def __init__(self):
        self.nc = None

    async def start(self):
        self.nc = await nats.connect(
            servers=["nats://localhost:4222"],
            name="order-service",
            max_reconnect_attempts=-1,
        )
        # Use self.nc throughout the application

    async def stop(self):
        if self.nc:
            await self.nc.close()
```

Fix reconnect logic to avoid duplicates. Ensure the client library's built-in reconnect handles the connection lifecycle correctly. Don't wrap the connect call in a retry loop that creates new connection objects — use the library's reconnect options instead:
```go
// Wrong — creates duplicate connections
for {
	nc, err := nats.Connect(url)
	if err != nil {
		time.Sleep(time.Second)
		continue
	}
	break
}

// Right — built-in reconnect handles it
nc, err := nats.Connect(url,
	nats.MaxReconnects(-1),
	nats.ReconnectWait(2*time.Second),
)
```

Issue per-instance user credentials. Instead of sharing one credential across all replicas, generate unique user credentials per deployment instance. This gives you per-instance visibility in connz output, lets you set per-user subscription/payload limits, and makes it easy to revoke a single instance's credentials in isolation:
```shell
# Create per-instance users
nsc add user --name order-service-pod-1 --account Production
nsc add user --name order-service-pod-2 --account Production
```

Per-user JWT limits do not include a connection cap (NATS only exposes connection limits at the account level), so use per-account --conns to bound aggregate connection usage.
Monitor per-user connection counts. Set up alerting on connection counts per user to catch leaks early.
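A minimal alerting check can be sketched on top of the /connz endpoint. The threshold of 100 matches this check's default; the function takes a connz response body, so in a real monitor you would fetch it from the monitoring port on a schedule and page when the result is non-empty:

```python
import json
from collections import Counter

THRESHOLD = 100  # default threshold for this check; tune per architecture

def users_over_threshold(connz_json: str, threshold: int = THRESHOLD) -> dict:
    """Return {user: count} for users whose connection count exceeds threshold.

    connz_json is the body of GET /connz?auth=true from the monitoring port.
    """
    connections = json.loads(connz_json)["connections"]
    counts = Counter(c["authorized_user"] for c in connections)
    return {user: n for user, n in counts.items() if n > threshold}

# Simulated connz body: 120 connections for one user, 3 for another.
sample = json.dumps({"connections":
    [{"authorized_user": "order-service"}] * 120 +
    [{"authorized_user": "billing"}] * 3})
print(users_over_threshold(sample))  # {'order-service': 120}
```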
Synadia Insights tracks per-user connection counts automatically, flagging users that exceed the threshold across all servers in your deployment — catching connection leaks and misconfigurations before they exhaust server resources.
One connection per process is the NATS best practice. If a service runs 10 replicas with a shared credential, 10 connections is expected. The default threshold of 100 is intentionally conservative — it catches runaway leaks without flagging normal scaled deployments. Adjust the threshold based on your architecture: if a service legitimately runs 200 replicas, set the threshold accordingly for that user.
Each connection consumes approximately 10–20 KB of server memory for the connection state, plus memory for any subscription interest. A thousand idle connections use roughly 10–20 MB — not catastrophic on its own, but they consume file descriptors and count against max_connections. The real risk is reaching the connection limit and blocking new legitimate connections, not the memory footprint.
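The memory figures above are straightforward to reproduce, using the 10–20 KB per-connection range quoted (subscription state excluded):

```python
# Back-of-the-envelope memory cost of idle connections.

def idle_connection_memory_mb(connections: int, kb_per_conn: float) -> float:
    """Approximate memory in MB for N idle connections at a given KB each."""
    return connections * kb_per_conn / 1024

print(idle_connection_memory_mb(1_000, 10))  # ~9.8 MB at the low end
print(idle_connection_memory_mb(1_000, 20))  # ~19.5 MB at the high end
```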
No. NATS does not expose a per-user connection cap — neither the server-config user block nor the JWT user limits include a connection field (only account JWTs do, via nsc edit account --conns). To bound a single credential’s blast radius, either issue per-instance credentials (so each replica has its own user) or rely on the account-level --conns limit covering all users in that account.
Set or update the account-level --conns limit. With JWT-based auth (nsc), nsc edit account <name> --conns N updates the account JWT and the server picks up the change via the account resolver — no restart required. For server-config-based auth, update the accounts {} block and run nats-server --signal reload. Existing excess connections aren’t terminated retroactively; new connections beyond the limit will be rejected.
Per-account limits (ACCOUNTS_001) cap the total connections across all users in an account — that’s the only knob NATS exposes. There is no per-user connection cap. To get fine-grained control, issue per-instance user credentials inside an account so the account-level limit acts as a ceiling on aggregate connections, and use per-user subscription/payload limits to constrain other resources.
No. Unlike database connections, NATS connections are fully multiplexed — a single connection supports unlimited concurrent subscriptions, publishes, and request-reply operations. There’s no performance benefit to multiple connections, and pooling adds complexity. The only valid reason for multiple connections per process is isolation (e.g., separating a high-throughput data plane connection from a low-volume control plane connection), and even that is rarely necessary.