An account slow consumer event means one or more clients within a specific NATS account are unable to read messages from the server fast enough, causing the server to disconnect them. Unlike SERVER_004, which reports slow consumers at the server level, ACCOUNTS_002 attributes the events to the specific account — telling you which tenant or workload is affected.
In multi-account NATS deployments, knowing that slow consumers are occurring (SERVER_004) is only half the answer. The critical question is whose clients are slow. ACCOUNTS_002 provides that attribution. A slow consumer event in one account doesn’t necessarily indicate a system-wide problem — it may be a single application team’s workload that’s misconfigured, a specific tenant whose message processing can’t keep up, or one microservice that’s doing blocking work in its message handler.
Without per-account attribution, triaging slow consumer events in a multi-tenant deployment is a guessing game. The server-wide counter tells you something is wrong, but with dozens of accounts and hundreds of clients, finding the specific account requires manually querying each one. In large deployments, this turns a five-minute diagnosis into an hour-long hunt. Meanwhile, the affected account’s clients continue losing messages with every disconnection cycle.
The production impact of account-level slow consumers is the same as server-level slow consumers: clients are forcefully disconnected, core NATS messages are permanently lost, request-reply chains break, and if the disconnected clients reconnect to the same server and immediately fall behind again, the cycle repeats. But the remediation is different — it’s targeted at the specific account’s workload, not the server configuration. Knowing the account immediately narrows the scope of investigation and the blast radius of the fix.
Processing bottleneck in the account’s message handlers. The most common cause. One or more subscribers in the account are doing slow, blocking work inside their message callbacks — database writes, HTTP calls, serialization — that can’t keep up with the inbound message rate on their subscribed subjects.
Publish rate spike within the account. A publisher in the same account (or an account exporting to this one) increased its publish rate beyond what the account’s subscribers can handle. Batch imports, backfills, or incident-driven traffic surges commonly trigger this.
Insufficient consumer instances for the workload. The account has a single subscriber on a high-throughput subject instead of a queue group with multiple members. The per-instance processing ceiling is below the subject’s message rate.
Cross-account import delivering unexpected volume. The account imports a subject from another account, and the exporting account’s publish rate increased without the importing account’s subscribers being scaled to match. The slow consumer events appear in the importing account, but the root cause is in the exporter.
Network latency affecting the account’s clients. Clients in this account are connected from a high-latency location (remote office, different cloud region). The TCP round-trip time slows message delivery, causing the server-side buffer to fill. Other accounts with locally-connected clients are unaffected.
Client pending buffer too small. The NATS client library’s local pending buffer (default: 64MB / 65,536 messages) fills before the server-side write deadline. This is especially common with large messages or bursty publish patterns.
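Several of these causes show up directly in the client's own subscription statistics. The following Go sketch is not part of the original check; it assumes `subs` is the application's own list of subscriptions, and periodically logs each one's pending and dropped counts so you can see which subscription in the account is falling behind:

```go
// Sample each subscription's client-side buffer to spot the one falling behind.
func watchSubscriptions(subs []*nats.Subscription) {
	for range time.Tick(10 * time.Second) {
		for _, sub := range subs {
			msgs, bytes, err := sub.Pending() // messages/bytes waiting in the local buffer
			if err != nil {
				continue
			}
			dropped, _ := sub.Dropped() // messages already discarded by the client
			log.Printf("subject=%s pending_msgs=%d pending_bytes=%d dropped=%d",
				sub.Subject, msgs, bytes, dropped)
		}
	}
}
```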
The /accstatz endpoint shows per-account slow consumer counts:
```bash
curl -s http://localhost:8222/accstatz | jq '.account_statz[] | select(.slow_consumers > 0)'
```

This returns the account name, connection count, and slow consumer count for each affected account.
```bash
# Sort connections by pending bytes — highest pending is closest to eviction
curl -s 'http://localhost:8222/connz?sort=pending&limit=20' | jq '.connections[]'
```
```bash
# Filter to a specific account (subs=true includes each connection's subscription list)
curl -s "http://localhost:8222/connz?sort=pending&limit=20&acc=<account_name>&subs=true" | jq '.connections[] | {cid, name, pending_bytes, subscriptions_list}'
```

Look for connections with high pending_bytes values. Connections approaching the write_deadline threshold are about to be disconnected.
```bash
# Server-wide slow consumer stats with per-type breakdown
curl -s http://localhost:8222/varz | jq '{slow_consumers, slow_consumer_stats}'
```

The slow_consumer_stats field breaks down slow consumers by type: clients, routes, gateways, and leafs. If the count is primarily in clients, the problem is end-client processing speed. If it's in routes or gateways, inter-server communication is the bottleneck.
```bash
# Check per-account message rates
curl -s http://localhost:8222/accstatz | jq '.account_statz[] | select(.acc == "<account_name>") | {sent: .sent, received: .received, slow_consumers}'
```

If sent.msgs (server sending to clients) is dramatically higher than the account's processing capacity, the consumers can't keep up with the delivery rate.
```bash
# Check RTT for connections in this account
nats server list
```

If the slow consumer events correlate with high RTT connections, network latency is a contributing factor.
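A client-side cross-check can also help here: the Go client can report its own measured round trip to the server. A minimal sketch, assuming an existing connection `nc` belonging to the affected account:

```go
// Measure this client's round trip to the server it is connected to.
// Consistently high values for this account's clients point at network latency.
rtt, err := nc.RTT()
if err != nil {
	log.Fatal(err)
}
log.Printf("round trip to %s: %v", nc.ConnectedUrl(), rtt)
```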
Increase per-subscription pending buffers for the affected account’s clients. Pending limits are set on each subscription (not the connection); raise them and pair with an error handler to surface slow-consumer events:
```go
// Go client — pending limits are per-subscription
nc, err := nats.Connect(url,
	nats.ErrorHandler(func(nc *nats.Conn, sub *nats.Subscription, err error) {
		if err == nats.ErrSlowConsumer {
			pending, _, _ := sub.Pending()
			log.Printf("Slow consumer on %s: %d pending msgs", sub.Subject, pending)
		}
	}),
)
if err != nil {
	log.Fatal(err)
}

sub, _ := nc.Subscribe("orders.>", handler)
sub.SetPendingLimits(1_000_000, 256*1024*1024) // 1M msgs, 256MB
```

Tune the server write deadline if the slow consumers are transient (brief spikes, not sustained overload):
```
# nats-server.conf — increase from default 2s (use cautiously)
write_deadline: "5s"
```

This delays slow consumer disconnection, giving clients more time to catch up. It does not fix the underlying throughput mismatch — it just widens the window.
Move blocking work out of the message handler. Decouple message reception from processing:
```go
// Decouple the message callback from processing
work := make(chan *nats.Msg, 50_000)
sub, _ := nc.Subscribe("orders.>", func(msg *nats.Msg) {
	work <- msg // non-blocking enqueue
})

// Worker pool processes messages independently
for i := 0; i < numWorkers; i++ {
	go func() {
		for msg := range work {
			processOrder(msg.Data) // slow work happens here, not in callback
		}
	}()
}
```

Add queue group subscribers to distribute load within the account:
```bash
# Scale horizontally — each instance gets a fraction of messages
nats sub "orders.>" --queue order-processors
```

NATS round-robins messages within a queue group. Adding instances linearly increases throughput without changing publishers or subject structure.
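The equivalent in the Go client is a one-line change from a plain subscription. A minimal sketch, reusing the order-processors queue group name from the CLI example above:

```go
// Every instance that subscribes with the same queue group name
// receives a share of the messages published to orders.>
_, err := nc.QueueSubscribe("orders.>", "order-processors", func(msg *nats.Msg) {
	processOrder(msg.Data)
})
if err != nil {
	log.Fatal(err)
}
```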
If the problem is a cross-account import, coordinate with the exporting account's owners: either throttle the publish rate at the source or scale the importing account's subscribers to match the exported volume.
Use JetStream pull consumers for critical data flows. Pull consumers provide flow control — the consumer requests messages at a rate it can handle, rather than the server pushing at the publish rate:
```go
// Pull consumer — consumer controls the delivery rate
sub, _ := js.PullSubscribe("orders.>", "order-processor")
for {
	msgs, _ := sub.Fetch(100, nats.MaxWait(5*time.Second))
	for _, msg := range msgs {
		processOrder(msg.Data)
		msg.Ack()
	}
}
```

Monitor per-account metrics. Set up alerting on the /accstatz slow consumer delta so you catch the problem before users report it.
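If nothing is scraping /accstatz yet, a small poller is enough to compute that delta. Below is a minimal Go sketch using only the standard library; the function name, poll interval, and the assumption of an unauthenticated monitoring endpoint are illustrative, not part of the check itself:

```go
// Poll /accstatz and log whenever an account's slow_consumers counter increases.
type accStatz struct {
	AccountStatz []struct {
		Account       string `json:"acc"`
		SlowConsumers int64  `json:"slow_consumers"`
	} `json:"account_statz"`
}

func watchSlowConsumers(monitorURL string, interval time.Duration) {
	last := map[string]int64{}
	for range time.Tick(interval) {
		resp, err := http.Get(monitorURL + "/accstatz")
		if err != nil {
			log.Printf("accstatz poll failed: %v", err)
			continue
		}
		var stats accStatz
		err = json.NewDecoder(resp.Body).Decode(&stats)
		resp.Body.Close()
		if err != nil {
			log.Printf("accstatz decode failed: %v", err)
			continue
		}
		for _, acc := range stats.AccountStatz {
			if prev, ok := last[acc.Account]; ok && acc.SlowConsumers > prev {
				// Replace this log line with a call to your alerting system.
				log.Printf("account %s: %d new slow consumer events",
					acc.Account, acc.SlowConsumers-prev)
			}
			last[acc.Account] = acc.SlowConsumers
		}
	}
}
```

Running something like `go watchSlowConsumers("http://localhost:8222", time.Minute)` inside an existing service covers the gap until proper metrics collection is in place.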
Implement account-level rate limits. If specific accounts are prone to publish rate spikes, configure account limits to cap the message rate and protect downstream consumers:
```
# In account JWT or server config
limits {
    max_payload: 1048576
    max_connections: 100
    max_subscriptions: 1000
}
```

SERVER_004 reports slow consumer events at the server level — a server-wide delta of the slow_consumers counter from /varz. It tells you slow consumers are happening somewhere on that server. ACCOUNTS_002 reports the same events attributed to the specific account via /accstatz. In a multi-account deployment, SERVER_004 fires for the server while ACCOUNTS_002 tells you which account's clients are responsible. Use SERVER_004 for server health and ACCOUNTS_002 for tenant-level diagnosis.
Account limits (max connections, max subscriptions, max payload) control resource allocation but don’t directly prevent slow consumers. A client can be within all account limits and still be slow if it can’t process messages fast enough. Account limits are more about preventing resource exhaustion than performance management. For flow control, use JetStream pull consumers, which let the consumer request messages at its own pace.
The slow consumer disconnection itself is isolated to the affected client. However, the memory consumed by the server buffering messages for slow clients is shared across all accounts on that server. Under heavy load, buffering for slow consumers in one account can increase memory pressure and GC latency for the entire server, indirectly affecting other accounts’ clients. This is why the server disconnects slow consumers aggressively — it’s protecting the health of all tenants on the server.
The server logs record the subscription details at disconnection time, including subject interest. You can also infer it from the connection details — query /connz filtered to the affected account, look at connections with high pending bytes, and check their subscriptions_list. The subject with the highest message rate that the slow client is subscribed to is usually the culprit:
curl -s "http://localhost:8222/connz?sort=pending&acc=<account>&subs=true&limit=5" | jq '.connections[] | {name, pending_bytes, subscriptions_list}'Only as a short-term measure for transient spikes. The default write_deadline of 2 seconds means the server waits up to 2 seconds for a slow client before disconnecting it. Increasing this to 5-10 seconds gives clients more breathing room during brief bursts, but it also means the server holds more memory for buffering and takes longer to shed load under sustained pressure. If slow consumers are persistent, fix the throughput mismatch — don’t delay the disconnection.