
NATS Drained Consumer: What It Means and How to Fix It

Severity: Info
Category: Consistency
Applies to: Idle Resources
Check ID: OPT_IDLE_004
Detection threshold: num_pending = 0 AND num_ack_pending = 0 on inactive stream

A drained consumer is a JetStream consumer that has fully processed all available messages — zero pending, zero ack pending — on a stream that is itself inactive (no new messages arriving). The consumer did its job. There’s simply nothing left to do, and nothing new is coming.

Why this matters

A drained consumer is not broken. It’s the expected end state when a stream stops receiving messages and its consumers finish processing the backlog. The question isn’t “what went wrong” — it’s “does this consumer still need to exist?”

The operational cost is real, if modest. Each consumer maintains state in the server’s metadata: delivered sequence, ack floor, configuration, and pending tracking structures. For replicated consumers (R3), each one participates in a Raft group with heartbeats, leader elections, and state snapshots. One drained consumer costs almost nothing. Fifty drained consumers across ten inactive streams start adding up — more Raft groups mean slower meta snapshots, more consensus traffic, and a noisier nats consumer report output.

The bigger risk is organizational. Drained consumers on inactive streams are often the remnants of completed projects, one-time data migrations, or seasonal workloads that ran their course. Without periodic cleanup, they accumulate. Six months later, an operator looking at the system sees hundreds of consumers and streams that appear active in metadata but carry zero traffic. This makes capacity planning unreliable, incident diagnosis slower, and the overall system harder to reason about.

Common causes

  • One-time data processing completed. A stream was used for a data migration, import, or backfill. The consumer processed every message, the pipeline finished, but neither the stream nor the consumer was cleaned up afterward.

  • Seasonal or batch workload ended. A stream that receives traffic during specific business periods — end-of-quarter reporting, holiday traffic, campaign events — goes quiet. The consumer drains and waits. If the next season never comes (project cancelled, business changed), the consumer sits indefinitely.

  • Stream source dried up. The upstream publisher was retired, reconfigured to publish to a different subject, or moved to a different stream. The existing stream stopped receiving messages, and the consumer processed everything that was already there.

  • Over-provisioned consumer pool. Multiple consumer instances were deployed to handle peak load. Traffic dropped, and now some instances have drained completely while others handle the trickle. The drained instances are idle but still connected.

  • Test or development stream left behind. A stream and consumer pair were created in a shared environment for development or QA. The test finished, the consumer drained, and the resources were never cleaned up.

How to diagnose

Identify drained consumers

Run a consumer report on the stream:

nats consumer report EVENTS

Drained consumers show zero in both the “Unprocessed” and “Ack Pending” columns. Cross-reference with the stream’s activity:

nats stream info EVENTS

If the stream’s message count is stable (no new messages arriving) and the consumer has zero pending, it’s drained.

Verify the stream is truly inactive

Check whether the stream has received any new messages recently:

nats stream info EVENTS --json | jq '.state.last_ts'

Compare the last message timestamp with the current time. If the last message was hours or days ago and the consumer has processed everything, both the stream and consumer are idle.

Check for connected subscribers

A drained consumer might still have an active application connected, waiting for new messages:

nats consumer info EVENTS my-consumer --json | jq '{push_bound, num_waiting}'

For pull consumers, num_waiting shows how many pull requests are pending (applications actively waiting). For push consumers, push_bound indicates whether a subscriber is connected to the deliver subject.
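A Go sketch of that liveness check, parsing the two fields from the consumer info JSON (the function name and the "either field counts" rule are illustrative; only one field is meaningful for a given consumer type):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// consumerInfo extracts the two subscriber-liveness fields from
// `nats consumer info --json`: push_bound for push consumers,
// num_waiting for pull consumers.
type consumerInfo struct {
	PushBound  bool `json:"push_bound"`
	NumWaiting int  `json:"num_waiting"`
}

// hasActiveSubscriber treats either a bound push subscriber or at least
// one waiting pull request as evidence of a live application.
func hasActiveSubscriber(infoJSON []byte) (bool, error) {
	var ci consumerInfo
	if err := json.Unmarshal(infoJSON, &ci); err != nil {
		return false, err
	}
	return ci.PushBound || ci.NumWaiting > 0, nil
}

func main() {
	idle := []byte(`{"num_waiting":0}`)
	waiting := []byte(`{"num_waiting":3}`)
	a, _ := hasActiveSubscriber(idle)
	b, _ := hasActiveSubscriber(waiting)
	fmt.Println(a, b) // false true
}
```

A drained consumer with a live subscriber is usually a waiting application, not abandonment, so deleting it would break that application.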

Assess the Raft overhead

For replicated consumers, check the cluster state:

nats consumer info EVENTS my-consumer --json | jq '.cluster'

A drained R3 consumer still runs three Raft replicas. If you have many drained consumers, the aggregate Raft overhead may be worth eliminating.

How to fix it

Immediate: decide keep or delete

The fix depends on whether the consumer will be needed again:

Delete if the workload is finished:

nats consumer delete EVENTS my-consumer

Keep if traffic will resume. If the stream is seasonally inactive and will receive messages again, the drained consumer is fine — it’ll pick up where it left off when new messages arrive. No action needed.

Short-term: clean up the stream too

A drained consumer often indicates the stream itself is a candidate for cleanup. If the stream is also inactive (OPT_IDLE_002), consider whether the entire stream should be removed or sealed:

# Seal the stream to prevent accidental writes (preserves data)
nats stream seal EVENTS
# Or delete the stream entirely (removes all data and consumers)
nats stream delete EVENTS

Sealing is a good middle ground — it makes the stream read-only, signals to operators that it’s intentionally frozen, and preserves the data for later inspection or replay. Note that sealing is permanent: a sealed stream cannot be unsealed.

Long-term: automate lifecycle management

Tag streams and consumers with purpose metadata. Use consumer names that encode the workload type: migration-2024q3-processor, campaign-holiday-worker. When the purpose is self-documenting, cleanup decisions are obvious.

Establish retention-aware stream policies. For streams that serve one-time workloads, configure max_age at creation time. Even a generous limit like 90 days ensures the stream self-cleans if nobody actively maintains it:

// Go - nats.go
js, _ := nc.JetStream()

_, err := js.AddStream(&nats.StreamConfig{
	Name:     "MIGRATION-2024Q3",
	Subjects: []string{"migrate.orders.>"},
	MaxAge:   90 * 24 * time.Hour, // auto-cleanup after 90 days
})

// TypeScript - nats.js
import { connect } from "nats";

const nc = await connect();
const jsm = await nc.jetstreamManager();

await jsm.streams.add({
  name: "MIGRATION-2024Q3",
  subjects: ["migrate.orders.>"],
  max_age: 90 * 24 * 60 * 60 * 1_000_000_000, // 90 days in nanoseconds
});

Use Synadia Insights for automated detection. Insights continuously evaluates consumer and stream activity across your entire deployment. Drained consumers on inactive streams surface automatically as optimization findings, eliminating the need for manual nats consumer report audits.

Frequently asked questions

Is a drained consumer a problem?

Not inherently. A drained consumer is functioning correctly — it processed everything and is waiting for more. It becomes a problem only when the stream is permanently inactive and the consumer will never receive new messages. At that point, it’s wasted resources and operational clutter. The check flags this combination (drained consumer + inactive stream) to prompt a cleanup decision.

What’s the difference between a drained consumer and an inactive consumer?

A drained consumer has processed all available messages — its delivered sequence matches the stream’s last sequence, with zero pending. An inactive consumer (OPT_IDLE_003) hasn’t made any delivery progress at all, regardless of how many messages are pending. A drained consumer completed its work. An inactive consumer may have never started, or stopped mid-stream.

Will deleting a drained consumer lose any data?

No. A drained consumer has already processed all messages. The messages still exist in the stream (subject to the stream’s retention policy). Deleting the consumer removes only the consumer’s tracking state — delivered sequence, ack floor, and configuration. If you re-create a consumer with the same filter subject later, it will start from the stream’s current first available message, not from where the old consumer left off (unless you specify an explicit start position).

Should I delete the stream too if all consumers are drained?

It depends on the stream’s purpose. If it was a one-time migration or batch job, yes — delete the stream and reclaim the storage. If it’s a standing stream that happens to be in a quiet period, keep it. Check the stream’s retention policy: if it has max_age configured, old messages will be cleaned up automatically regardless.

How many drained consumers are too many?

There’s no hard threshold, but it scales with Raft. Each R3 consumer runs its own Raft group with three replicas. A hundred drained R3 consumers means a hundred Raft groups — 300 replicas in total — doing nothing but exchanging heartbeats. At that scale, you’ll see it in meta snapshot times and CPU profiles. For R1 consumers, the overhead is lower — mostly metadata memory. As a rule of thumb, if drained consumers outnumber your active ones, it’s time to clean up.

Proactive monitoring for NATS drained consumers with Synadia Insights

With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.

Start a 14-day Insights trial