Automated Audit Checks
Insights continuously monitors your NATS deployment to surface operational issues and optimization opportunities before they become incidents.
Featured Checks
Critical checks that catch the most common production issues.
SERVER_001 (critical): Connection Readiness Failure
Flags servers reporting connection readiness failures via the healthz endpoint.
SERVER_004 (critical): Slow Consumers
Flags servers with new slow consumer events since the previous epoch.
CLUSTER_007 (critical): Gateway Disconnection
Flags servers that lost a gateway connection since the previous epoch.
JETSTREAM_008 (critical): Stream Quorum Lost
Flags replicated streams where enough replicas are offline to lose quorum.
META_006 (critical): Meta Quorum Lost
Flags when enough meta cluster peers are offline to lose quorum.
SERVICE_002 (critical): Service Down
Flags services that had instances in the previous epoch but zero in the current epoch.
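The featured checks rest on two recurring ideas: comparing a value against the previous epoch, and testing whether enough replicas remain online to form a quorum. The sketch below illustrates both in Python. It is not the Insights implementation: the function names are hypothetical, the quorum rule assumes a Raft-style majority (more than half of the peers), and treating a counter that went backwards as a restart is an assumption.

```python
def quorum_size(replicas: int) -> int:
    """Peers needed for a majority in a group of `replicas` (assumes Raft-style majority)."""
    return replicas // 2 + 1


def quorum_lost(replicas: int, offline: int) -> bool:
    """True when the online peers can no longer form a majority."""
    online = replicas - offline
    return online < quorum_size(replicas)


def new_events(previous: int, current: int) -> int:
    """Events recorded since the previous epoch.

    Assumption: a counter lower than in the previous epoch means the server
    restarted, so the whole current value counts as new.
    """
    if current < previous:
        return current
    return current - previous


def service_down(previous_instances: int, current_instances: int) -> bool:
    """True when a service had instances in the previous epoch and has none now."""
    return previous_instances > 0 and current_instances == 0


if __name__ == "__main__":
    print(quorum_lost(replicas=3, offline=1))   # one of three offline
    print(quorum_lost(replicas=3, offline=2))   # two of three offline
    print(new_events(previous=10, current=13))  # slow consumer counter delta
    print(service_down(2, 0))
```

Under the majority assumption, a three-replica stream tolerates one offline replica and a five-replica stream tolerates two.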
All Checks
| ID | Name | Severity | Category | Type | Description |
|---|---|---|---|---|---|
| ACCOUNTS_001 | Account Connection Limit | warning | Account | operational | Flags accounts where connections are at or above 90% of the configured limit. |
| ACCOUNTS_002 | Slow Consumers | critical | Account | operational | Flags accounts with new slow consumer events since the previous epoch, aggregated across servers. |
| ACCOUNTS_003 | Inactive JWT Import | critical | Account | operational | Detects imports declared in the account JWT but not activated by the server. Diagnoses root cause: missing activation token, expired token, token signed by rotated signing key, or source export not found. |
| ACCOUNTS_004 | Orphaned Export | warning | Account | operational | Flags exports with no matching importer in any account. Uses NATS wildcard subject matching. |
| ACCOUNTS_005 | No Subscription Interest | info | Account | operational | Finds active imports where no client in the importing account subscribes to the imported subject. Uses NATS wildcard subject matching. |
| ACCOUNTS_006 | Account Subscription Limit | warning | Account | operational | Flags accounts where subscriptions are at or above 90% of the configured limit. |
| CHANGE_001 | Config Reload Detected | info | Change | operational | Detects servers whose configuration was reloaded by comparing config_load_time between consecutive epochs. |
| CHANGE_002 | JetStream Domain Changed | warning | Change | operational | Detects servers whose JetStream domain value changed between consecutive epochs. |
| CHANGE_003 | Account Added or Removed | info | Change | operational | Detects accounts that appeared or disappeared between consecutive epochs. |
| CHANGE_004 | Stream Configuration Changed | info | Change | operational | Detects streams whose configuration fields (replicas, retention, limits) changed between consecutive epochs. |
| CLUSTER_001 | Memory Usage Outlier | warning | Cluster | operational | Flags servers whose memory usage exceeds the configured multiplier of their cluster average. |
| CLUSTER_003 | High HA Assets | warning | Cluster | operational | Flags servers with 1000 or more highly-available JetStream assets. |
| CLUSTER_004 | Cluster Name Whitespace | warning | Cluster | operational | Flags servers whose cluster name contains whitespace characters. |
| CLUSTER_005 | Route Count Low | warning | Cluster | operational | Flags servers with fewer cluster routes than expected based on cluster size. |
| CLUSTER_006 | Connection Count Change | warning | Cluster | operational | Flags servers where the connection count changed dramatically between epochs, indicating a significant increase or decrease in connected clients. |
| CLUSTER_007 | Gateway Disconnection | critical | Cluster | operational | Flags servers that lost a gateway connection since the previous epoch. |
| CLUSTER_008 | Gateway Config Mismatch | warning | Cluster | operational | Flags servers whose set of gateway connections differs from the cluster majority. |
| CONN_001 | High Client RTT | warning | Connection | operational | Flags client connections with round-trip time exceeding 100 ms. |
| CONN_002 | Client Pending Pressure | warning | Connection | operational | Flags client connections with more than 1 MiB of pending bytes. |
| CONN_003 | Connection Stopped | info | Connection | operational | Flags connections that disconnected with a non-empty reason. |
| CONSUMER_001 | Consumer Replica Offline | critical | Consumer | operational | Flags consumer replicas that are reported as offline. |
| CONSUMER_002 | Consumer Replica Lag | warning | Consumer | operational | Flags consumer replicas lagging by more than 1000 operations behind the leader. |
| CONSUMER_003 | Consumer Quorum Lost | critical | Consumer | operational | Flags replicated consumers where enough replicas are offline to lose quorum. |
| CONSUMER_004 | Consumer Delivered Below Stream First Sequence | critical | Consumer | operational | Flags consumers whose last delivered position is below the stream's first sequence after a purge or truncation. |
| CONSUMER_005 | Consumer Sequence Ahead of Stream Sequence | critical | Consumer | operational | Flags consumers whose delivered position is ahead of the stream's last sequence. |
| CONSUMER_006 | Outstanding Ack Critical | critical | Consumer | operational | Flags consumers where num_ack_pending exceeds the operator-defined threshold. |
| CONSUMER_007 | Waiting Critical | critical | Consumer | operational | Flags consumers where num_waiting exceeds the operator-defined threshold. |
| CONSUMER_008 | Unprocessed Critical | critical | Consumer | operational | Flags consumers where num_pending exceeds the operator-defined threshold. |
| CONSUMER_009 | Last Delivery Critical | critical | Consumer | operational | Flags consumers where the time since the last delivery exceeds the operator-defined threshold. |
| CONSUMER_010 | Last Ack Critical | critical | Consumer | operational | Flags consumers where the time since the last acknowledgment exceeds the operator-defined threshold. |
| CONSUMER_011 | Redelivery Critical | critical | Consumer | operational | Flags consumers where num_redelivered exceeds the operator-defined threshold. |
| CONSUMER_012 | Pinned Consumer Policy Mismatch | critical | Consumer | operational | Flags consumers with io.nats.monitor.pinned metadata that are not using the overflow priority policy. |
| JETSTREAM_001 | Stream Replica Lag | warning | JetStream | operational | Flags stream replicas whose last sequence number is more than 10% behind the leader. |
| JETSTREAM_002 | High Subject Cardinality | warning | JetStream | operational | Flags streams with one million or more unique subjects. |
| JETSTREAM_003 | Stream Message Limit | warning | JetStream | operational | Flags streams where message count is at or above 90% of the limit. |
| JETSTREAM_004 | JS API Request Rate High | warning | JetStream | operational | Flags when the JetStream API request rate exceeds the threshold. |
| JETSTREAM_005 | JS API Pending High | warning | JetStream | operational | Flags servers where JetStream API inflight requests exceed the threshold. |
| JETSTREAM_006 | Consumer Count Change | warning | JetStream | operational | Flags when the total consumer count change between epochs exceeds the threshold, indicating a significant increase or decrease. |
| JETSTREAM_007 | JetStream Memory Utilization Critical | critical | JetStream | operational | Flags servers where JetStream memory usage exceeds the critical threshold. |
| JETSTREAM_008 | Stream Quorum Lost | critical | JetStream | operational | Flags replicated streams where enough replicas are offline to lose quorum. |
| JETSTREAM_009 | JS API Error Rate High | warning | JetStream | operational | Flags servers where JetStream API errors exceed a percentage of total requests. |
| JETSTREAM_010 | Stream Byte Limit | warning | JetStream | operational | Flags streams where byte usage is at or above 90% of the limit. |
| JETSTREAM_011 | Stream Consumer Limit | warning | JetStream | operational | Flags streams where consumer count is at or above 90% of the limit. |
| JETSTREAM_012 | JetStream Storage Utilization Critical | critical | JetStream | operational | Flags servers where JetStream storage usage exceeds the critical threshold. |
| JETSTREAM_013 | Stream Subject/Message Count Inconsistency | warning | JetStream | operational | Flags streams where the number of unique subjects exceeds the total message count, an invariant violation that indicates filestore corruption. |
| JETSTREAM_014 | Stream Replica Message Count Divergence | critical | JetStream | operational | Flags replicated streams where all replicas report as current but have significantly different message counts, indicating filestore corruption or a Raft state reset. |
| JETSTREAM_015 | Mirror Last Seen Staleness | warning | JetStream | operational | Flags mirror streams where the mirror consumer has stalled: the mirror reports zero lag but shows no activity while the source stream continues receiving messages. |
| JETSTREAM_016 | JetStream Storage vs Configured Limit Critical | critical | JetStream | operational | Flags servers where JetStream storage usage critically exceeds the configured max_store limit, risking imminent Raft failures. |
| JETSTREAM_017 | Mirror Lag Critical | critical | JetStream | operational | Flags mirror streams where mirror lag exceeds the operator-defined io.nats.monitor.lag-critical threshold. |
| JETSTREAM_018 | Mirror Seen Critical | critical | JetStream | operational | Flags mirror streams where the time since the mirror was last active exceeds the operator-defined io.nats.monitor.seen-critical threshold. |
| JETSTREAM_019 | Min Sources | critical | JetStream | operational | Flags streams where the source count is below the operator-defined io.nats.monitor.min-sources threshold. |
| JETSTREAM_020 | Max Sources | critical | JetStream | operational | Flags streams where the source count exceeds the operator-defined io.nats.monitor.max-sources threshold. |
| JETSTREAM_021 | Peer Expect | critical | JetStream | operational | Flags streams where the actual peer count does not match the operator-defined io.nats.monitor.peer-expect threshold. |
| JETSTREAM_022 | Peer Lag Critical | critical | JetStream | operational | Flags stream replicas where lag exceeds the operator-defined io.nats.monitor.peer-lag-critical threshold. |
| JETSTREAM_023 | Peer Seen Critical | critical | JetStream | operational | Flags stream replicas where the time since the replica was last active exceeds the operator-defined io.nats.monitor.peer-seen-critical threshold. |
| JETSTREAM_024 | Message Count Threshold | warning | JetStream | operational | Flags streams where message count exceeds operator-defined thresholds. Direction is inferred from threshold ordering. |
| JETSTREAM_025 | Subject Count Threshold | warning | JetStream | operational | Flags streams where subject count exceeds operator-defined thresholds. Direction is inferred from threshold ordering. |
| LEAF_001 | Leafnode Name Whitespace | warning | Leafnode | operational | Flags leafnode connections whose remote server name contains whitespace. |
| LEAF_002 | High Leaf RTT | warning | Leafnode | operational | Flags leafnode connections with round-trip time exceeding the threshold. |
| LEAF_003 | Leafnode Subscription Count High | warning | Leafnode | operational | Flags leafnode connections carrying a large number of subscriptions, which can cause hub processing to exceed the stale connection timeout. |
| META_001 | Offline Replica | critical | Meta Cluster | operational | Flags meta cluster replicas that are reported as offline. |
| META_002 | Leader Disagreement | critical | Meta Cluster | operational | Flags when multiple servers report themselves as the meta cluster leader. |
| META_003 | Meta Leader Flapping | warning | Meta Cluster | operational | Flags when the meta cluster leader has changed more than the allowed number of times in the recent time window. |
| META_004 | Meta Snapshot Slow | warning | Meta Cluster | operational | Flags when the meta cluster snapshot duration exceeds the warning or critical threshold. |
| META_005 | Meta State Growth | warning | Meta Cluster | operational | Flags when the total number of JetStream asset replicas exceeds the threshold. |
| META_006 | Meta Quorum Lost | critical | Meta Cluster | operational | Flags when enough meta cluster peers are offline to lose quorum. |
| META_007 | Even Cluster Size | warning | Meta Cluster | operational | Flags when the meta cluster has an even number of peers. |
| META_008 | Meta Pending High | warning | Meta Cluster | operational | Flags when the meta cluster leader has a high number of pending Raft operations. |
| META_009 | Meta Cluster Size Decreased | critical | Meta Cluster | operational | Flags when the meta cluster size has decreased between consecutive epochs, indicating a peer was removed or lost. |
| OPT_ACCT_001 | Account Storage Quota Approaching Limit | warning | Account | optimization | Flags accounts where JetStream storage reservations approach the configured quota. |
| OPT_ACCT_002 | Excessive JWT Size | warning | Account | optimization | Flags accounts with unusually large JWT claims, indicating excessive permissions or revocations. |
| OPT_BALANCE_001 | Uneven Leader Distribution | info | Balance | optimization | Flags servers hosting disproportionately many stream and consumer leaders. |
| OPT_BALANCE_002 | Connection Hotspot | info | Balance | optimization | Flags servers with more than double the cluster average connections. |
| OPT_BALANCE_003 | Subscription Hotspot | info | Balance | optimization | Flags servers with more than double the cluster average subscriptions. |
| OPT_BALANCE_004 | Stream Replica Count Imbalance | info | Balance | optimization | Flags servers hosting disproportionately many stream replicas. |
| OPT_BALANCE_005 | JetStream Storage Skew | info | Balance | optimization | Flags servers whose JetStream storage exceeds double the cluster average. |
| OPT_BALANCE_006 | Account Connection Concentration | info | Balance | optimization | Flags servers hosting more than 70% of an account's connections. |
| OPT_BALANCE_007 | Stream-Consumer Leader Co-location | info | Balance | optimization | Flags streams where the stream leader's server also hosts a disproportionate share of consumer leaders. |
| OPT_BALANCE_008 | JetStream Storage Saturation with Skew | warning | Balance | optimization | Flags servers with high JetStream storage utilization where the cluster also exhibits significant storage skew between nodes. |
| OPT_COST_001 | Over-Replicated Inactive Stream | info | Cost | optimization | Flags R3+ streams with no new messages across the selected time range. |
| OPT_COST_002 | Memory Storage Large Stream | info | Cost | optimization | Flags memory-backed streams using more than 100 MiB. |
| OPT_COST_003 | Wasted JetStream Memory Reservation | info | Cost | optimization | Flags servers where JetStream memory usage is below 20% of reserved capacity. |
| OPT_COST_004 | Uncompressed Large Stream | info | Cost | optimization | Flags file-backed streams exceeding 1 GiB with no compression enabled. |
| OPT_COST_005 | Wasted JetStream Storage Reservation | info | Cost | optimization | Flags servers where JetStream storage usage is below 20% of reserved capacity. |
| OPT_IDLE_001 | Underutilized Server | info | Idle Resources | optimization | Flags servers that remained nearly idle across the selected time range. |
| OPT_IDLE_002 | Inactive Stream | info | Idle Resources | optimization | Flags unsealed streams that received no new messages across the time range. |
| OPT_IDLE_003 | Inactive Consumer | info | Idle Resources | optimization | Flags consumers that made no delivery progress across the time range. |
| OPT_IDLE_004 | Drained Consumer | info | Idle Resources | optimization | Flags consumers fully caught up with zero pending on an inactive stream. |
| OPT_IDLE_005 | Inactive Account | info | Idle Resources | optimization | Flags non-system accounts with no connections or throughput for the configured inactivity threshold (default 24h). |
| OPT_IDLE_006 | Disconnected Users | info | Idle Resources | optimization | Flags non-system account users with no active connections at the current epoch. |
| OPT_IDLE_007 | Idle Client Connections | info | Idle Resources | optimization | Flags client connections idle for more than 5 minutes with zero messages. |
| OPT_PLACE_001 | Cross-Cluster Stream Access | info | Placement | optimization | Flags accounts with clients in clusters that have no local stream leaders. |
| OPT_PLACE_002 | Consumer Leader Not Co-located | info | Placement | optimization | Flags consumers whose leader is in a different cluster than the majority of connections. |
| OPT_PLACE_003 | High Gateway Traffic Ratio | info | Placement | optimization | Flags accounts where more than 30% of traffic is cross-cluster gateway traffic. |
| OPT_PLACE_004 | Gateway Interest Mode | info | Placement | optimization | Flags gateway account combinations still using optimistic interest mode. |
| OPT_SYS_001 | Streams Without Limits | info | System Improvement | optimization | Flags streams with no message, byte, or age retention limits. |
| OPT_SYS_002 | High Consumer Redelivery | warning | System Improvement | optimization | Flags consumers with a redelivery rate exceeding 10%. |
| OPT_SYS_003 | Ack Pending Buildup | warning | System Improvement | optimization | Flags consumers approaching their maximum ack pending limit. |
| OPT_SYS_004 | Unbound Push Consumer | warning | System Improvement | optimization | Flags push consumers with no subscriber currently bound. |
| OPT_SYS_005 | Route Pending Pressure | warning | System Improvement | optimization | Flags route connections with more than 1 MiB of pending data. |
| OPT_SYS_006 | Leaf Compression Disabled | info | System Improvement | optimization | Flags leaf connections with compression disabled. |
| OPT_SYS_007 | Raft Apply Lag | warning | System Improvement | optimization | Flags Raft groups where the committed-applied gap exceeds 100 entries. |
| OPT_SYS_008 | Unlimited JetStream Account | info | System Improvement | optimization | Flags non-system accounts with JetStream enabled but no storage limits. |
| OPT_SYS_009 | Leaderless Raft Group | critical | System Improvement | optimization | Flags Raft groups that have no elected leader and cannot process writes. |
| OPT_SYS_010 | Raft IPQ Backpressure | warning | System Improvement | optimization | Flags Raft groups whose internal queue lengths exceed the threshold, indicating a processing backlog. |
| OPT_SYS_011 | Subscription Fanout Anomaly | info | System Improvement | optimization | Flags servers where max fanout is disproportionately higher than average fanout. |
| OPT_SYS_012 | Subscription Churn | info | System Improvement | optimization | Flags servers with excessive subscription insert and remove operations since the previous epoch. |
| OPT_SYS_013 | Raft Sustained Catching Up | warning | System Improvement | optimization | Flags Raft groups with a member in catching-up state. |
| OPT_SYS_014 | Gateway Pending Pressure | warning | System Improvement | optimization | Flags gateway connections with more than 1 MiB of pending data. |
| OPT_SYS_015 | Consumer ACK Floor Divergence | warning | System Improvement | optimization | Flags consumers where the gap between delivered position and ACK floor is disproportionately large relative to max_ack_pending, indicating interleaved acknowledgments. |
| OPT_SYS_016 | Direct Gets Disabled | info | System Improvement | optimization | Flags streams with allow_direct disabled, forcing read operations through the Raft consensus pipeline. |
| OPT_SYS_017 | Leafnode Auto Compression with High Count | info | System Improvement | optimization | Flags servers with many leafnode connections using s2_auto compression, which can create a CPU feedback loop under load. |
| OPT_SYS_018 | High Interior Deletes on Stream | warning | System Improvement | optimization | Flags streams with a very high number of interior deletes, causing disproportionate memory pressure during recovery and catch-up. |
| OPT_SYS_019 | Large Deduplication Window | warning | System Improvement | optimization | Flags streams with a deduplication window exceeding the threshold and active message flow, risking high memory consumption from the in-memory dedup map. |
| OPT_SYS_020 | KV Buckets Without max_age | info | System Improvement | optimization | Flags KV buckets with no max_age configured that have accumulated a large number of interior deletes (tombstones). |
| OPT_SYS_021 | R1 Streams in Multi-Node Clusters | info | System Improvement | optimization | Flags R1 (single-replica) streams in multi-node clusters that have no redundancy. |
| OPT_SYS_022 | Subscription Count Growth | info | System Improvement | optimization | Flags servers where subscriptions are growing monotonically without a corresponding increase in connections, indicating a subscription leak. |
| OPT_SYS_023 | Raft WAL Size Excessive | warning | System Improvement | optimization | Flags Raft groups with an excessively large write-ahead log, risking disk exhaustion and cascading OOM failures. |
| OPT_SYS_024 | WorkQueue Discard New with Aggressive Consumer Settings | warning | System Improvement | optimization | Flags WorkQueue streams using discard_policy=new where consumers have aggressive ack_wait or max_deliver settings, risking message loss. |
| OPT_SYS_025 | Sustained Consumer Growth on Stream | warning | System Improvement | optimization | Flags streams where consumer count has been growing steadily, indicating a consumer leak from ephemeral consumers. |
| OPT_SYS_026 | Raft Group Peer Count Mismatch | warning | System Improvement | optimization | Flags Raft groups where the observed peer count exceeds the expected replica count from stream or consumer configuration. |
| SERVER_001 | Connection Readiness Failure | critical | Server | operational | Flags servers reporting connection readiness failures via the healthz endpoint. |
| SERVER_002 | Server Version Mismatch | warning | Server | operational | Identifies servers running a different software version than the cluster majority. |
| SERVER_003 | High CPU Usage | warning | Server | operational | Flags servers where per-core CPU usage meets or exceeds the threshold. |
| SERVER_004 | Slow Consumers | critical | Server | operational | Flags servers with new slow consumer events since the previous epoch. |
| SERVER_005 | JetStream Memory Pressure | warning | Server | operational | Flags servers where JetStream memory usage is at or above 90% of reserved. |
| SERVER_006 | JetStream Domain Whitespace | warning | Server | operational | Flags servers whose JetStream domain name contains whitespace characters. |
| SERVER_007 | Authentication Not Required | critical | Server | operational | Flags servers that do not require client authentication. |
| SERVER_008 | Unexpected Server Restart | critical | Server | operational | Detects servers that restarted without an accompanying version upgrade. Compares start times across consecutive epochs and excludes restarts where the server version changed (planned upgrade). |
| SERVER_010 | High Route RTT | warning | Server | operational | Flags route connections with round-trip time exceeding the threshold. |
| SERVER_011 | Connection Count High | warning | Server | operational | Flags servers where active connections approach the configured maximum. |
| SERVER_012 | Stale Connections | warning | Server | operational | Flags servers with new stale connection events since the previous epoch. |
| SERVER_013 | Stalled Clients | warning | Server | operational | Flags servers with new stalled client events since the previous epoch. |
| SERVER_014 | JetStream Subsystem Unhealthy | critical | Server | operational | Flags servers with JETSTREAM-type healthz errors. |
| SERVER_015 | Stream Recovery Failure | critical | Server | operational | Flags servers with STREAM or CONSUMER-type healthz errors. |
| SERVER_016 | Account Resolution Failure | warning | Server | operational | Flags servers with ACCOUNT-type healthz errors. |
| SERVER_017 | JetStream Storage Pressure | warning | Server | operational | Flags servers where JetStream storage usage is at or above 90% of reserved. |
| SERVER_018 | High Gateway RTT | warning | Server | operational | Flags gateway connections with round-trip time exceeding the threshold. |
| SERVER_019 | JetStream Storage vs Configured Limit | warning | Server | operational | Flags servers where JetStream storage usage approaches the configured max_store limit, which when exceeded causes Raft failures. |
| SERVICE_001 | Service Version Mismatch | warning | Service | operational | Flags services where instances report different client versions or languages. |
| SERVICE_002 | Service Down | critical | Service | operational | Flags services that had instances in the previous epoch but zero in the current epoch. |
| USER_001 | Bearer Token User | warning | User | operational | Flags bearer token users with active connections. |
| USER_002 | Excessive User Connections | warning | User | operational | Flags users with more than 100 active connections. |
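
Many rows in the table reduce to one of two tests: a usage value at or above a percentage of a configured limit (for example ACCOUNTS_001 or JETSTREAM_003), or a subject compared against a pattern using NATS wildcard matching (ACCOUNTS_004, ACCOUNTS_005). The Python sketch below shows one way to express each. The function names are hypothetical, and treating a non-positive limit as "unlimited" is an assumption. The matcher only compares a literal subject against a pattern, following the NATS rules that `*` matches exactly one token and `>` matches one or more trailing tokens. It does not cover pattern-to-pattern comparison, which export and import matching may require.

```python
def at_or_above(used: int, limit: int, percent: int = 90) -> bool:
    """True when `used` is at or above `percent` of `limit`.

    Assumption: a limit of zero or below means no limit is configured,
    so the check never fires. Integer arithmetic avoids float rounding.
    """
    if limit <= 0:
        return False
    return used * 100 >= limit * percent


def subject_matches(pattern: str, subject: str) -> bool:
    """Match a literal subject against a pattern with NATS wildcards.

    "*" matches exactly one token; ">" matches one or more tokens and
    is only valid as the last token of the pattern.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # Must be the final pattern token, with at least one subject token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if token != "*" and token != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


if __name__ == "__main__":
    print(at_or_above(used=900, limit=1000))              # exactly 90% of the limit
    print(subject_matches("orders.*", "orders.eu"))
    print(subject_matches("orders.>", "orders.eu.new"))
    print(subject_matches("orders.>", "orders"))
```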