
NATS Config Reload Detected: What It Means and How to Fix It

Severity: Info
Category: Change
Applies to: Change
Check ID: CHANGE_001
Detection threshold: server config_load_time changed between consecutive collection epochs

A config reload detected event means a NATS server’s configuration was reloaded between two consecutive monitoring epochs — the config_load_time value in the server’s status changed. Configuration reloads are a normal operational action, but untracked or unexpected reloads can indicate unauthorized changes, misconfigured automation, or drift between servers in a cluster.
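The detection described above can be sketched in a few lines: poll the server's `/varz` monitoring endpoint and flag any change in `config_load_time` between consecutive polls. The endpoint URL is the default monitoring port; substitute your own.

```python
# Minimal sketch of the detection logic: compare config_load_time
# between two consecutive collection epochs.
import json
import urllib.request

VARZ_URL = "http://localhost:8222/varz"  # default monitoring endpoint

def fetch_config_load_time(url=VARZ_URL):
    """Fetch the current config_load_time from the server's /varz endpoint."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("config_load_time")

def reload_detected(previous, current):
    """True when config_load_time changed between two epochs."""
    return previous is not None and current != previous
```

In a monitoring loop you would call `fetch_config_load_time()` on each epoch, feed the previous and current values to `reload_detected()`, and emit a change event when it returns True.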

Why this matters

NATS server configuration controls authentication, authorization, TLS, connection limits, JetStream settings, and cluster topology. A reload applies changes to most of these settings without a server restart. This is powerful — it means you can rotate TLS certificates, update user permissions, and adjust limits in production without downtime. But it also means a single SIGHUP or reload command can change the security posture of a running server.

In environments with multiple operators, CI/CD pipelines deploying config changes, or configuration management tools (Ansible, Puppet, Kubernetes ConfigMaps), config reloads can happen without explicit coordination. A reload that tightens permissions may disconnect active clients. A reload that loosens permissions may expose subjects that should be restricted. Without change tracking, these modifications are invisible until something breaks.

Config reload tracking is especially important for compliance and audit requirements. If your deployment handles sensitive data, you need to answer questions like “when did the server configuration last change?” and “was the change authorized?” The config_load_time field provides the timestamp, and correlating it with your change management system provides the authorization trail.

Common causes

  • Planned configuration update via signal. An operator sent a reload signal to apply configuration changes. This is the most common and expected cause. The reload applies changes to auth, TLS, logging, and most runtime settings without restarting the server.

  • CI/CD pipeline deploying config changes. Automated deployment pipelines that update server config files and trigger reloads as part of a release process. Expected in well-managed environments, but worth tracking to ensure the pipeline ran at the expected time.

  • Configuration management tool convergence. Tools like Ansible, Puppet, Chef, or Kubernetes ConfigMap updates that detect config drift and trigger a reload to converge the running state with the desired state. May fire unexpectedly if the source of truth changes.

  • TLS certificate rotation. Automated certificate renewal (Let’s Encrypt, cert-manager) updates certificate files and triggers a reload to pick up the new certificates before the old ones expire.

  • Accidental or unauthorized reload. A reload triggered by an operator or script that wasn’t part of a planned change. In shared environments, this may indicate someone modifying production config without following the change process.

How to diagnose

Confirm the reload occurred

Check the server’s current config load time:

Terminal window
nats server info

The output includes config_load_time showing when the configuration was last loaded. Compare this with the expected reload time from your change management records.

For raw monitoring data:

Terminal window
curl -s http://localhost:8222/varz | jq '.config_load_time'

Check server logs for reload details

The server logs the reload event and any errors that occurred during the reload:

Terminal window
grep -i "reloaded" nats-server.log | tail -10

A successful reload logs a confirmation message. A failed reload (invalid config syntax, for example) logs the error but keeps the previous configuration active — the server does not apply a broken config.

Compare configuration across cluster members

In a cluster, verify all servers have consistent configuration. A reload on one server but not others creates config drift:

Terminal window
# Check config_load_time across all servers
nats server list

If servers show different config_load_time values and the most recent reload wasn’t a rolling update, investigate which server was reloaded and whether all servers should have been updated.
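The cross-cluster comparison can also be done programmatically by fetching `/varz` from each server's monitoring endpoint. The host list below is an assumption; substitute your servers' monitoring URLs.

```python
# Sketch: collect config_load_time from each cluster member and flag drift.
import json
import urllib.request

SERVERS = [
    "http://nats-1:8222",  # illustrative hostnames
    "http://nats-2:8222",
    "http://nats-3:8222",
]

def config_load_times(servers):
    """Map each server (by server_name, falling back to URL) to its config_load_time."""
    times = {}
    for base in servers:
        with urllib.request.urlopen(f"{base}/varz") as resp:
            varz = json.load(resp)
        times[varz.get("server_name", base)] = varz.get("config_load_time")
    return times

def has_drift(times):
    """True when servers disagree on config_load_time."""
    return len(set(times.values())) > 1
```

Note that a rolling update legitimately produces differing timestamps while in progress, so treat drift as a prompt to investigate rather than an error by itself.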

Verify the reload source

Determine how the reload was triggered:

  • SIGHUP signal: nats-server --signal reload or kill -HUP <pid>
  • System request: nats server config reload <server-id> (requires system account access)
  • Process manager: systemd, supervisord, or Kubernetes sending HUP after config mount update

Check your process manager logs, CI/CD pipeline history, and operator audit logs to identify who or what triggered the reload.

How to fix it

Immediate: verify the change was intentional

Confirm the reload against your change management records. If the reload timestamp doesn’t match any planned change, investigate immediately:

Terminal window
# Check recent server events
nats events --srv-advisory

Verify server health after the reload. A config change that introduced errors may not be immediately obvious:

Terminal window
nats server check connection
Terminal window
# Check for authentication failures after the reload
curl -s "http://localhost:8222/connz?state=closed&sort=stop&limit=10" | \
jq '.connections[] | select(.reason == "Authentication Failure") | {cid, name, stop}'

If clients started disconnecting with auth failures right after the reload, the config change may have affected permissions or credentials.

Short-term: establish config change procedures

Version control all server configuration files. Every config change should be a tracked commit:

Terminal window
# Example workflow
cd /etc/nats
git diff nats-server.conf # review changes before applying
git add nats-server.conf
git commit -m "auth: add order-service user with publish permissions on orders.>"
nats-server --signal reload

Use a config validation step before reload. The NATS server can validate a configuration file without applying it:

Terminal window
nats-server --config /etc/nats/nats-server.conf -t

This checks syntax and resolves includes without starting or reloading the server. Run this in CI before deploying config changes to catch errors before they reach production.
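If your CI runs Python, the validation step can be wrapped as a pipeline gate that fails the build on an invalid config. The config path is an assumption, and the `server_bin` parameter exists only to make the helper testable without a real `nats-server` binary.

```python
# Sketch of a CI gate around `nats-server --config <file> -t`.
import subprocess

def validate_config(path="/etc/nats/nats-server.conf", server_bin="nats-server"):
    """Run the server's config check; return (ok, stderr)."""
    result = subprocess.run(
        [server_bin, "--config", path, "-t"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr
```

Call `validate_config()` before the deploy step and abort the pipeline when the first element of the result is False.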

Document which settings require reload vs restart. Not all configuration changes take effect via reload:

Settings that can be reloaded:

  • Authorization and authentication
  • TLS certificates
  • Logging settings
  • Max connections, max payload
  • Account limits

Settings that require a full restart:

  • Listen address and port
  • Cluster name
  • JetStream store directory
  • Gateway name

Long-term: automate change tracking and audit

Integrate config reload events into your alerting pipeline. While config reloads are informational, unexpected reloads should trigger a notification:

// Go — subscribe to server advisory events
nc, _ := nats.Connect(url, nats.UserCredentials("/path/to/sys.creds"))
nc.Subscribe("$SYS.SERVER.*.STATSZ", func(msg *nats.Msg) {
    // Parse and check config_load_time changes
    log.Printf("Server stats update: %s", string(msg.Data))
})
# Python — watch for system events
import asyncio
import nats

async def main():
    nc = await nats.connect(
        servers=["nats://localhost:4222"],
        user_credentials="/path/to/sys.creds",
    )

    async def handler(msg):
        print(f"Server event: {msg.data.decode()}")

    await nc.subscribe("$SYS.SERVER.*.STATSZ", cb=handler)

    # Keep the subscription alive
    await asyncio.Event().wait()

asyncio.run(main())

Synadia Insights detects config reloads automatically by comparing config_load_time across consecutive collection epochs for every server in your deployment. Unexpected reloads surface as change events in the Insights dashboard, providing a complete audit trail without any custom monitoring configuration.

Frequently asked questions

Does a config reload cause any downtime or client disconnections?

Not by itself. The reload applies the new configuration to the running server without restarting it. However, if the new configuration changes authentication requirements, tightens permissions, or modifies TLS settings, existing clients that no longer meet the new requirements may be disconnected on their next reconnect or the next permission check. The reload itself is atomic — the server either applies the full new config or keeps the old one if validation fails.

What happens if the reloaded configuration has errors?

The server validates the configuration before applying it. If the new config has syntax errors or invalid values, the reload fails and the server continues running with the previous configuration. The error is logged but no configuration change takes effect. This is a safety mechanism — a bad config file cannot break a running server via reload.

How can I tell what changed in the configuration?

The NATS server doesn’t log a diff of configuration changes — it only logs that a reload occurred. To track what changed, use version control on the config files and correlate the config_load_time with your git history. Some operators also use a pre-reload script that captures the running config (via /varz) before applying changes, creating an automatic before/after record.
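The pre-reload snapshot idea can be sketched as a small helper that saves the `/varz` output to a timestamped file; run it immediately before and after triggering the reload to get a crude before/after record. The output directory is an illustrative assumption.

```python
# Sketch: save a timestamped /varz snapshot for a before/after record.
import json
import time
import urllib.request
from pathlib import Path

def fetch_varz(url="http://localhost:8222/varz"):
    """Fetch the server's full /varz document."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def write_snapshot(varz, out_dir=Path("/var/log/nats/config-snapshots")):
    """Write the varz document to a timestamped JSON file and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"varz-{time.strftime('%Y%m%dT%H%M%S')}.json"
    path.write_text(json.dumps(varz, indent=2))
    return path
```

A wrapper script would call `write_snapshot(fetch_varz())`, send the reload signal, and snapshot again, pairing each reload with its surrounding state.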

Is a config reload the same as a server restart?

No. A reload applies configuration changes to a running server — no process restart, no connection loss (except for clients affected by permission changes), no Raft leadership disruption. A server restart (SERVER_008) is a full process stop and start that drops all connections and triggers Raft re-election for any groups the server participates in. Reloads are preferred for configuration changes that support them.

Should I reload all cluster servers simultaneously or roll the change?

Roll the change one server at a time, just like a rolling restart. While reloads are less disruptive than restarts, a configuration error that only manifests at runtime (e.g., a permission change that blocks critical message flow) is better caught on one server before propagating to the entire cluster. Validate on the first server, confirm health, then proceed to the next.

Proactive monitoring for NATS config reload detected with Synadia Insights

With 100+ always-on audit Checks from the NATS experts, Insights helps you find and fix problems before they become costly incidents.
No alert rules to write. No dashboards to maintain.

Start a 14-day Insights trial