Service version mismatch means instances of the same NATS micro service are reporting different client library versions or languages — indicating a heterogeneous deployment. All instances of a service should run identical code. When they don’t, requests routed to different instances may produce inconsistent behavior, subtle bugs, or outright failures depending on what changed between versions.
NATS micro services use the built-in service discovery protocol ($SRV.PING, $SRV.INFO) to advertise themselves. Each instance reports its name, version, and metadata. When NATS routes a request to a service, it round-robins across available instances using queue groups. If instances are running different versions, clients get different behavior depending on which instance handles their request — a class of bug that’s notoriously hard to reproduce and diagnose.
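For context, here is a minimal sketch in Go (using the github.com/nats-io/nats.go/micro package) of how an instance registers and advertises its name and version; the service name, version, and endpoint shown are illustrative placeholders, not part of this check:

// Go — a minimal sketch of a service instance advertising its name and
// version via the services framework (github.com/nats-io/nats.go/micro).
package main

import (
	"log"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/nats.go/micro"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}

	// Name groups instances of the same service; Version is what this check
	// compares across instances.
	svc, err := micro.AddService(nc, micro.Config{
		Name:    "orders", // illustrative
		Version: "1.2.3",  // illustrative
	})
	if err != nil {
		log.Fatal(err)
	}

	// Endpoints share a queue group, so requests round-robin across instances.
	if err := svc.AddEndpoint("get", micro.HandlerFunc(func(req micro.Request) {
		req.Respond([]byte("ok"))
	})); err != nil {
		log.Fatal(err)
	}

	select {} // keep the instance running
}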
The risk scales with the nature of the change between versions. A minor dependency update might be harmless. A schema change, a behavior modification, or a breaking API change means some requests succeed and others fail, or worse, some return correct data and others return stale or incorrect results. In financial, healthcare, or compliance-sensitive systems, inconsistent responses from the same service are not just bugs — they’re audit findings.
Version mismatches are expected during rolling deployments. The check is designed to catch deployments that stalled partway through — when 8 of 10 instances updated but 2 are still running the old version. It also catches the more insidious case where different teams or environments accidentally deploy different versions of the same service, creating a permanent heterogeneous deployment that nobody realizes exists.
Incomplete rolling deployment. A deployment started but didn’t finish. Some pods updated to the new version while others remained on the old version. This is the most common cause and is usually transient — but if the deployment is stalled or paused, the mismatch persists indefinitely.
Stale canary deployment. A canary instance was deployed to test a new version, then forgotten. The canary continues handling a percentage of requests with the new version while the majority runs the old version. Canary deployments should have explicit time limits and cleanup procedures.
Container image tag not updated in some environments. The deployment manifest references latest or a mutable tag that was updated in some registries but not others. Different instances pull different images depending on when they started and which registry they hit.
Multiple teams deploying independently. In organizations where different teams own instances of the same service (e.g., regional deployments), one team may deploy a new version while another team’s instances remain on the previous version. Without a coordinated release process, this creates a split.
Build artifact inconsistency. The service was rebuilt on different machines or at different times, producing binaries with different embedded versions. This is especially common when version strings are derived from git tags or build timestamps rather than explicit semver.
The NATS CLI provides built-in micro service discovery:
nats micro list

This lists all discovered services with their version. If a service shows multiple versions, you have a mismatch.
Drill into a specific service to see which instances report which version:
nats micro info <service_name>

This shows each instance’s ID, version, and metadata. Compare versions across instances to identify which are outdated.
To discover all running instances and their response time:
nats micro ping <service_name>

Every instance that responds reports its version. Instances that don’t respond may be the ones that need attention — cross-reference with SERVICE_002 (Service Down).
Check per-instance statistics:

nats micro stats <service_name>

This shows per-instance request counts and error rates. If the old-version instances have higher error rates, the version mismatch may already be causing failures.
Cross-reference NATS service discovery with your deployment platform:
# Kubernetes
kubectl get pods -l app=<service_name> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'

This helps identify whether the mismatch is a deployment issue (wrong image) or an application issue (same image, different reported version).
Determine if the mismatch is transient or persistent. Check if a deployment is in progress:
# Kubernetes
kubectl rollout status deployment/<service_name>
# If stuck
kubectl rollout history deployment/<service_name>

If a rollout is stuck, either resume it or roll back. A half-completed deployment is worse than running the old version consistently.
Complete the rolling deployment. If the new version is validated, push the deployment to completion:
# Kubernetes — restart stalled pods
kubectl rollout restart deployment/<service_name>

Or roll back to a consistent version. If the new version has issues, roll back all instances to the known-good version:
kubectl rollout undo deployment/<service_name>

Clean up canary instances. If a canary deployment was left running, either promote it to full deployment or terminate it:
nats micro ping <service_name>
# Identify canary instance IDs, then remove from orchestrator

Verify service version consistency after the fix:
// Go — programmatic service version check
nc, err := nats.Connect(url)
if err != nil {
	log.Fatal(err)
}

// Query the service discovery protocol. A plain request returns the
// $SRV.INFO reply from a single instance; repeat it or gather replies
// on an inbox to compare versions across all instances.
reply, err := nc.Request("$SRV.INFO."+serviceName, nil, 2*time.Second)
if err != nil {
	log.Fatal(err)
}
log.Printf("Service info: %s", string(reply.Data))

// TypeScript (nats.js) — check service version
import { connect, createInbox, Empty } from "nats";

const nc = await connect();

// Ask every instance for its $SRV.INFO and collect the replies on an inbox
const inbox = createInbox();
const sub = nc.subscribe(inbox);
nc.publish("$SRV.INFO.myservice", Empty, { reply: inbox });

// Stop collecting after two seconds; by then every live instance has replied
setTimeout(() => sub.unsubscribe(), 2000);

for await (const msg of sub) {
  const info = JSON.parse(new TextDecoder().decode(msg.data));
  console.log(`Instance ${info.id}: version ${info.version}`);
}

await nc.drain();

Use immutable image tags. Never deploy with latest or mutable tags. Use digest-based or commit-SHA-based image references so every instance runs exactly the same binary.
Implement deployment health gates. Add a post-deployment check that queries nats micro info <service> and verifies all instances report the same version. Fail the deployment pipeline if the check doesn’t pass within a timeout window.
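One way to implement such a gate is a small check that performs the discovery itself. The sketch below (Go, against the $SRV.PING discovery subject) assumes an illustrative service name ("orders"), the default NATS URL, and a two-second collection window; it exits non-zero when more than one version is observed, which fails a typical pipeline step:

// Go — a sketch of a post-deployment version gate: collect $SRV.PING replies
// from every instance and exit non-zero if more than one version is seen.
package main

import (
	"encoding/json"
	"log"
	"os"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Scatter-gather: every instance answers $SRV.PING.<service> on our inbox.
	inbox := nats.NewInbox()
	sub, err := nc.SubscribeSync(inbox)
	if err != nil {
		log.Fatal(err)
	}
	if err := nc.PublishRequest("$SRV.PING.orders", inbox, nil); err != nil {
		log.Fatal(err)
	}

	versions := map[string]int{}
	for {
		msg, err := sub.NextMsg(2 * time.Second)
		if err != nil {
			break // no more replies within the window
		}
		var ping struct {
			ID      string `json:"id"`
			Version string `json:"version"`
		}
		if err := json.Unmarshal(msg.Data, &ping); err != nil {
			continue
		}
		versions[ping.Version]++
	}

	log.Printf("versions seen: %v", versions)
	if len(versions) != 1 {
		os.Exit(1) // mismatch, or no instances responded: fail the pipeline
	}
}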
Coordinate cross-team releases. If multiple teams deploy instances of the same service, establish a release process that coordinates version bumps. Use a shared release channel or require version lockstep for services that handle the same request types.
Set explicit version strings. Embed the version in the service code at build time from a single source of truth (e.g., a VERSION file or git tag), not from implicit metadata that can vary between builds.
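A rough sketch of that pattern in Go: inject the version at link time with -ldflags and pass it to the registration shown earlier. The variable name, build command, and service name below are illustrative.

// Go — embed one authoritative version at build time, then report it when
// the service registers (see the registration sketch above).
//
//   go build -ldflags "-X main.version=$(cat VERSION)" ./cmd/orders
//
// The default value makes an unversioned local build easy to spot.
var version = "0.0.0-dev"

// ...at startup, pass the injected value to the services framework:
svc, err := micro.AddService(nc, micro.Config{
	Name:    "orders",
	Version: version, // single source of truth for every instance
})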
During an active rolling deployment, a brief version mismatch is expected and usually harmless — NATS routes requests to whichever instance is available, and both old and new versions should handle requests correctly if the deployment is backward-compatible. The check flags it because the mismatch is real; the investigation is determining whether it’s transient (deployment in progress) or stuck (deployment stalled). If the mismatch persists for more than a few minutes, something is wrong.
Yes. NATS micro services report both their version and the client library language. If some instances are running a Go implementation and others are running a Python implementation of the same service name, this check fires. While polyglot service implementations are technically possible, they’re a significant consistency risk and should be intentional, not accidental.
In rare cases — A/B testing, gradual feature rollouts — running multiple versions of a service simultaneously is intentional. If this is your architecture, you should use different service names (e.g., orders-v1 and orders-v2) rather than running mismatched instances under the same name. NATS routes requests within a service name using queue groups, so mixed versions under one name means clients can’t control which version handles their request.
SERVER_002 checks the NATS server binary version — whether all servers in the cluster run the same nats-server version. SERVICE_001 checks application-level service versions — whether all instances of a NATS micro service run the same application code. They operate at different layers: SERVER_002 is infrastructure consistency, SERVICE_001 is application consistency. Both are important, but they have different causes and remediation paths.