Last updated: June 1, 2026

The news, in one paragraph

On May 31, 2026, NVIDIA announced DSX™ — a reference architecture and “playbook” for building gigawatt-scale AI factories. One of its named components, DSX Exchange™, is the event bus that carries operational signals between power, cooling, building-management, grid, and compute systems — and between the software agents that coordinate them. NVIDIA built DSX Exchange on NATS.

Its open-source repository, released under Apache-2.0, describes the DSX Event Bus as “NATS with MQTT 3.1.1, HA clustering, and leaf-node federation”. NATS’s native MQTT bridge connects the OT and building-management devices that already speak MQTT, while NATS adds the things a standalone MQTT broker does not: JetStream persistence and replay, leaf-node federation that unifies every datahall into one namespace, and Auth Callout that gives every publisher a verified identity and every subscriber topic-scoped permissions.

It is a clear, real-world example of NATS doing what it is increasingly chosen for — sitting at the convergence of OT and IT, where device protocols meet cloud-scale messaging.

What is NVIDIA DSX Exchange?

NVIDIA frames DSX Exchange as the operational event bus for an AI factory: a shared messaging layer for telemetry, commands, state changes, and coordination across the systems that keep accelerator infrastructure running. In the DSX Exchange repository, that bus connects building-management systems, process-control systems, power systems, grid signals, schedulers, infrastructure-management services, and agent-facing control surfaces.

The important architectural point is that DSX Exchange is not just a message broker at the edge. It is a factory-wide fabric. Operational systems publish state into a shared namespace; authorized services and agents subscribe, correlate, and act through constrained interfaces.

NVIDIA summarizes the design around four properties in llms-full.txt:

“Legible, Coordinated, Agent-operable, Auditable”

  • Legible: factory state is exposed through named, understandable subjects and schemas.
  • Coordinated: services communicate through the bus rather than point-to-point integrations.
  • Agent-operable: AI/MCP agents can observe factory state and use constrained action surfaces.
  • Auditable: actions are attributable, scoped, and logged with caller identity.

That framing matters because an AI factory is not only a data center at larger scale. It is an operating environment where power, cooling, grid flexibility, building controls, GPU telemetry, scheduling, and software automation all need to coordinate at machine speed.

What messaging system does DSX Exchange use?

DSX Exchange uses NATS, the open-source messaging system. The repo describes the DSX Event Bus as “NATS with MQTT 3.1.1, HA clustering, and leaf-node federation”, with deployment assets built around the open-source NATS project and community ecosystem components such as NACK (the JetStream controller for Kubernetes) and Surveyor for monitoring.

The authentication callout in the repo is NVIDIA’s own Go service built on the standard NATS Auth Callout hook.

| What an AI factory event bus needs | NATS capability NVIDIA uses | Why a standalone MQTT broker falls short |
| --- | --- | --- |
| Connect OT / building-management devices | Native MQTT 3.1.1 compatibility | This is MQTT’s home turf: parity |
| Durable telemetry, QoS 0/1, retained messages | MQTT sessions and retained messages backed by JetStream; durable streams for replay | Broker-local at best; durable streams/replay are not in the MQTT standard |
| Unify every datahall into one namespace | Leaf-node federation, CPC to CSC | Multi-site federation is a paid add-on or absent |
| Control which signals cross boundaries | Accounts plus subject import/export | No native cross-cluster subject routing |
| Multi-tenant isolation | Accounts | Limited; often a paid tier |
| Request/reply, not just pub/sub | Native request/reply | Not an MQTT pattern |
| Verified identity plus topic-level ACLs | Auth Callout with OAuth2, mTLS, and NKey paths | Varies; typically broker-specific plugins |
| High availability | 3-node NATS cluster per layer | Clustering frequently paywalled |

The resulting architecture uses NATS MQTT support at the OT ingress point, JetStream for persistence and replay, leaf nodes for federation, and accounts for isolation and controlled sharing.

Why MQTT 3.1.1 — and why a broker alone wasn’t enough.

NVIDIA targets MQTT 3.1.1 deliberately. The repo states the reason plainly: “the BMS and process-control industry overwhelmingly ships MQTT 3.1.1 clients”. NATS implements MQTT 3.1.1 — including QoS 0 and QoS 1 delivery, retained messages, and persistent sessions — so building-management and process-control devices can connect without a gateway translating them into a different protocol first. (NATS targets MQTT 3.1.1 specifically, not MQTT 5.0. QoS 1, retained messages, and persistent sessions are stored in JetStream, so JetStream must be enabled — the same persistence layer that provides durable replay above the edge.)
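To make the "no gateway" point concrete, here is a minimal sketch of how an MQTT topic lines up with a NATS subject: the `/` level separator becomes `.`, and in subscriptions the `+` and `#` wildcards become `*` and `>`. The function name `mqtt_to_nats` and the `bms/...` topics are hypothetical, chosen for illustration only. The sketch deliberately rejects edge cases (empty levels, dots, spaces) that the real server handles with extra rules, and it ignores the fact that MQTT `#` also matches the parent level while NATS `>` does not.

```python
def mqtt_to_nats(topic: str, is_filter: bool = False) -> str:
    """Map a simple MQTT topic (or topic filter) to a NATS subject.

    Simplified sketch: levels that are empty or contain '.', ' ', '*' or '>'
    are rejected rather than translated.
    """
    if not topic:
        raise ValueError("topic must not be empty")
    levels = topic.split("/")
    tokens = []
    for index, level in enumerate(levels):
        if level == "" or any(ch in level for ch in ". *>"):
            raise ValueError(f"level {level!r} is outside this simplified sketch")
        if level == "+" and is_filter:
            # MQTT single-level wildcard -> NATS single-token wildcard
            tokens.append("*")
        elif level == "#" and is_filter and index == len(levels) - 1:
            # MQTT multi-level wildcard (last level only) -> NATS tail wildcard
            tokens.append(">")
        elif "+" in level or "#" in level:
            raise ValueError(f"wildcard not allowed here: {level!r}")
        else:
            tokens.append(level)
    return ".".join(tokens)


# Hypothetical topics, for illustration only.
print(mqtt_to_nats("bms/hall1/pump3/temp"))
print(mqtt_to_nats("bms/+/pump3/#", is_filter=True))
```

Under these assumptions, a device publishing to `bms/hall1/pump3/temp` would be visible to a NATS-protocol service subscribed to `bms.hall1.pump3.temp` or to a wildcard such as `bms.*.pump3.>`.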

MQTT is widely deployed in OT and building-management environments, and Sparkplug B on top of MQTT is a common way to structure industrial telemetry. NATS carries MQTT payloads, Sparkplug B included, as opaque bytes, so it can provide the fabric around that traffic without reinterpreting payload semantics.

And the job above the edge is larger than MQTT publish/subscribe. AI factory operations need durable replay for telemetry and events, federation across many datahalls, request/reply for service interactions, multi-tenant isolation, and identity-aware authorization. Those are not capabilities in the MQTT standard itself. In standalone broker architectures, they are often broker-specific extensions, paid add-ons, or separate systems bolted beside the broker.

NVIDIA’s design demonstrates the complementary pattern: MQTT 3.1.1 at the device edge, NATS as the backbone above it. The same NATS deployment speaks MQTT to OT clients and NATS protocol to services and agents, while adding persistence, subject routing, federation, and authorization as native primitives in one system.

How does NATS bridge OT and IT here?

DSX Exchange is an IT/OT convergence architecture. On the OT side, building-management and process-control devices connect over MQTT 3.1.1 with mTLS on port 8883, as described in the repo’s authentication design. On the IT and agent side, software clients connect through OAuth2/JWT flows. Both paths land on the same NATS fabric, isolated by accounts and governed by subject-level permissions.
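The two entry paths can be pictured as one decision function that maps an already-verified credential to an account and a set of subject permissions. The sketch below is illustrative only and is not NVIDIA's Auth Callout service: the `Grant` shape, the `OT` and `IT` account names, the role table, and every subject are assumptions. It also leaves out certificate and token validation, and the signed response a real Auth Callout exchange would return.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Grant:
    """What an authenticated connection may do (hypothetical shape)."""
    account: str
    publish: Tuple[str, ...]
    subscribe: Tuple[str, ...]


# Hypothetical role table for software clients; not taken from the DSX repo.
ROLE_GRANTS = {
    "observer": Grant("IT", publish=(), subscribe=("telemetry.>",)),
    "power-agent": Grant(
        "IT",
        publish=("cmd.power.curtail",),
        subscribe=("telemetry.power.>",),
    ),
}


def authorize(credential: dict) -> Optional[Grant]:
    """Map a verified credential to an account and permissions.

    Assumes TLS or token validation already happened upstream.
    Returns None to reject the connection.
    """
    kind = credential.get("kind")
    if kind == "mtls":
        device_id = credential.get("common_name", "")
        # Keep the id to a single safe subject token (letters, digits, '-').
        if not device_id or not device_id.replace("-", "").isalnum():
            return None
        # A device may only publish under its own subject prefix.
        return Grant("OT", publish=(f"bms.{device_id}.>",), subscribe=())
    if kind == "oauth2":
        # Unknown roles fall through to None, which rejects the connection.
        return ROLE_GRANTS.get(credential.get("role"))
    return None


print(authorize({"kind": "mtls", "common_name": "pump-3"}))
print(authorize({"kind": "oauth2", "role": "observer"}))
```

The design choice worth noting is the default: anything that is not a recognized credential kind or role is rejected rather than given a fallback account.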

Across the factory, leaf-node federation connects one Common Services Cluster, or CSC, with many Control Plane Clusters, or CPCs. Each CPC can remain close to a datahall or operational domain while participating in one logical namespace.

Figure: NVIDIA’s DSX Exchange is an open-source event bus for AI factory operations built on NATS. Diagram redrawn from the DSX Exchange repository’s architecture materials; credit: NVIDIA.

The loops NVIDIA describes in docs/architecture.md show why this fabric matters. BMS power telemetry can feed Dynamic Power Steering and MaxLPS™ decisions, with the architecture describing recovery of “up to 40% stranded capacity.” BMS coolant-leak events can drive NICo cordon-and-migrate workflows. Grid signals can flow through DSX Flex to a scheduler for curtailment. GPU telemetry can feed thermal optimization agents. In each case, the bus is the shared path between OT observation, IT services, and automated action.

What about the AI agents?

The DSX materials use “agent” in two senses. Some components are autonomous control services with names such as Power Management Agent or Infrastructure Management Agent. Separately, the architecture describes AI/MCP agents that can observe factory state and act through constrained interfaces. The claim that DSX Exchange is agent-operable rests on the second sense.

The key sentence from NVIDIA’s agent-operable principle is that agents can “observe factory state and act through constrained, audited control surfaces”. The technical guardrail is the combination of NATS accounts, Auth Callout, and subject permissions.

A practical model looks like this: an agent can subscribe broadly to the telemetry it is allowed to observe, but it cannot publish arbitrary commands into the factory. Its identity is verified through the authentication path, mapped to a NATS account, and granted scoped publish and subscribe permissions. If it is allowed to request curtailment, cordon a rack, or trigger a migration workflow, that permission is expressed as a subject-level authorization decision, not as blanket bus access.
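A minimal sketch of that model, assuming NATS-style subject wildcards (`*` matches exactly one token, `>` matches one or more trailing tokens) and allow/deny lists in which deny entries win. The helper names and all subjects (`telemetry.>`, `cmd.power.curtail`, and so on) are hypothetical and not taken from the DSX repository. In a real deployment the NATS server enforces permissions itself, so this only illustrates the decision being made.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a concrete subject matches a NATS-style pattern."""
    p = pattern.split(".")
    s = subject.split(".")
    if "" in s:
        return False  # empty tokens are not valid in a subject
    for i, tok in enumerate(p):
        if tok == ">":
            # '>' must be the last token and needs at least one token to match
            return i == len(p) - 1 and len(s) > i
        if i >= len(s):
            return False
        if tok != "*" and tok != s[i]:
            return False
    return len(p) == len(s)


def is_allowed(subject: str, allow, deny=()) -> bool:
    """Deny entries take precedence; otherwise an allow entry must match."""
    if any(subject_matches(d, subject) for d in deny):
        return False
    return any(subject_matches(a, subject) for a in allow)


# Hypothetical agent: broad read access to telemetry, one narrow action.
SUBSCRIBE_ALLOW = ("telemetry.>",)
PUBLISH_ALLOW = ("cmd.power.curtail",)

print(is_allowed("telemetry.power.hall1", SUBSCRIBE_ALLOW))
print(is_allowed("cmd.power.curtail", PUBLISH_ALLOW))
print(is_allowed("cmd.rack.cordon", PUBLISH_ALLOW))
```

Here the agent can observe anything under `telemetry.` and request curtailment, but an attempt to publish a cordon command fails because no allow entry covers that subject.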

The repository’s authentication design also emphasizes auditability: every action can be logged with caller identity. That is the difference between an agent connected to a message bus and an agent operating on an authenticated control surface.

Why this matters.

NVIDIA’s DSX announcement calls DSX a “complete playbook to build AI factories,” quoting Jensen Huang in the newsroom release. The DSX Exchange repository shows one important part of that playbook: a shared event bus built on NATS.

The significance is not that MQTT is wrong, or that every OT system should abandon its existing protocols. It is the opposite. MQTT 3.1.1 remains the right fit for many devices at the edge. The architectural shift is what happens above that edge: the same fabric that carries OT telemetry also carries IT service interactions and AI-agent observation and action, with persistence, replay, federation, identity, and isolation built in.

FAQ

What is NVIDIA DSX Exchange?
DSX Exchange™ is the event bus component in NVIDIA’s DSX™ AI factory reference architecture. It carries operational signals between power, cooling, building-management, grid, compute, and software-agent systems.

What messaging system does NVIDIA DSX use?
NVIDIA’s DSX Exchange repository describes the DSX Event Bus as NATS with MQTT 3.1.1, high-availability clustering, and leaf-node federation. The repo is open source under Apache-2.0.

Why does NVIDIA DSX use MQTT 3.1.1?
NVIDIA targets MQTT 3.1.1 because building-management and process-control environments widely ship MQTT 3.1.1 clients. NATS implements MQTT 3.1.1, allowing OT devices to connect while the rest of the architecture uses NATS for persistence, federation, accounts, and request/reply.

How does NVIDIA’s AI factory connect OT and IT?
OT devices connect over MQTT 3.1.1 with mTLS, while IT services and AI/MCP agents connect over OAuth2/JWT paths. Both land on the same NATS fabric, separated by accounts and linked across datahalls through leaf-node federation.

What event bus powers NVIDIA AI factories?
In the open DSX Exchange reference architecture, the event bus is built on NATS. The design uses NATS MQTT support, JetStream, accounts, Auth Callout, and leaf nodes to connect factory systems.
