
MQTT vs. NATS for Fleet Management: Hub-and-Spoke vs. Distributed Mesh

Feb 17, 2026

The Shift to Distributed Intelligence

Fleet operations have evolved beyond simple telemetry. Traditional IoT architectures follow a hub-and-spoke model where every device connects to a central broker and every message passes through that broker. This works when you have a few thousand sensors sending periodic updates. It breaks down when you have a fleet of 50,000 vehicles generating continuous telemetry while also needing to receive commands, coordinate with nearby assets, and make local decisions when connectivity drops.

Fleets and Edge Computing with MQTT

A truck crossing a mountain pass loses cellular signal for 20 minutes. In a traditional hub-and-spoke architecture, the onboard systems are now isolated — not just from the cloud, but from each other in any coordinated way.

The MQTT client queues outbound messages (if the client library supports durable buffering), but there’s no local routing between onboard systems, no guaranteed persistence if the device power-cycles, and no way for local applications to query recent state.

If the compressor in a refrigerated trailer triggers an alert, the onboard monitoring system can’t reliably notify the driver-facing application through the broker because the broker is in the cloud. When connectivity returns, the queued messages flood in and the backend must reconcile stale position reports, out-of-sequence events, and potentially duplicated QoS 1 deliveries.

Fleets and Edge Computing with NATS

With a local NATS leaf node on the vehicle, onboard systems continue communicating through the local server as if nothing changed. Sensor data is persisted to a local JetStream stream. The driver-facing app receives the compressor alert instantly. When cellular signal returns, the leaf node syncs with the cloud automatically — delivering messages in order, without duplication, and without application-level reconciliation logic.

Edge computing changes the requirements.

Modern fleet vehicles need to process data locally. A refrigerated trailer monitoring temperature doesn’t have time to round-trip to a cloud server when the compressor fails. It needs to make decisions on the spot, log the event locally, and sync when connectivity allows.

This requires more than a message pipe. It requires local persistence, local routing, and seamless synchronization with the broader network.

Many fleet operations run separate stacks for different communication patterns: MQTT for telemetry, Kafka for event streaming, gRPC for request/reply, Redis for state management. Each system has its own deployment requirements, monitoring needs, and failure modes. When something goes wrong at 2 AM, you need to understand four different systems to diagnose the problem.

Architectural Foundations: Broker-Centric vs. Fabric-Centric

The biggest differences between NATS and MQTT aren’t found in feature lists or performance benchmarks (though those differences exist). The real gap shows up in topology: MQTT assumes a central broker, while NATS assumes a distributed mesh. This architectural choice shapes everything else.

Figure: MQTT’s fragmented broker islands with separate ACL stores, contrasted with the NATS unified mesh sharing a single security domain.

The MQTT Broker Model

In a typical MQTT deployment, every message from a device in the field travels to a central broker before it can be routed anywhere else. This creates what network engineers call the “trombone effect.”

Consider a site with 30 edge nodes coordinating local operations.

In a typical MQTT deployment, every message travels to a cloud broker and back — even when the publisher and subscriber are on the same local network. You could deploy a local MQTT broker on-site to solve this, but now you’re operating a separate broker instance with its own configuration, ACLs, and monitoring.

Bridge it to your cloud broker for central visibility, and you’ve added another integration point to manage. Scale this to 200 sites and the operational burden becomes significant.

With NATS, a leaf node at each site handles local traffic natively. It’s the same binary, same configuration patterns, and same security model as the cloud cluster. When connectivity is available, it bridges automatically. There’s no separate “local broker” to configure and manage — it’s one topology that works at every scale.

State management remains limited.

MQTT provides retained messages (the broker stores the last message per topic) and persistent sessions for QoS 1/2 delivery. These features handle basic reconnection scenarios and last-known-value lookups. But MQTT has no native concept of a message stream — there’s no built-in way to store a rolling window of telemetry, replay historical data, or distribute persistent state across a broker cluster.

If you need to retain a week of sensor diagnostics for trend analysis, query the last 1,000 telemetry readings for historical reconstruction, or ensure that commands queue reliably during extended offline periods, you need an external database or streaming system alongside your MQTT broker.

Delivery reliability has a structural gap.

The MQTT specification expects clients to acknowledge messages on receipt, not after processing. This creates a problem: if a microservice receives a message and then crashes before finishing its work, that message is gone.

The obvious workaround — delaying the acknowledgment until processing completes — works against the protocol’s design assumptions and significantly reduces throughput, since the broker must hold the message and wait. Meanwhile, QoS 1 can deliver duplicate messages after network interruptions, pushing deduplication logic onto every downstream consumer.

Teams building on MQTT at scale end up layering custom retry queues, deduplication filters, and dead-letter handling on top of the broker — additional complexity that introduces its own failure modes.

Vendor-specific nuance

MQTT makes subscribing easy: topics like fleet/device-01/telemetry/sensors plus wildcards (+, #) let you consume data flexibly. But rerouting or rewriting topics (for example, sending all sensor telemetry to a new service, or remapping topics during a migration) usually relies on broker-specific features such as rule engines, extensions, or plugins. Those rules vary by broker implementation, which in practice means by vendor, and they don’t port cleanly.
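To make the subscription side concrete, here is a minimal sketch of MQTT topic-filter matching semantics (ignoring $-prefixed system topics and shared subscriptions). The function name and topics are illustrative, not part of any MQTT library:

```python
def mqtt_match(topic_filter: str, topic: str) -> bool:
    """Check whether an MQTT topic matches a subscription filter.

    '+' matches exactly one level; '#' matches all remaining levels
    and must appear only as the final level of the filter.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, f in enumerate(filter_levels):
        if f == "#":
            return True  # multi-level wildcard: matches everything below
        if i >= len(topic_levels):
            return False  # filter is more specific than the topic
        if f != "+" and f != topic_levels[i]:
            return False  # literal level mismatch
    # All levels matched; lengths must agree unless '#' ended the filter
    return len(filter_levels) == len(topic_levels)

# Flexible consumption across a fleet namespace
assert mqtt_match("fleet/+/telemetry/sensors", "fleet/device-01/telemetry/sensors")
assert mqtt_match("fleet/#", "fleet/device-01/telemetry/sensors")
assert not mqtt_match("fleet/+/telemetry/sensors", "fleet/device-01/status")
```

Subscribing with wildcards like this is portable across brokers; it is the rewriting and rerouting of topics, not the matching, that falls back to vendor-specific machinery.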

NATS solves this with standardized subject mapping: one mechanism that behaves the same across every NATS server, including edge leaf nodes.

A similar issue shows up with QoS. MQTT defines QoS 0/1/2, but real behavior depends on broker and client implementations—so QoS 1/2 guarantees can differ across stacks.

Broker federation introduces complexity

When scaling MQTT across multiple sites or regions, federated broker architectures create operational challenges.

Each broker maintains its own state and subscription information. Bridging brokers together requires careful configuration to avoid message loops, duplicate deliveries, and inconsistent topic namespaces. A message published in one region may need to traverse multiple broker hops to reach subscribers in another region, each hop adding latency and potential failure points.

Managing topic mappings, access controls, and monitoring across a federated MQTT deployment quickly becomes a full-time operational burden. MQTT bridges can also lose messages during network disruptions or broker restarts, and because bridges operate outside the QoS guarantee chain, these gaps may go undetected until downstream systems show missing data.

Stream processing requires external systems.

MQTT brokers excel at routing messages but lack native stream processing capabilities. If you need to aggregate sensor data, perform windowed calculations, or maintain stateful transformations, you must integrate external stream processing frameworks. This typically means deploying Apache Kafka, Apache Flink, or similar systems alongside your MQTT infrastructure.

Each additional system introduces its own operational complexity, monitoring requirements, and potential failure modes.

NATS JetStream doesn’t replace dedicated stream processors like Flink for complex windowed aggregations or stateful transformations. But it does eliminate the need for a separate durable log system (typically Kafka) alongside your messaging broker. Persistence, replay, and consumer tracking are built in — one system instead of two for the most common fleet data pipeline patterns.

The NATS Adaptive Edge

NATS functions as a connective fabric rather than a single box in the middle. The difference becomes clear when you look at how it handles edge deployments.

The power of Leaf Nodes

Using Leaf Nodes, any edge device can run its own local NATS server. This node handles all local traffic between sensors, controllers, and co-located applications. When cellular connectivity is available, the leaf node transparently bridges to the cloud. When connectivity drops, local operations continue without interruption. Messages destined for the cloud queue up and sync automatically when the connection returns.
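A minimal leaf-node setup might look like the following sketch. The URLs, credential paths, and server names are placeholders, not values from a real deployment:

```conf
# --- Edge: local NATS server on the vehicle or site gateway ---
server_name: edge-gateway-01
jetstream { store_dir: "/var/lib/nats" }   # local persistence survives reboots
leafnodes {
  remotes = [
    { url: "tls://cloud.example.com:7422", credentials: "/etc/nats/edge.creds" }
  ]
}

# --- Cloud: cluster servers accept leaf connections ---
leafnodes { port: 7422 }
```

The edge server is the same nats-server binary as the cloud cluster; the remotes entry is the only piece that ties it to the hub, and local traffic flows whether or not that link is up.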

Flexible subject hierarchies

NATS uses dot-separated subjects like fleet.device01.telemetry.sensors. You can use subject mapping to transform or redirect data streams without touching device code. Need to send all sensor telemetry to a new analytics service? Add a mapping rule at the server level. The devices don’t need a firmware update.
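For example, a server-level mapping along these lines (subject names are illustrative) republishes each device’s sensor telemetry onto an analytics subject while preserving the device ID via a wildcard token:

```conf
# Reroute telemetry to a new analytics namespace without touching device code
mappings: {
  "fleet.*.telemetry.sensors": "analytics.sensors.{{wildcard(1)}}"
}
```

Because mapping is a standard server feature, the same rule works identically on a cloud cluster or an edge leaf node.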

Unified transport reduces complexity

NATS combines Pub/Sub, Request/Reply, and distributed persistence (JetStream) into one protocol. You don’t need a sidecar database to ensure a message survives a reboot. You don’t need a separate RPC framework for command-and-control. One binary, one protocol, one set of operational procedures.

Security: Perimeter-Based vs. Decentralized Identity

Managing security for 100,000 distributed assets is a different problem than securing a data center. Devices get compromised. Edge nodes get decommissioned. Credentials need to be rotated without taking the fleet offline.

MQTT Security Limitations

Standard MQTT deployments rely on TLS for transport security and username/password or client certificates for identity. This works, but it creates operational challenges at scale.

TLS handshakes are computationally expensive for any protocol, including NATS.

The difference is what happens above TLS. MQTT’s most common authentication method — username and password — transmits credentials over the encrypted channel. If TLS is skipped for constrained devices (which some MQTT deployments do), those credentials travel in the clear. Client certificates offer stronger MQTT authentication, but they require TLS by definition and add certificate lifecycle management overhead across a large fleet.

NATS’s NKey authentication uses cryptographic challenge-response that never transmits secrets at the application layer, providing an additional security layer independent of the transport.

Access Control Lists (ACLs) in MQTT are typically managed centrally. The broker maintains a list of which clients can publish or subscribe to which topics. For a fleet of 100,000 devices, this list becomes large and complex. Every permission change — onboarding a new device, rotating credentials, adjusting topic access — requires updating this central store and ensuring all broker nodes reflect the change. In federated deployments, keeping ACLs synchronized across brokers in different regions introduces consistency windows where permissions may differ between sites.

This challenge compounds in multi-region deployments. Each MQTT broker instance maintains its own ACL store. A credential revocation in one region must be propagated to every other broker — and until it is, the revoked credential remains valid at those sites.

NATS JWTs and revocation lists propagate automatically across the entire cluster, including leaf nodes at the edge. Revoking a device’s access takes effect everywhere within seconds.

NATS Decentralized Security

NATS uses a fundamentally different security model based on NKeys and JWTs. The private key never leaves the device. Identity is proven through cryptographic challenge-response, not by transmitting credentials over the wire.

NKeys use Ed25519 signatures

When a device connects, it proves it holds a private key by signing a challenge from the server. The server only needs the public key to verify the signature. If an attacker intercepts the connection, they get nothing useful. If the server is compromised, the attacker still can’t impersonate devices because they don’t have the private keys.

JWTs encode permissions

A device’s JWT might specify that it can publish to telemetry.deviceA.* and subscribe to commands.deviceA.*. These permissions are signed by an account key, so they can’t be forged or modified. The NATS server validates the JWT signature and enforces the permissions without consulting an external database.
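To make the claims structure concrete, the sketch below assembles and decodes a simplified, unsigned NATS-style user JWT using only the standard library. The key values are placeholders, and a real token carries an Ed25519 signature from the account key rather than the dummy third segment used here:

```python
import base64
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def decode_segment(seg: str) -> dict:
    seg += "=" * (-len(seg) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(seg))

# Simplified user claims: permissions live inside the signed payload,
# so a server can enforce them without consulting an external database.
claims = {
    "sub": "UDEVICEA_PUBLIC_NKEY",  # placeholder for the device's public key
    "nats": {
        "pub": {"allow": ["telemetry.deviceA.*"]},
        "sub": {"allow": ["commands.deviceA.*"]},
    },
}
header = {"typ": "JWT", "alg": "ed25519-nkey"}

token = ".".join([
    b64url(json.dumps(header).encode()),
    b64url(json.dumps(claims).encode()),
    "SIGNATURE",  # real tokens: Ed25519 signature by the account key
])

payload = decode_segment(token.split(".")[1])
assert payload["nats"]["pub"]["allow"] == ["telemetry.deviceA.*"]
```

The important property is that the permissions travel with the token: tampering with the payload invalidates the account signature, so the server can trust the embedded allow lists.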

Straightforward Multi-tenancy

Different fleets or departments can share the same NATS infrastructure with complete data isolation. Each account has its own signing key and its own permission boundaries. A device in Fleet A literally cannot see or interact with subjects belonging to Fleet B, even if they’re connected to the same server.
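In its simplest config-file form (large fleets typically use operator-mode JWTs instead), account isolation can be sketched like this, with account and user names purely illustrative:

```conf
accounts {
  FLEET_A: {
    users: [ { user: fleet-a-gateway, password: "changeme" } ]  # placeholder creds
  }
  FLEET_B: {
    users: [ { user: fleet-b-gateway, password: "changeme" } ]
  }
}
```

A client authenticated into FLEET_A cannot see FLEET_B subjects at all unless an explicit export/import is configured between the accounts.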

Credential revocation is targeted and instant

You can revoke a single device’s access by adding its public key to a revocation list. This takes effect immediately across the entire cluster without restarting anything or updating a central database. The compromised device loses access; everyone else continues operating normally.

Telemetry and Command & Control Patterns

IoT data flows fall into two broad categories. Telemetry is “fan-in”: many devices sending data to a smaller number of collection points. Command and control is “fan-out” or “point-to-point”: sending specific instructions to specific devices and getting responses back.

Telemetry and Data Persistence

A modern fleet of edge devices generates a lot of data. Sensor readings, location coordinates, operational metrics, environmental telemetry, system diagnostics. Some of this data needs to be processed in real time. Some needs to be stored for later analysis. Some needs to be retained for regulatory compliance.

MQTT Data Loss

Data loss can begin before messages even reach the broker. Full-featured MQTT client libraries like Eclipse Paho support persistent offline queuing, and MQTT’s session expiry mechanism is designed for reconnection scenarios.

But on constrained edge hardware — the microcontrollers, industrial gateways, and embedded devices common in fleet environments — lightweight MQTT clients often lack durable buffering. When connectivity drops on these devices, unsent messages are simply lost. Unplanned power cycles (routine in vehicle and industrial environments) compound the problem, potentially corrupting any local message queue that does exist. MQTT’s persistence guarantees depend heavily on which client library you’re running and how it’s configured, which creates inconsistency across a heterogeneous fleet.

In MQTT deployments, high-volume telemetry often requires a bridge to a system like Kafka for durable storage. The MQTT broker handles the real-time routing; Kafka handles the persistence and replay. This works, but it means operating two different systems with different semantics and failure modes.

The integration layer between MQTT and Kafka becomes another component that needs monitoring, error handling, and operational procedures. When messages fail to bridge from MQTT to Kafka, diagnosing whether the problem lies in the MQTT broker, the bridge component, or the Kafka cluster requires expertise in all three systems.

NATS JetStream integrates persistence directly into the messaging layer

When a device publishes telemetry, it can be simultaneously delivered to real-time consumers and stored in a stream for later processing. The stream can be configured with retention policies (keep the last 7 days, or the last 10GB, or messages matching certain criteria) and replicated across multiple servers for durability.

Pull consumers change the processing model. Instead of the messaging system pushing data and potentially overwhelming a backend service, consumers can pull messages at their own pace. A batch analytics job can request 1,000 messages, process them, then request the next batch. If the job crashes, it resumes from where it left off. No messages are lost, and the messaging system isn’t responsible for tracking consumer state.
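With the nats CLI, the stream and pull consumer described above might be created roughly as follows. The stream and consumer names are hypothetical, and exact flags can vary by CLI version:

```conf
# Keep 7 days (or 10GB) of telemetry, replicated across 3 servers
nats stream add TELEMETRY \
  --subjects "fleet.*.telemetry.>" \
  --retention limits --max-age 7d --max-bytes 10GB \
  --storage file --replicas 3 --defaults

# A pull consumer for the batch analytics job; it fetches at its own pace
# and resumes from its last acknowledged message after a crash
nats consumer add TELEMETRY analytics \
  --pull --deliver all --ack explicit --defaults
```

The explicit-ack pull consumer is what gives the batch job its resume-after-crash behavior: unacknowledged messages are simply redelivered on the next fetch.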

Resource efficiency matters at the edge. NATS is a single binary under 20MB. It runs comfortably on a Raspberry Pi or an industrial gateway. You can deploy a full-featured NATS server with JetStream persistence on hardware where a Java-based broker or Kafka instance would be impossible. This makes it practical to run real persistence at the edge, not just in the cloud.

Command and Control

Sending a command to a specific device and getting a response back is conceptually simple. In practice, it involves correlation IDs, temporary response topics, timeout handling, and retry logic. MQTT 5.0 added response topic and correlation data properties that standardize request/reply, but the pattern still requires developers to manage response routing and timeout logic at the application level.

NATS treats request/reply as a first-class operation. An operator sends a configuration command to a specific edge node. NATS automatically creates a temporary inbox for the response, routes the command to the device, and delivers the response back to the operator. The correlation and routing happen transparently.
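The mechanics can be sketched with a toy in-process bus. This is not the real NATS client API, and the subjects and payloads are illustrative, but it shows the inbox pattern NATS automates: the requester subscribes to a unique temporary subject and passes it as the reply-to address.

```python
import uuid

class Bus:
    """Toy pub/sub bus illustrating NATS-style request/reply correlation."""

    def __init__(self):
        self.subs = {}  # subject -> list of callbacks

    def subscribe(self, subject, cb):
        self.subs.setdefault(subject, []).append(cb)

    def publish(self, subject, data, reply=None):
        for cb in self.subs.get(subject, []):
            cb(data, reply)

    def request(self, subject, data):
        # Unique, temporary reply subject: no correlation IDs needed,
        # because the subject itself correlates request and response.
        inbox = f"_INBOX.{uuid.uuid4().hex}"
        result = []
        self.subscribe(inbox, lambda d, _reply: result.append(d))
        self.publish(subject, data, reply=inbox)
        return result[0]

bus = Bus()
# Responder: an edge node applies a config command and acknowledges
bus.subscribe("commands.edge01.config",
              lambda data, reply: bus.publish(reply, f"applied:{data}"))

assert bus.request("commands.edge01.config", "interval=5s") == "applied:interval=5s"
```

A real deployment adds timeouts and network transport, but the routing idea is the same: the server, not the application, carries the response back to the right requester.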

Offline commands work naturally with JetStream. If a device loses connectivity when a command is sent, the command can be stored in a stream. The moment the device reconnects and subscribes to its command subject, NATS delivers the waiting message. The operator can choose whether to wait for acknowledgment or fire-and-forget.

Fan-out commands scale efficiently. Sending a firmware update notification to 10,000 devices doesn’t require 10,000 individual messages. Devices subscribe to a subject like commands.fleet.firmware, and a single publish reaches all of them. NATS handles the fan-out at the server level, not the application level.

Technical Comparison: Performance and Scalability

When comparing NATS and MQTT, the differences emerge under load, at scale, and in complex deployment scenarios.

| Feature | MQTT (v5.0) | NATS (with JetStream) |
| --- | --- | --- |
| Topology | Hub-and-spoke | Distributed mesh / leaf nodes |
| Throughput | Millions msg/sec (broker dependent) | 100M+ msg/sec (cluster, small payloads) |
| Persistence | Requires external DB | Native (JetStream) |
| Security | Centralized ACLs / TLS | Decentralized JWT / NKeys |
| Latency | Low (deployment dependent) | Sub-millisecond (p99) |
| Communication | Pub/Sub, Req/Reply (v5.0) | Pub/Sub, Req/Reply, Key-Value, Object Store |

Throughput and Latency

NATS is designed for extreme performance. In standard benchmarks, a NATS cluster handles over 100 million messages per second with small payloads. Single-server throughput exceeds 10 million messages per second. For most fleet applications, you’ll never approach these limits, but the headroom means you can run fewer servers and handle traffic spikes without degradation.

Latency matters for operational applications like real-time fleet monitoring, dynamic edge orchestration, and instant alerting. NATS consistently delivers sub-millisecond p99 latency in production deployments.

More importantly, the architectural difference compounds the gap: when messages must traverse a centralized cloud broker, even a fast MQTT implementation adds network round-trip time that dwarfs any protocol-level difference. A locally deployed NATS leaf node processing edge traffic eliminates that round-trip entirely, delivering responses in microseconds for local operations while still syncing to the cloud asynchronously.

Scalability Considerations

Scaling MQTT typically involves clustering multiple brokers behind a load balancer. This introduces complexity around session affinity (a client must reconnect to the same broker to resume its session), shared subscription state, and network partition handling. Different MQTT vendors handle clustering differently, and the operational procedures vary accordingly.

As useful and seamless as NATS leaf nodes are at the edge, NATS is equally adept at scaling up in the cloud. Servers discover each other through a gossip protocol and automatically form a cluster. If a server fails, clients reconnect to another server in the cluster without losing messages (assuming JetStream persistence). Adding capacity means starting new servers and letting them join the cluster. Removing capacity means draining connections and shutting down.
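A minimal per-server cluster block might look like this sketch, with hostnames as placeholders; each server lists a few seed routes, and gossip discovers the rest of the topology:

```conf
cluster {
  name: fleet-cloud
  port: 6222
  routes: [
    nats-route://nats-1.example.com:6222,
    nats-route://nats-2.example.com:6222
  ]
}
```

Adding a server means starting it with the same seed routes; it joins the cluster and begins receiving traffic without any load-balancer reconfiguration.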

For global deployments, Synadia Cloud provides a managed supercluster with points of presence worldwide. Edge devices connect to the nearest location, and NATS handles the global routing automatically. This eliminates the need for complex Geo-DNS configurations or custom global load balancing logic.

Implementation: Bridging Legacy and Modern Infrastructure

Most fleet managers aren’t starting from scratch. You have existing hardware running MQTT clients that can’t be easily updated. You have backend systems that expect MQTT topics. The goal isn’t to replace everything overnight. It’s to modernize the backbone while preserving existing investments.

The MQTT Bridge

NATS includes a built-in MQTT bridge that allows MQTT clients to connect directly to a NATS server. The bridge maps MQTT topics to NATS subjects automatically. A legacy sensor publishing to fleet/device01/temperature appears on the NATS side as fleet.device01.temperature.
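Enabling the bridge is a small server-side change. A minimal sketch, with the server name as a placeholder:

```conf
server_name: bridge-01   # a server name is required when MQTT is enabled
jetstream {}             # MQTT QoS 1 sessions are backed by JetStream
mqtt {
  port: 1883             # legacy MQTT clients connect here unchanged
}
```

MQTT clients connect to port 1883 as they always have; their topics appear as dot-separated NATS subjects on the other side.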

This lets you keep MQTT at the very edge, on simple sensors and legacy gateways, while using NATS for everything else. A modern microservice in the cloud subscribes to NATS subjects and receives messages from both MQTT and native NATS clients without knowing the difference.

The bridge also works in reverse. A NATS service can publish commands that MQTT clients receive on their expected topics. This makes it possible to migrate incrementally: new devices use native NATS clients, legacy devices continue using MQTT, and both participate in the same messaging fabric.

Moving to a Global Supercluster

For global fleet operations, managing your own worldwide cluster is a distraction from your core business. You need servers in multiple regions, monitoring and alerting for each location, procedures for handling regional outages, and engineers who understand the operational details.

Using Synadia Cloud provides a globally distributed supercluster without the operational burden. Edge devices connect to the nearest point of presence. Messages route automatically between regions. If a region goes down, clients reconnect to another region and continue operating. The complexity is managed by us, not your team.

This also simplifies compliance. Data residency requirements often mandate that certain data stays within geographic boundaries. With proper subject mapping and account configuration, you can ensure that European fleet data stays on European servers while still participating in a global command-and-control system.

Making the Choice

The decision between NATS and MQTT isn’t about which protocol is “better” in the abstract. It’s about which technology better aligns with your requirements.

MQTT makes sense when you’re working with extremely constrained hardware that can’t run a more capable client library, when your needs are limited to simple, low-frequency telemetry, and when you’re integrating with existing systems that expect MQTT specifically.

NATS makes sense when you need a resilient, high-performance fabric that supports edge computing with local persistence, when you require decentralized security that scales to hundreds of thousands of devices, when you want unified messaging, streaming, and key-value storage in one system, and when you’re building for growth from hundreds of edge nodes to millions of connected devices.

In practice, many deployments use both. MQTT handles the last inch of connectivity to legacy sensors. NATS provides the backbone that connects everything else. The MQTT bridge makes this hybrid architecture straightforward to implement and operate.

The architectural flexibility of NATS means you can start simple and grow without changing your fundamental communication patterns. A proof-of-concept with 50 edge nodes uses the same subjects, the same security model, and the same client code as a production deployment with 500,000 connected assets. The infrastructure scales; your application logic stays the same.

Ready to build your distributed mesh architecture? Partner with the NATS experts at Synadia to get your fleet deployed faster.

© 2026 Synadia Communications, Inc.