Series: NATS Edge Eventing Architecture

You Don't Need Two Platforms: The Case for Hybrid Eventing

Bruno Baloi
Apr 30, 2026

Every edge-to-core system I’ve worked on eventually acquires a second messaging platform. It’s almost never a deliberate decision. It’s what happens when pub/sub and streaming are solved separately, incrementally, without a unifying architecture to hold them together.

The sequence is familiar. The system starts with a lightweight pub/sub broker for real-time device signals — commands to devices, status updates from sensors, request-reply for operational queries. It works well. Then someone needs durable streams: telemetry that must survive a disconnect, an audit log, event replay for a new analytics pipeline. The pub/sub broker doesn’t support persistence natively, or it does but poorly, so a streaming platform gets added. Now there are two systems.

Two systems means two operational models. Two sets of client libraries. Two security configurations. Two monitoring pipelines. Two failure modes to understand, document, and train on. And at the edge — where nodes are resource-constrained, physically remote, and expensive to touch — two platforms running side by side is an operational liability that compounds every time something goes wrong.

Platform sprawl at the edge isn’t a technical problem. It’s a tax on every decision that follows.

Why pub/sub and streaming get separated in the first place

Pub/sub and event streaming are genuinely different things, and the distinction matters.

Pub/sub is stateless. A message is published to a subject, active subscribers receive it, and it’s gone. The broker doesn’t retain state about what was delivered or to whom. It’s fast, lightweight, and ideal for real-time signals: device commands, status broadcasts, request-reply patterns, operational queries. The constraint is the “active” requirement — if a subscriber is offline when the message arrives, it misses it.

Streaming is stateful. Events are written to a durable, ordered log. Consumers can replay from any point — by time, sequence number, or last-known position. New consumers can be added after the fact and catch up from the beginning. Events survive disconnects, restarts, and consumer failures. The tradeoff is cost: storage, replication, and the operational overhead of managing a persistent log at scale.
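The two delivery models can be sketched in a few lines of toy code. This is an illustration of the semantics, not NATS itself: the `PubSub` and `Stream` classes below are hypothetical stand-ins for a stateless broker and a durable log.

```python
# Toy illustration of the semantic difference -- not NATS itself.
# A pub/sub broker forgets each message after delivery; a stream
# appends to a durable log that late consumers can replay.

class PubSub:
    """Stateless fan-out: only currently-active subscribers see a message."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, msg):
        for handler in self.subscribers:
            handler(msg)            # delivered once, then gone

class Stream:
    """Stateful log: every event is retained and replayable by position."""
    def __init__(self):
        self.log = []

    def publish(self, msg):
        self.log.append(msg)        # durable append, survives consumer absence

    def replay(self, from_seq=0):
        return self.log[from_seq:]  # late consumers catch up from any point

# A subscriber that connects *after* the first message:
pubsub, stream = PubSub(), Stream()
pubsub.publish("cmd-1")             # no subscribers yet -> lost forever
stream.publish("evt-1")             # retained

received = []
pubsub.subscribe(received.append)
pubsub.publish("cmd-2")
stream.publish("evt-2")

print(received)                     # ['cmd-2'] -- cmd-1 was missed
print(stream.replay())              # ['evt-1', 'evt-2'] -- nothing lost
```

The "active subscriber" constraint and the replay guarantee fall directly out of whether the broker keeps state, which is the whole distinction.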

At the edge, you need both. Real-time command delivery to a device is a pub/sub problem — low latency, no persistence needed. Sensor telemetry that must survive a four-hour network outage and replay in order when connectivity returns is a streaming problem. The data path isn’t one or the other. It’s both, often on the same physical node, often involving the same data.

The reason teams separate them is that most pub/sub platforms weren’t designed for streaming, and most streaming platforms weren’t designed for edge deployment. Kafka, for example, is an exceptional streaming platform for high-throughput data pipelines in a data center. It requires a JVM, substantial memory, careful partition planning, and a coordination layer — either ZooKeeper or the newer KRaft mode. None of those characteristics are compatible with deployment on a resource-constrained edge gateway. So teams run Kafka in the core and a separate lightweight broker at the edge, and then spend significant engineering effort bridging the two.

What a hybrid platform actually means

A hybrid event platform is one that supports both stateless pub/sub and stateful durable streaming as native, first-class primitives on the same infrastructure — not as two separate products bolted together, but as two modes of the same system.

In NATS, this is the relationship between Core NATS and JetStream. Core NATS handles stateless pub/sub: low-latency, fire-and-forget, at-most-once delivery for signals that need to move fast and don’t require persistence. JetStream, built directly into the NATS server binary, adds the streaming layer: durable streams, configurable retention, at-least-once and exactly-once delivery, consumer replay, and pull-based consumption for backpressure management.

The critical word is “built into.” JetStream isn’t a separate process, a separate cluster, or a separate client library. It’s a configuration flag on the same NATS server that’s already handling pub/sub. A single deployment handles ephemeral signal delivery and durable event streaming simultaneously, from the same infrastructure, with the same security model and the same operational tooling.
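Concretely, enabling JetStream is a block in the same server configuration file that already defines the pub/sub listener. A minimal sketch, with placeholder paths and limits:

```conf
# nats-server.conf -- illustrative; paths and limits are placeholders.
port: 4222

# Enabling JetStream is a config block on the same server process
# that is already serving Core NATS pub/sub -- no second system.
jetstream {
    store_dir: /data/jetstream
    max_memory_store: 256MB
    max_file_store: 10GB
}
```

The same effect is available as the `-js` flag on `nats-server` for quick local testing.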

As one recent analysis of JetStream vs Kafka puts it: JetStream eliminates the operational split between a message bus and a persistence layer — the same cluster handles ephemeral fan-out and durable consumer groups, with clients across 40+ languages using the same connection model for both.

Why this matters specifically at the edge

Platform consolidation is a reasonable goal in any distributed system. At the edge, it’s an architectural necessity.

Edge nodes are not data center servers. A typical edge gateway might have 2–4 cores, limited RAM, and a storage constraint that makes running a full Kafka broker implausible. More importantly, an edge node is often in a location where operational intervention means a field visit, not an SSH session. Every additional process running on that node is an additional failure mode. Every additional platform is an additional thing that needs to be monitored, upgraded, and recovered when it misbehaves.

The NATS server ships as a single binary under 20MB. With JetStream enabled, it handles both real-time signals and durable streaming on hardware as constrained as a Raspberry Pi. The same binary that runs on the edge leaf node runs in the core hub cluster and in the cloud. The topology changes — leaf nodes connect to hub clusters, hub clusters form superclusters — but the operational model is identical across the whole stack.
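The leaf-node relationship is itself just configuration on that same binary. A minimal sketch of an edge gateway's config, with a hypothetical hub hostname (7422 is the default leaf node port):

```conf
# edge-node.conf -- same binary as the core; hostname is a placeholder.
jetstream {
    store_dir: /data/jetstream
}

# Leaf node connection from the edge gateway up to the hub cluster.
leafnodes {
    remotes [
        { url: "tls://hub.example.com:7422" }
    ]
}
```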

This is what I mean when I describe a hybrid event platform as foundational to resilient edge architecture in the Living on the Edge white paper. It’s not a recommendation to use a particular product. It’s a statement about what the architecture requires: a single system that can handle both the real-time signal layer and the durable event layer, across both edge and core, without introducing a coordination seam between the two.

The coordination seam problem

When pub/sub and streaming run on separate platforms, there’s always a seam — a point where data moves from one system to the other. That seam has to be built, deployed, maintained, and monitored. It becomes a critical path in every data flow that crosses it.

The seam usually takes one of two forms. Either there’s a bridge service — a small application that subscribes to the pub/sub broker and republishes to the streaming platform — or there’s application-level coordination, where individual services are responsible for writing to both systems in the right order. Neither is satisfying. Bridge services are single points of failure. Application-level coordination is fragile, inconsistent across teams, and difficult to reason about when things go wrong.
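The bridge-service failure mode is easy to see in miniature. The sketch below is a hypothetical toy, not any real product: a `Bridge` object stands in for the separate process that both sides depend on.

```python
# Toy sketch of the bridge-service seam -- hypothetical, not a real product.
# One process subscribes on the pub/sub side and republishes to the
# streaming side; if it dies, the seam silently stops moving data.

class Broker:
    """Minimal stateless pub/sub broker."""
    def __init__(self):
        self.handlers = []
    def subscribe(self, handler):
        self.handlers.append(handler)
    def publish(self, msg):
        for handler in self.handlers:
            handler(msg)

class Bridge:
    """The seam: a separate process that republishes pub/sub into the log."""
    def __init__(self, source, sink):
        self.alive = True
        self.sink = sink
        source.subscribe(self.forward)
    def forward(self, msg):
        if self.alive:              # a crashed bridge drops data silently
            self.sink.append(msg)

pubsub = Broker()                   # edge signal layer
stream_log = []                     # stands in for the streaming platform
bridge = Bridge(pubsub, stream_log)

pubsub.publish("telemetry-1")       # flows through the seam
bridge.alive = False                # bridge crashes...
pubsub.publish("telemetry-2")       # ...pub/sub still works, but the log gaps

print(stream_log)                   # ['telemetry-1'] -- telemetry-2 is lost
```

Note that nothing on the pub/sub side fails: publishers see success, subscribers see delivery, and the gap in the durable log is invisible until someone replays it.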

At the edge, the coordination seam is especially dangerous because it means the edge node has a dependency on the seam being operational for data to flow correctly between the two systems. If the bridge service crashes, or the connection between the two brokers drops, the behavior is unpredictable — and diagnosing it from the core is hard when the edge node isn’t easily accessible.

A hybrid platform eliminates the seam entirely. The same data path handles pub/sub delivery and durable streaming. There is no bridge to maintain, no coordination logic to get right, no seam to fail.

What to look for in real systems

The Intelecy case study — tens of thousands of factory sensors, ML-driven insights returned in under a second — is an example of what the fully consolidated version looks like at production scale. That round-trip performance is only possible when the signal layer and the streaming layer share infrastructure rather than introducing a coordination hop between them.

Eviny, managing critical energy trading data across 39 hydropower plants with zero data loss, is another: real-time pricing signals and durable transaction streams running through the same platform, with millisecond delivery and no gap between signal and record. Two separate platforms would make that guarantee much harder to enforce.

Questions worth asking before your next platform decision

Are you running separate brokers for pub/sub and streaming today? If yes, count the operational surface: the deployment scripts, the monitoring dashboards, the credential configurations, the client library versions. That’s the real cost of the split.

Does your edge node run both brokers? If it does, understand what happens to data flow when either one restarts. Test it deliberately rather than discovering it in production.

Where is your coordination seam, and who owns it? If the answer is “it’s a shared service that several teams depend on,” you’ve found the most likely source of your next hard-to-diagnose incident.

Could your edge binary footprint support a single consolidated platform? The resource math is usually more favorable than teams assume — especially compared to running two separate broker processes with their own memory footprints and startup sequences.

The edge doesn’t reward complexity. It punishes it, reliably and at inconvenient times. A hybrid eventing platform that handles both the real-time and durable layers from a single deployable is not a nice-to-have. It’s the minimum viable architecture for a system that has to work when no one is watching.


This post is part of a series on edge-to-core architecture patterns, rooted in the Living on the Edge: Eventing for a New Dimension white paper. Earlier posts cover the edge as an operating reality, why retry logic fails at the edge, why edge security is a topology problem, and why flow control is a day-one architecture decision. Next up: why your clustering model becomes your cost model the moment millions of events per second become normal.

Previous Posts

  1. The Edge Isn’t a Place — It’s an Operating Reality
  2. Why “Just Retry” Will Kill Your Edge System
  3. Your Perimeter Is Already Gone — Edge Security Isn’t a Checkbox
© 2026 Synadia Communications, Inc.