Series: NATS Edge Eventing Architecture

Stop Building Firehoses: Flow Control Isn't a Performance Optimization — It's an Architectural Decision

Bruno Baloi
Apr 16, 2026

Every edge-to-core system I’ve worked on eventually hits the same wall. The data volume is fine. The infrastructure can handle it. But the wrong data is going to the wrong places, and no one has a clean way to change that without rewriting services.

When I wrote the Living on the Edge white paper, flow control was the section I spent the most time on — not because it’s the most complex topic in edge architecture, but because it’s the most consistently underestimated one.

The teams I’ve spoken to tend to treat flow control as a tuning problem: something you revisit when consumers fall behind, or when a core service starts throwing errors under load. What I’ve seen in practice is that by the time flow control becomes an obvious operational problem, the architectural window to fix it cleanly has already closed. You’re not tuning anymore. You’re patching.

Flow control is the difference between a useful event stream and a firehose. And the time to design for it is before you build the pipes.

How firehoses get built

It usually starts with the right instinct: publish everything and let consumers filter. The edge produces a rich stream of telemetry. Core services subscribe and take what they need. Simple, loosely coupled, easy to extend.

The problem surfaces when you look at what “let consumers filter” actually costs at scale.

Each consumer receives the full stream and discards most of it. The network carries data that will never be processed. Services spend CPU cycles on events they have no business seeing. And when you want to change what a consumer receives — because the business logic changed, or a new team wants a different slice of the data — the answer is “redeploy the consumer.” Routing logic is baked into application code, not into the topology.

This is what I call an event-driven architecture that looks decoupled on the surface but is actually tightly coupled at the routing layer. The topology doesn’t know what it’s carrying. The consumers do all the heavy lifting. And the whole thing gets harder to reason about as it grows.

Subject filtering: declarative intent at the subscriber level

The first flow control capability that changes this equation is subject filtering — and in NATS, it’s built into the subject hierarchy natively.

Consider a subject taxonomy like sensors.<plant>.<line>.<machine>.<metric>. A Chicago plant publishes to sensors.chicago.line3.press4.temperature. A core analytics service can subscribe to sensors.chicago.> and receive everything from that plant. A predictive maintenance service can subscribe to sensors.*.*.press4.* and receive data from every press 4 across all plants, regardless of location. A specific operations team can subscribe to a single machine’s metrics without touching anything else.

Filtering is declarative. It lives in the subscription, not in the application. Change what a consumer receives by changing its subscription.
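The wildcard semantics above can be sketched as a small matcher. This is an illustration of how NATS-style subject filtering behaves — `*` matches exactly one token, `>` matches one or more trailing tokens — not the server's actual implementation:

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Illustrative NATS-style wildcard matching (not the server code).

    '*' matches exactly one token; '>' matches one or more
    trailing tokens and must appear last.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the final token and swallows the rest
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)

# The examples from the taxonomy above:
assert subject_matches("sensors.chicago.>", "sensors.chicago.line3.press4.temperature")
assert subject_matches("sensors.*.*.press4.*", "sensors.chicago.line3.press4.temperature")
assert not subject_matches("sensors.*.*.press4.*", "sensors.chicago.line3.press9.temperature")
```

The point of the sketch is that the entire routing decision is expressed in the subscription string — the consumer never inspects payloads to decide what to keep.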

At the edge specifically, this matters because edge environments produce data at rates and granularities that core systems can’t — and shouldn’t — absorb naively. A manufacturing floor streaming from hundreds of sensors doesn’t need every reading to reach every downstream system. The routing decision should be explicit and configurable, not implicit and buried in service logic. Synadia’s education resources cover subject design patterns in depth if you want to go further on this.

Subject mapping: routing as configuration, not code

The second capability I consistently see underused is subject mapping — the ability to reshape or rename subjects at the topology layer without touching publishers or subscribers.

A practical example: edge devices publish to short, local subject names that make sense on the factory floor. Core systems expect a normalized, multi-tenant naming convention that includes region, customer identifier, and data type. Without subject mapping, someone writes a translation service — a piece of glue code that subscribes to the edge format, reformats the subject, and republishes. That service now has to be deployed, maintained, monitored, and scaled.

With subject mapping configured at the NATS server level, the translation happens in the topology. No additional service. No additional failure mode. Routing rules evolve in configuration, not in code. When the naming convention changes — and it will — you update a config file, not a deployment pipeline.
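As a rough sketch, a mapping like the one described might look like this in NATS server configuration — the subject names here are hypothetical, and `{{wildcard(1)}}` carries the captured edge token into the normalized core subject:

```
mappings = {
  # Edge devices publish short local subjects; the topology
  # rewrites them into the core naming convention. Subject
  # names below are illustrative, not from the white paper.
  "press.*.temperature": "sensors.us-east.acme.line1.{{wildcard(1)}}.temperature"
}
```

When the core convention changes, only this block changes — publishers on the factory floor keep publishing to the same short subjects.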

This is what I mean when I say flow control is an architectural decision. Declarative routing via subject mapping is not a convenience feature. It’s a structural choice that determines how much operational surface area your system accumulates over time.

Weighted routing: safer rollouts without extra infrastructure

A third flow control primitive worth designing for early is weighted routing — the ability to split traffic across multiple consumers at configurable percentages.

The most common use case is canary deployments: route 90% of events to the stable consumer version and 10% to the new one. Observe error rates and behavior before committing. Increase the weight of the new version gradually. Roll back by changing a percentage, not by scrambling a deployment.

Without weighted routing built into the topology, teams simulate this in application code (fragile) or use external load balancers (expensive and complex for an eventing use case). Neither approach gives you the granularity or operational simplicity of weight configuration at the subject level. At the edge, where deploying a new consumer version to a remote or industrial environment carries real operational cost, getting this wrong has consequences beyond a bad metrics dashboard.
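In NATS, this kind of split is expressed as a weighted subject mapping in server configuration. A sketch of the 90/10 canary described above, with hypothetical subject names:

```
mappings = {
  # 90/10 canary split between consumer versions; shifting
  # traffic means editing these weights, not redeploying.
  "orders.processed": [
    { destination: "orders.processed.v1", weight: 90% },
    { destination: "orders.processed.v2", weight: 10% }
  ]
}
```

Rolling back is the same operation in reverse: set the new version's weight back to zero and reload the configuration.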

Push vs pull: an architectural choice, not a client preference

The fourth dimension of flow control — and the one that becomes critical specifically after edge disconnects — is the consumption model: push versus pull.

Push delivery sends events to consumers as they arrive. It’s fast and low-latency when consumers keep up. It becomes a problem when they don’t — and at the edge, “consumers that can’t keep up” is a routine condition, not a failure mode. An edge node reconnecting after a multi-hour outage has a backlog. If it resumes in push mode, that backlog hits downstream consumers as a sudden spike. The backpressure problem is real and immediate.

Pull consumption inverts this. Consumers request batches when they’re ready. They control their own ingestion rate. After a disconnect and reconnect, catch-up happens at a rate the consumer declares — not at whatever rate the producer decides to resume pushing. JetStream in NATS supports pull consumers natively, which is one of the reasons it’s well-suited to edge-to-core architectures where catch-up after disconnect is an expected, regular operation rather than an edge case to handle defensively.

Push and pull aren’t preferences. They’re a choice about who controls the flow — and at the edge, that control belongs with the consumer.
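The inversion can be sketched with a toy model — this is not the JetStream API, just the control relationship it gives you: the consumer decides the batch size and cadence, so a post-outage backlog drains at a declared rate instead of arriving as a spike.

```python
from collections import deque

class PullConsumer:
    """Toy model of pull-based consumption (not the JetStream client).

    The consumer asks for batches at its own pace instead of
    having the backlog pushed at it on reconnect.
    """

    def __init__(self, backlog: deque, batch_size: int):
        self.backlog = backlog
        self.batch_size = batch_size

    def fetch(self) -> list:
        # Take at most batch_size events per request — the consumer,
        # not the producer, decides how much to ingest per cycle.
        batch = []
        while self.backlog and len(batch) < self.batch_size:
            batch.append(self.backlog.popleft())
        return batch

# Simulated multi-hour outage: 10,000 events queued at the edge.
# Catch-up happens 100 at a time, at whatever cadence the consumer
# chooses (a sleep between fetches would rate-limit a real loop).
backlog = deque(range(10_000))
consumer = PullConsumer(backlog, batch_size=100)

processed = 0
while batch := consumer.fetch():
    processed += len(batch)

assert processed == 10_000
```

In JetStream terms, the fetch loop is what a pull consumer does natively — the server holds the backlog durably, and the client requests batches when it has capacity.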

What this looks like when the design is missing

When I look at edge systems that are struggling operationally, the flow control story tends to follow a recognizable pattern. Subject filtering was never designed, so consumers are receiving full streams and filtering in application code. Subject mapping was never implemented, so there are translation microservices that exist purely as routing glue. Weighted routing was never built in, so canary deployments are manual and risky. Pull consumption was never configured, so reconnection events cause load spikes that require manual intervention to absorb.

None of these are hard problems to solve. They’re all problems that are significantly harder to retrofit into a running system than to design in from the beginning. The Intelecy case study — streaming sensor data from tens of thousands of factory sensors and returning ML-driven insights in under a second — is a good example of what the fully designed version looks like. That kind of performance with that data volume doesn’t happen by accident. It happens because the routing layer is doing deliberate, declarative work.

Questions worth asking before your next architecture review

Where does your routing logic live today? If the answer is “in the consumers,” you have implicit routing — which means changing routing behavior requires changing and redeploying application code.

Do your edge subjects have a taxonomy? A flat or ad-hoc subject namespace makes filtering and mapping significantly harder to implement cleanly. Design the hierarchy before devices are in the field.

How do your consumers behave after a multi-hour disconnect? If the answer involves manual intervention or unknown behavior, you don’t have a tested reconnection strategy — you have a latent operational incident.

Is weighted rollout of new consumer versions possible today? If the answer is “we’d need to build that,” it’s a signal that traffic shaping isn’t in the topology yet.

Flow control is not glamorous. It doesn’t appear in architecture diagrams as often as it should. But it’s one of the decisions that separates systems that remain operationally manageable at scale from those that accumulate invisible complexity until something breaks badly enough to demand attention.


This post is part of a series on edge-to-core architecture patterns, based on the Living on the Edge: Eventing for a New Dimension white paper. Earlier posts cover why the edge is an operating reality, not a geography, why retry logic fails at the edge, and why edge security isn’t a perimeter problem. Next up: why running separate platforms for pub/sub and streaming is a tax you don’t have to pay.

Previous Posts

  1. The Edge Isn’t a Place — It’s an Operating Reality
  2. Why “Just Retry” Will Kill Your Edge System
  3. Your Perimeter Is Already Gone — Edge Security Isn’t a Checkbox

© 2026 Synadia Communications, Inc.