Series: NATS Edge Eventing Architecture

The Hidden Cost of Active-Passive Clustering (And What to Do Instead)

Bruno Baloi
May 6, 2026

The clustering model you choose at the beginning of a project feels like an infrastructure detail. By the time millions of events per second are normal, it has become your primary cost driver. Most teams realize this too late to change it cheaply.

When I was designing the clustering section of the Living on the Edge white paper, I wanted to make one thing clear that often gets buried in discussions about high availability: clustering isn’t just about uptime. It’s about throughput, latency under load, and — at the scale edge systems operate — cost. The wrong clustering model doesn’t just limit performance. It changes the economics of your entire infrastructure.

The conversation usually starts with a reasonable instinct. Active-passive clustering is familiar. It’s well-documented. Most messaging broker documentation defaults to it. You have a primary node handling all traffic, one or more passive standbys ready to take over if the primary fails, and a load balancer in front to route traffic and detect failures. It works, and for systems with modest or predictable traffic, the tradeoffs are manageable.

The problem is that edge systems don’t stay modest. And the tradeoffs that were manageable at low scale become structural problems at high scale.

At millions of events per second, your clustering model becomes your cost model.

What active-passive actually costs

The active-passive architecture has a well-known property that its proponents tend to underemphasize: at any given moment, you’re paying for N nodes but only using one of them for productive work. Passive nodes aren’t idle in a cheap sense — they’re consuming compute, memory, storage, and network capacity to stay synchronized with the primary, ready to take over. That readiness costs money regardless of whether failover ever happens.

This matters more than the hardware budget alone suggests. Active-passive scales vertically. To handle more load, you make the active node bigger: more CPU, more memory, faster storage, higher-bandwidth networking. Vertical scaling works until it doesn’t — and the point at which it stops working is the point at which hardware limits become real and each upgrade costs disproportionately more than the last. There is no active-passive path to handling ten times the throughput that doesn’t involve either significantly larger (and more expensive) individual nodes, or rethinking the architecture entirely.

There’s a second cost that’s harder to quantify but operationally significant: the load balancer tier. Active-passive architectures typically require a load balancer in front of the cluster to route traffic and manage failover. That load balancer has to be sized for peak single-node load, has to be maintained and monitored independently, and becomes a critical path component — if it fails, the cluster is unreachable regardless of node health. You’ve added infrastructure complexity to solve a problem that a different clustering model doesn’t have.

And then there’s the failover itself. Active-passive failover introduces a latency spike: the time from primary failure detection to passive node promotion and traffic resumption is measurable, and in edge systems where downstream consumers have their own backlogs and time-sensitive operations, that gap has consequences. As one quantitative analysis of the tradeoffs puts it: if a service is generating enough value that a few minutes of failover latency costs more than running an additional active server full-time, active-passive is already the more expensive choice — it just doesn’t appear that way on the infrastructure bill.
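
That break-even point is easy to sketch. The following is a back-of-the-envelope calculation, not a measurement — every dollar figure, function name, and frequency below is a purely illustrative assumption:

```python
# Illustrative break-even: when does the failover gap of active-passive
# cost more than simply running an extra always-active node?
# All numbers are hypothetical placeholders.

def failover_cost(value_per_minute: float, failover_minutes: float,
                  failovers_per_month: float) -> float:
    """Monthly value lost to failover gaps (value rate x gap x frequency)."""
    return value_per_minute * failover_minutes * failovers_per_month

def breakeven_extra_node(value_per_minute: float, failover_minutes: float,
                         failovers_per_month: float,
                         node_cost_per_month: float) -> bool:
    """True when an extra active node is cheaper than tolerating the gap."""
    lost = failover_cost(value_per_minute, failover_minutes, failovers_per_month)
    return lost > node_cost_per_month

# Example: $500/min of value, a 3-minute failover, twice a month,
# weighed against a $2,000/month node.
print(failover_cost(500, 3, 2))              # 3000
print(breakeven_extra_node(500, 3, 2, 2000)) # True
```

With those assumed numbers, the failover gap silently costs more per month than the extra node would — which is exactly the point: the cost is real but never shows up as a line item on the infrastructure bill.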

Full-mesh clustering: what changes

Full-mesh clustering — what I refer to as active-active in the white paper — inverts the economics. Every node in the cluster is active. Every node handles traffic. Clients connect to any node and receive consistent behavior regardless of which one they land on. There is no primary, no passive standby, and no load balancer tier required for basic operation.

The consequences of this design are direct:

Linear horizontal scale. Adding a node to a full-mesh cluster adds proportional throughput capacity. The system doesn’t hit the single-node ceiling because there is no single node. It doesn’t require a load balancer upgrade because clients distribute across nodes natively. The cost of handling twice the traffic is approximately twice the nodes — not twice the nodes plus an upgraded load balancer plus a larger passive standby.

No failover latency spike. When a node in a full-mesh cluster fails, traffic is reallocated across the remaining active nodes. There’s no primary-to-passive promotion sequence, no brief outage while the passive node comes online. The cluster degrades gracefully — capacity decreases proportionally, but the cluster remains operational and observable.

Better resource utilization. Every node is earning its infrastructure cost by handling productive traffic. There are no passive nodes sitting idle on standby. The budget you spend on cluster hardware is the budget you have available for throughput capacity.
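
The capacity difference between the two models reduces to a one-line comparison. A toy model, with a hypothetical per-node throughput of 100,000 events per second:

```python
# Toy capacity model: same node count, same per-node throughput.
# The per-node figure is an arbitrary illustrative assumption.

def active_passive_capacity(nodes: int, per_node: float) -> float:
    # Only the primary does productive work; standbys add cost, not throughput,
    # so capacity is flat no matter how many nodes you pay for.
    return per_node

def full_mesh_capacity(nodes: int, per_node: float) -> float:
    # Every node handles traffic, so capacity grows linearly with node count.
    return nodes * per_node

print(active_passive_capacity(3, 100_000))  # 100000
print(full_mesh_capacity(3, 100_000))       # 300000
```

Three nodes of budget buy one node of capacity in the first model and three in the second; the gap only widens as the cluster grows.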

The consistency question

The natural concern with full-mesh clustering is consistency: if every node is active and accepts writes, how do you ensure that streaming state and durable event data remain consistent across nodes?

This is where Raft consensus does its work. For durable streaming state — JetStream streams in NATS — replicas are coordinated via Raft, the same consensus algorithm used by etcd, CockroachDB, and other systems designed for strongly consistent distributed data. A Raft cluster elects a leader per stream, replicates log entries to a quorum of followers before committing, and handles leader failure by electing a new leader from the remaining replicas — automatically, without external intervention.
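
The quorum rule at the heart of that guarantee is small enough to sketch. This is a simplified illustration of majority commit only — not the full Raft protocol, which also involves terms, log matching, and leader election:

```python
# Simplified sketch of Raft's majority-commit rule.

def quorum(replicas: int) -> int:
    """An entry commits once a strict majority of replicas have appended it."""
    return replicas // 2 + 1

def committed(acks: int, replicas: int) -> bool:
    # acks counts replicas that have appended the entry,
    # including the leader's own copy.
    return acks >= quorum(replicas)

# A 3-replica stream commits with 2 acks and tolerates 1 node loss;
# a 5-replica stream needs 3 acks and tolerates 2 losses.
print(quorum(3), committed(2, 3))  # 2 True
print(quorum(5), committed(2, 5))  # 3 False
```

The same majority rule governs leader election, which is why a 3-replica stream keeps accepting writes through the loss of any single node.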

The key insight is that full-mesh clustering and Raft consistency operate at different layers. Full-mesh clustering is about how clients connect and how load distributes across the cluster. Raft is about how the replicated state remains consistent within the cluster. You don’t have to choose between horizontal scale and correctness — you compose them.

In NATS with JetStream, this is the default model. The NATS cluster runs full-mesh: all nodes active, clients connect to any node, load distributes naturally. JetStream streams use Raft for replication: stream replicas are consistent, writes are acknowledged only after a quorum commits, and stream leadership fails over automatically when a node leaves. Linear horizontal scale for throughput, Raft consistency for durable data. Neither compromises the other.
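
As a concrete sketch, a minimal server configuration for one member of a three-node full-mesh NATS cluster with JetStream enabled might look like the following — hostnames, the cluster name, and the storage path are placeholders, not a production recommendation:

```
# node-a.conf — one member of a hypothetical three-node full-mesh cluster
port: 4222
server_name: node-a

jetstream {
  store_dir: /data/jetstream
}

cluster {
  name: edge-cluster
  port: 6222
  routes: [
    nats://node-b:6222
    nats://node-c:6222
  ]
}
```

With all three nodes running the equivalent configuration, a replicated stream can be created from the NATS CLI with, for example, `nats stream add ORDERS --subjects "orders.>" --replicas 3`: clients may connect to any node, while the stream's three replicas form their own Raft group underneath.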

Why this matters specifically at edge scale

In a data center hosting moderate traffic, the difference between active-passive and full-mesh is a performance and cost consideration. At edge scale — industrial IoT environments, connected fleet telemetry, energy infrastructure like the Eviny hydropower system managing real-time energy trading across 39 plants — the clustering model is the difference between an architecture that sustains the load and one that requires constant operational firefighting.

Consider the math at a modest edge scale: a single industrial site with 500 machines publishing sensor data at 10 events per second per machine produces 5,000 events per second from that site alone. A fleet of 50 such sites produces 250,000 events per second. At that throughput, an active-passive single-node primary is under continuous pressure. Adding another site means more pressure on the same node, or a vertical scaling event, or an architecture re-evaluation that should have happened earlier.
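
That arithmetic is worth making explicit:

```python
# Throughput math from the example above.
machines_per_site = 500
events_per_machine = 10   # events/sec per machine
sites = 50

site_rate = machines_per_site * events_per_machine
fleet_rate = site_rate * sites

print(site_rate)   # 5000 events/sec from one site
print(fleet_rate)  # 250000 events/sec across the fleet
```

Every new site adds another 5,000 events per second, and in an active-passive design all of it lands on the same primary node.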

A full-mesh cluster at the same throughput distributes load across all nodes, scales by adding nodes rather than upgrading existing ones, and handles node failures without a traffic gap. The operational surface is smaller, the cost curve is more predictable, and the architecture has room to grow without forcing a re-evaluation.

Questions worth asking about your clustering model

Is your current broker cluster active-passive? If yes: what’s the maximum throughput your primary node can sustain before vertical scaling is required? Have you measured it under the load patterns you expect at edge scale, not just current traffic?

What is your load balancer tier costing you? Not just in infrastructure, but in operational complexity: the maintenance window you have to plan for load balancer upgrades, the monitoring you run to detect load balancer failures, the on-call playbook for load balancer incidents.

What happens to your system during failover? If the answer involves any period of unavailability or traffic disruption — even brief — that’s a cost. At edge scale, with consumers that have their own backlogs and time-sensitive operations, brief disruptions compound.

Does your clustering model scale linearly with nodes? If adding a node doesn’t add proportional throughput capacity, you’re not running a truly active-active cluster. You’re running something that looks like one but still has a bottleneck.

The clustering model doesn’t announce itself as a strategic decision. It hides in the “infrastructure” section of architecture documents, treated as an implementation detail rather than a design choice. At edge scale, it isn’t. It determines what your system can sustain, how it behaves under failure, and what it costs to operate for the life of the deployment.

Choose it deliberately.


This post is part of a series on edge-to-core architecture patterns, grounded in the Living on the Edge: Eventing for a New Dimension white paper. Find the full series here.

Previous Posts

  1. The Edge Isn’t a Place — It’s an Operating Reality
  2. Why “Just Retry” Will Kill Your Edge System
  3. Your Perimeter Is Already Gone — Edge Security Isn’t a Checkbox
  4. Stop Building Firehoses: Flow Control Isn’t a Performance Optimization — It’s an Architectural Decision
  5. You Don’t Need Two Platforms: The Case for Hybrid Eventing

© 2026 Synadia Communications, Inc.