Designing NATS JetStream for per-tenant FIFO processing

A community member asked how to design a NATS-based multi-tenant SaaS backend where events must be handled FIFO per tenant, while different tenants should be processed in parallel.

The short answer: a stream per tenant can be a reasonable JetStream design for this requirement, especially at roughly hundreds to low thousands of tenants, as long as the streams and consumers are relatively stable and the system is load-tested with realistic traffic. The main design choices are subject layout, stream boundaries, consumer sharing, and future tenant isolation.

Separate ordering from parallelism

For this kind of workload, there are two different goals:

FIFO within a tenant: events for tenant A should be handled in order.
Parallelism across tenants: tenant A should not block tenant B.

A tenant-scoped stream aligns well with that model. If each tenant has its own stream, then each tenant has an independent sequence of events. A slow or blocked consumer for one tenant does not inherently stall the stream for another tenant.

That is often preferable to putting all tenants into one large stream and then relying heavily on many filtered consumers. A single stream with many sparse subject filters can become harder to reason about and may not scale as cleanly as more linear consumption from tenant-local streams.

Use a fixed first subject token

A good subject structure starts with a stable application or domain token, not the tenant ID.

For example:

{app}.tenants.{tenant}.{eventType}

or a more concrete shape such as:

billing.tenants.tenant-123.invoice.created
billing.tenants.tenant-123.payment.received

The important part is that the first token is fixed for the application or domain. Avoid a design where the first token is the tenant ID and common subscriptions need broad first-token wildcards such as:

*.orders.created

Keeping the first token stable gives you a cleaner subject hierarchy for routing, permissions, stream definitions, imports/exports, and future operational changes.
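
As a minimal sketch of this layout, the helper below builds a subject with a fixed first token and the tenant ID further down the hierarchy. The function name `tenant_subject`, the literal `tenants` token, and the conservative token character set are illustrative assumptions, not NATS requirements.

```python
import re

# Assumption: restrict tokens to a conservative character set so a tenant ID
# can never contain ".", "*", ">", or whitespace, which are either special
# or invalid in NATS subjects.
_TOKEN = re.compile(r"[A-Za-z0-9_-]+")


def tenant_subject(app: str, tenant: str, event_type: str) -> str:
    """Build '{app}.tenants.{tenant}.{eventType}' with a fixed first token."""
    tokens = [app, "tenants", tenant, *event_type.split(".")]
    for token in tokens:
        if not _TOKEN.fullmatch(token):
            raise ValueError(f"invalid subject token: {token!r}")
    return ".".join(tokens)


print(tenant_subject("billing", "tenant-123", "invoice.created"))
```

Validating the tenant token at this boundary keeps a malformed tenant ID from silently changing the shape of the subject hierarchy.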

Map each tenant to its own stream

A tenant stream can capture that tenant’s subject prefix:

billing.tenants.tenant-123.>

This gives each tenant an independent stream sequence. If FIFO is required per tenant, that independence is useful: each tenant can proceed, pause, retry, or accumulate backlog separately.

A simplified stream model might look like this:

Stream: TENANT_tenant-123
Subjects: billing.tenants.tenant-123.>

Then repeat that pattern for each tenant.
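
To make the stream boundary concrete, here is a small sketch that derives the stream name and subject filter for a tenant and checks which subjects that filter would capture. The `subject_matches` function is a simplified re-implementation of NATS wildcard semantics written for illustration (`*` matches exactly one token, `>` matches one or more trailing tokens); it is not the server's code, and the helper names are hypothetical.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Simplified NATS-style matching: '*' = one token, '>' = one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def tenant_stream(app: str, tenant: str) -> dict:
    """Describe the per-tenant stream using the naming pattern shown above."""
    return {"name": f"TENANT_{tenant}", "subjects": [f"{app}.tenants.{tenant}.>"]}


stream = tenant_stream("billing", "tenant-123")
for subject in (
    "billing.tenants.tenant-123.invoice.created",
    "billing.tenants.tenant-456.invoice.created",
):
    print(subject, subject_matches(stream["subjects"][0], subject))
```

Only the first subject falls inside the tenant-123 stream; the tenant-456 event belongs to a different stream with its own sequence.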

This design is most attractive when tenants are relatively stable. Creating a few hundred or a few thousand streams and consumers is a different operational profile from rapidly creating and deleting them at high frequency.

Give each service its own durable consumer

If multiple backend services need to observe the same tenant’s events, each service should have its own durable consumer on that tenant stream.

For example, for tenant tenant-123:

Stream: TENANT_tenant-123
  Consumer: invoice-service
  Consumer: notification-service
  Consumer: analytics-service

Each service’s durable consumer tracks its own delivery state. That means invoice-service can fall behind without changing the position of notification-service.

If a service has multiple replicas, those replicas can share the same durable consumer so the service can distribute work across replicas. The exact mechanics depend on the consumer type: multiple replicas can bind to and fetch from a shared pull consumer, while a push consumer needs a deliver group (a queue group on the consumer’s delivery subject) for replicas to share its messages. Either way, the principle is the same: replicas of the same service share that service’s durable consumer state.
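
The toy model below illustrates the idea of per-durable delivery state. It is an in-memory stand-in, not how JetStream is implemented, and it deliberately ignores acknowledgements and redelivery. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTenantStream:
    """Toy stand-in for one tenant stream with one cursor per durable name."""
    name: str
    messages: list = field(default_factory=list)
    cursors: dict = field(default_factory=dict)  # durable name -> next index

    def publish(self, payload: str) -> None:
        self.messages.append(payload)

    def add_durable(self, durable: str) -> None:
        self.cursors.setdefault(durable, 0)

    def next_for(self, durable: str):
        """Return the next message for this durable, or None if it is caught up."""
        index = self.cursors[durable]
        if index >= len(self.messages):
            return None
        self.cursors[durable] = index + 1
        return self.messages[index]


stream = ToyTenantStream("TENANT_tenant-123")
for durable in ("invoice-service", "notification-service"):
    stream.add_durable(durable)
for event in ("invoice.created", "payment.received"):
    stream.publish(event)

# Two replicas of notification-service draw from the same durable cursor,
# so each one receives a different message.
replica_a = stream.next_for("notification-service")
replica_b = stream.next_for("notification-service")
# invoice-service has consumed nothing yet, so it still starts at the front.
first_for_invoice = stream.next_for("invoice-service")
print(replica_a, replica_b, first_for_invoice)
```

The two replicas split the work because they share one cursor, while invoice-service keeps an independent position on the same stream.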

Be careful: load balancing and strict FIFO are in tension

If you need strict processing order for a given service and tenant, do not allow multiple messages from that tenant/service consumer to be processed concurrently unless your application can tolerate out-of-order completion.

JetStream can deliver messages in stream order, but once multiple service replicas are handling messages concurrently, message 2 may finish before message 1. Redeliveries after timeouts can also affect the apparent order in which handlers see work.
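
The effect is easy to reproduce without NATS at all. The following sketch simulates handlers with different processing times using `asyncio`; the `max_in_flight` parameter is a hypothetical stand-in for whatever limits concurrent work per tenant/service consumer (for example, a max ack pending of 1), and the durations are made up.

```python
import asyncio


async def process_all(durations, max_in_flight):
    """Handle messages 1..n in delivery order; return the order in which they finished."""
    finished = []
    gate = asyncio.Semaphore(max_in_flight)  # caps concurrently running handlers

    async def handle(seq, duration):
        async with gate:
            await asyncio.sleep(duration)  # stand-in for real handler work
            finished.append(seq)

    await asyncio.gather(
        *(handle(seq, d) for seq, d in enumerate(durations, start=1))
    )
    return finished


def completion_order(durations, max_in_flight):
    return asyncio.run(process_all(durations, max_in_flight))


durations = [0.05, 0.01, 0.02]  # message 1 is the slowest
print("3 in flight:", completion_order(durations, max_in_flight=3))
print("1 in flight:", completion_order(durations, max_in_flight=1))
```

With three messages in flight, the slow first message finishes last even though it was delivered first. With one in flight, completion order matches delivery order, at the cost of processing the tenant's events one at a time.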

For strict per-tenant FIFO, consider these constraints:

Use a separate stream per tenant.
Use a durable consumer per service per tenant.
Limit the number of in-flight messages for that consumer when strict ordering matters. For example, set the consumer’s max ack pending to 1, so the next message is not delivered until the current one is acknowledged.
Make handlers idempotent, because retries and redeliveries are normal parts of at-least-once processing.

That approach intentionally gives up intra-tenant parallelism. The scalability comes from processing many tenants in parallel, not many events for the same tenant in parallel.
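
For the idempotency constraint, one common shape is to remember which messages have already been applied and skip repeats. The sketch below assumes each message carries a stable identifier (here a hypothetical `tenant:sequence` string) and keeps the seen set in memory; a real service would need durable storage and would have to consider a crash between applying the effect and recording the ID.

```python
class IdempotentHandler:
    """Apply each message's side effect at most once per message ID."""

    def __init__(self, apply):
        self._apply = apply
        # Assumption: in-memory only, for illustration. Production code would
        # persist this alongside the side effect it protects.
        self._seen = set()

    def handle(self, message_id: str, payload: str) -> bool:
        """Return True if the payload was applied, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._apply(payload)
        self._seen.add(message_id)
        return True


applied = []
handler = IdempotentHandler(applied.append)
handler.handle("tenant-123:1", "invoice.created")
handler.handle("tenant-123:1", "invoice.created")  # simulated redelivery
handler.handle("tenant-123:2", "payment.received")
print(applied)
```

The redelivered message is acknowledged as handled without repeating its side effect, which is what makes at-least-once delivery safe to retry.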

If your application only requires ordered delivery but not ordered completion, you can allow more concurrency per consumer. That is a throughput/semantics tradeoff and should be made explicitly.

Will 100 to 1,000 tenants scale?

For a design with 100 to 1,000 tenants, a stream per tenant is a plausible starting point. The total object count matters:

number of streams = tenants
number of consumers = tenants × services that need durable consumption

So, for 1,000 tenants and 5 services, you may be operating around:

1,000 streams
5,000 durable consumers

That is a materially different design from 1 stream and 5 consumers, but it can be reasonable when the objects are stable and the event rate is modest. Avoid assuming the same approach will comfortably extend to tens of thousands of tenants without additional design work, benchmarking, and operational planning.
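
The arithmetic is simple enough to keep next to your capacity notes. This sketch just encodes the two formulas above; the function name is hypothetical.

```python
def object_counts(tenants: int, services: int) -> dict:
    """One stream per tenant, one durable consumer per service per tenant stream."""
    return {"streams": tenants, "consumers": tenants * services}


for tenants in (100, 1_000):
    print(tenants, object_counts(tenants, services=5))
```

Because the consumer count grows with tenants × services, adding one more service to a 1,000-tenant deployment adds 1,000 durable consumers, not one.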

There is no single universal limit because capacity depends on factors such as:

message rate per tenant
message size
retention policy
storage backend and I/O
replication settings
number of consumers
ack behavior
redelivery behavior
cluster sizing
how quickly streams and consumers are created or deleted

The practical recommendation is to load-test the shape you intend to run: realistic tenant count, service count, event rate, message size, retention, and failure scenarios.
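
Before running a load test, a back-of-the-envelope estimate can help pick a sensible starting profile. The sketch below multiplies a few of the factors listed above to approximate raw payload volume retained at steady state. It assumes time-based retention, ignores per-message and per-stream overhead, and every input value is hypothetical; it is a sanity check, not a sizing formula.

```python
def raw_bytes_retained(tenants, msgs_per_sec_per_tenant, avg_msg_bytes,
                       retention_seconds, replicas=1):
    """Rough payload volume held at steady state, summed across all replicas."""
    return (tenants * msgs_per_sec_per_tenant * avg_msg_bytes
            * retention_seconds * replicas)


# Hypothetical profile; replace every value with measured numbers.
profile = dict(
    tenants=1_000,
    msgs_per_sec_per_tenant=2,
    avg_msg_bytes=1_024,
    retention_seconds=7 * 24 * 3600,  # one week
    replicas=3,
)
total = raw_bytes_retained(**profile)
print(f"{total / 1e12:.2f} TB of raw payload retained")
```

An estimate like this only bounds storage. Consumer count, ack behavior, and redelivery rates still need to be observed under real load.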

Avoid high-churn stream and consumer management

A tenant-per-stream model is most comfortable when tenants are long-lived. It is less attractive if your application constantly creates and deletes tenant streams at a high rate.

If tenants are stable SaaS customers, this is usually manageable. If tenants are short-lived sessions, jobs, devices, or temporary workloads, consider whether the tenant is really the correct stream boundary.

Plan account boundaries before adding leaf nodes

Even if authentication and authorization are simple today, it is worth designing for stronger isolation before tenant-connected leaf nodes are introduced.

A few practical considerations:

NATS supports multiple accounts either defined statically in the server configuration or, in operator mode, issued as signed JWTs and managed with the nsc tool. Static accounts are simpler for a small, fixed set of tenants; operator mode with decentralized JWT authentication scales better when tenants or leaf nodes need to manage their own users and credentials. If you expect the latter, adopting operator mode early avoids a disruptive migration.
Avoid baking in assumptions that all tenants must forever share one NATS account.
Design subjects so that a tenant can be isolated into its own account later if needed.
If shared services need to cross account boundaries, plan for account imports and exports.

One account per tenant is not always necessary at the start. It adds operational complexity, especially if shared services need access to tenant-scoped messages. But designing so that tenant isolation can be increased later gives you more deployment flexibility.

This is especially relevant for leaf nodes running in tenant environments. A tenant leaf node that publishes only under that tenant’s subject prefix is much easier to secure and reason about if the subject hierarchy and account model were designed with isolation in mind.
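
As an illustration of why the subject hierarchy matters here, the sketch below derives the single wildcard pattern that covers a tenant's subjects and checks whether a given subject stays inside that tenant's prefix. The helper names are hypothetical, and a real leaf node deployment would likely need additional permitted subjects beyond this one pattern.

```python
def tenant_prefix(app: str, tenant: str) -> str:
    return f"{app}.tenants.{tenant}"


def tenant_publish_pattern(app: str, tenant: str) -> str:
    """Wildcard pattern covering everything under one tenant's prefix."""
    return tenant_prefix(app, tenant) + ".>"


def is_tenant_scoped(subject: str, app: str, tenant: str) -> bool:
    """True if the subject sits strictly below this tenant's prefix."""
    # The trailing "." enforces a token boundary, so tenant-123 does not
    # accidentally cover tenant-1234.
    prefix = tenant_prefix(app, tenant) + "."
    return subject.startswith(prefix) and len(subject) > len(prefix)


print(tenant_publish_pattern("billing", "tenant-123"))
print(is_tenant_scoped("billing.tenants.tenant-123.invoice.created", "billing", "tenant-123"))
print(is_tenant_scoped("billing.tenants.tenant-1234.invoice.created", "billing", "tenant-123"))
```

Because the tenant ID occupies a single, predictable token, one pattern per tenant is enough to express the boundary. A layout with the tenant ID scattered across tokens would not allow that.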

Recommended starting design

For a multi-tenant SaaS backend with FIFO per tenant and cross-tenant parallelism, a practical starting design is:

Use a stable application token as the first subject token.
Include the tenant ID later in the subject hierarchy.
Create one JetStream stream per tenant when tenant lifetimes are stable.
Create one durable consumer per service per tenant stream.
Let replicas of the same service share that service’s durable consumer.
If strict FIFO processing is required, limit outstanding work per tenant/service consumer.
Load-test the total stream, consumer, and event-rate profile before committing to production sizing.
Plan now for future account isolation, especially if leaf nodes may connect from tenant networks.

Bottom line

A stream per tenant is a reasonable JetStream pattern when the business requirement is independent FIFO processing per tenant and parallelism across tenants. It keeps tenant backlogs isolated and avoids some of the scaling concerns of a single stream with many sparse filtered consumers.

The key tradeoff is concurrency: strict FIFO requires limiting per-tenant processing concurrency, while higher throughput within a tenant can weaken ordered-completion guarantees. Keep stream and consumer counts stable, test with realistic traffic, and design the subject and account model so stronger tenant isolation remains possible later.
