This is the first part of a two-part series explaining how Vitrifi, an innovative broadband operating system for telcos, leverages NATS.io to build its Workflow Automation Platform.
At Vitrifi, we embarked on building a Workflow Automation Platform after discovering that existing solutions couldn’t meet our specific requirements. Our vision was clear - we needed a platform that would:
When existing open-source workflow solutions fell short, we took action. This led to the creation of Shar, an open-source workflow engine developed by Crystal Construct with Vitrifi’s support. Shar became the foundation for our Workflow Automation Platform.
As we built out the UI, backend APIs, and additional services, one architectural element proved useful across all components: NATS.
Our architecture is broken down into two primary sections: Content Management and Core.
Content Management acts as the control center of the platform: users design tasks, create workflows, manage application settings, handle user administration, and monitor platform performance.
The Core is where workflows come to life. Three key components work together to execute tasks:
When users publish a workflow, the CMS transforms the workflow and its associated tasks into immutable objects and persists them to the Core, using NATS as a key-value store. In addition, the UI and API continuously receive real-time messages from NATS about workflow status updates.
NATS serves as our universal transport layer for all workflow automation tasks, but its capabilities extend far beyond basic message passing. For us, it’s a Swiss Army knife of distributed systems architecture, enabling everything from state management to cross-system integration.
Consider a likely real-world scenario where the Workflow Automation Platform is deployed in a multi-cloud pattern:
If the cloud deployment experiences issues, the private datacenter continues operations seamlessly. This is possible because the platform is fundamentally asynchronous, using events (messages) processed through NATS. The ability to extend NATS systems across deployments, combined with the Core components’ exclusive use of NATS as their communication layer, creates a natural resilience mechanism.
When we talk about NATS as a database, we’re really talking about its Key-Value store capabilities. What makes it particularly powerful for our use case is its ability to handle distributed state with grace. The store provides strong consistency guarantees within clusters, while managing eventual consistency across our distributed deployments.
What really shines is the built-in revision tracking and real-time state monitoring. Imagine being able to track every state change in your workflows while maintaining the ability to replay history when needed. Add to that automatic key expiration and atomic operations, and you’ve got a robust foundation for state management.
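To make revision tracking and atomic updates concrete, here is a minimal in-memory sketch in Python. It is a toy model of the idea, not the NATS client API: the `RevisionedStore` class, its method names, and the `wf.42` key are all hypothetical, and it numbers revisions per key for simplicity.

```python
from dataclasses import dataclass, field


class RevisionMismatch(Exception):
    """Raised when an update is based on a stale revision."""


@dataclass
class RevisionedStore:
    """Toy in-memory stand-in for a revision-tracking key-value bucket.

    Assumption: revisions are counted per key, starting at 1.
    """

    _history: dict = field(default_factory=dict)  # key -> values, oldest first

    def put(self, key, value):
        """Write unconditionally and return the new revision for this key."""
        versions = self._history.setdefault(key, [])
        versions.append(value)
        return len(versions)

    def update(self, key, value, last_revision):
        """Write only if the caller has seen the latest revision (compare-and-set)."""
        current = len(self._history.get(key, []))
        if current != last_revision:
            raise RevisionMismatch(
                f"{key}: expected revision {last_revision}, found {current}"
            )
        return self.put(key, value)

    def get(self, key):
        """Return (latest value, latest revision) for a key."""
        versions = self._history[key]
        return versions[-1], len(versions)

    def history(self, key):
        """Return every value ever written to the key, oldest first."""
        return list(self._history.get(key, []))
```

A writer that read revision 1 can move the workflow to a new state with `update(..., last_revision=1)`; a second writer still holding revision 1 gets a `RevisionMismatch` instead of silently overwriting the newer state, and `history()` allows the state changes to be replayed in order.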
We particularly value:
Our platform processes thousands of workflow states per minute, each generating multiple events that need reliable delivery and processing. This is where NATS’ JetStream capabilities really shine.
The beauty of JetStream lies in its flexibility. We can fine-tune everything from message retention to delivery patterns with integrated backpressure handling.
The ability to consume messages in either push or pull mode has proven to be a great addition to our toolset. For large volumes of messages, such as workflow state updates, we choose the pull pattern, which lets each consumer control how many messages it takes and when.
Push mode is used for discrete messages where writing pull consumer logic would be overkill and immediate action is required: think of pausing or cancelling a workflow execution after an unrecoverable failure.
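The difference between the two modes can be sketched with a small in-memory model. This is only an illustration of the control flow, not JetStream itself; `PullQueue`, `PushTopic`, and the sample messages are hypothetical names chosen for the example.

```python
from collections import deque


class PullQueue:
    """Pull model: messages wait until the consumer asks for a batch."""

    def __init__(self):
        self._pending = deque()

    def publish(self, msg):
        self._pending.append(msg)

    def fetch(self, batch_size):
        """Return up to batch_size messages; the consumer sets the pace."""
        batch = []
        while self._pending and len(batch) < batch_size:
            batch.append(self._pending.popleft())
        return batch


class PushTopic:
    """Push model: every subscriber callback runs as soon as a message arrives."""

    def __init__(self):
        self._callbacks = []

    def subscribe(self, callback):
        self._callbacks.append(callback)

    def publish(self, msg):
        for callback in self._callbacks:
            callback(msg)
```

With the pull model, a busy worker can call `fetch(2)` and leave the rest queued until it is ready, which is the kind of control that suits high-volume state updates. With the push model, a handler registered via `subscribe` reacts immediately to something like a cancellation message, with no polling loop to write.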
While implementing idempotent operations is still a best practice (and we do), NATS provides robust exactly-once delivery semantics that make our lives much easier.
This is particularly crucial for our Trigger Server component. When a workflow trigger arrives, we absolutely need to ensure it executes once – no more, no less. NATS handles this elegantly through a combination of message de-duplication, delivery tracking and acknowledgement mechanisms.
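The de-duplication part of that combination can be illustrated with a small in-memory sketch. It models the general idea of rejecting a message whose ID was already seen within a time window; it is not the Trigger Server's code or the NATS implementation, and `DedupWindow`, `handle_triggers`, and the 120-second window are assumptions made for the example.

```python
import time


class DedupWindow:
    """Remembers message IDs for a limited time and rejects repeats."""

    def __init__(self, window_seconds=120.0, clock=time.monotonic):
        self._window = window_seconds  # assumed value, purely illustrative
        self._clock = clock
        self._seen = {}  # message id -> time it was first accepted

    def accept(self, msg_id):
        """Return True the first time an ID is seen inside the window."""
        now = self._clock()
        # Forget IDs that have aged out of the window.
        self._seen = {
            seen_id: seen_at
            for seen_id, seen_at in self._seen.items()
            if now - seen_at < self._window
        }
        if msg_id in self._seen:
            return False
        self._seen[msg_id] = now
        return True


def handle_triggers(triggers, window):
    """Execute each (msg_id, payload) trigger at most once per window."""
    executed = []
    for msg_id, payload in triggers:
        if window.accept(msg_id):
            executed.append(payload)  # stand-in for starting the workflow
    return executed
```

If a publisher retries and the same trigger ID arrives twice, only the first copy is executed. Once the window has elapsed the ID is forgotten, which is one reason idempotent handlers remain worthwhile alongside de-duplication.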
In our design, we also use NATS as a central point for integration with other Vitrifi systems.
Through features like leaf nodes and gateway connections, we can create sophisticated multi-region deployments while maintaining simplicity in our architecture.
Whilst our workflow automation platform is standalone, it is also part of Vitrifi’s wider portfolio of products. When integration with other systems is required, such as receiving messages from another NATS deployment or cluster, we leverage Leaf Nodes to facilitate seamless connectivity.
Due to its distributed nature and the features we’ve discussed, NATS is an excellent choice for a centralized but resilient backing store. We make extensive use of its request-reply pattern, which is elegantly simple: you send a request, and you receive a reply - all whilst preserving the loose coupling that underpins robust distributed systems.
For example, before running a workflow or task, we may need to validate certain conditions. Using the request-reply pattern for this purpose is straightforward and feels completely natural.
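As a rough sketch of that validation flow, the following models request-reply with an in-process bus built on `asyncio`. It is not the NATS client API: `InProcessBus`, the `workflow.validate` subject, and the required fields are hypothetical, chosen only to show the shape of the interaction.

```python
import asyncio


class InProcessBus:
    """Toy request-reply bus: one responder per subject, replies are awaited."""

    def __init__(self):
        self._responders = {}

    def respond(self, subject, handler):
        """Register an async handler that answers requests on a subject."""
        self._responders[subject] = handler

    async def request(self, subject, payload, timeout=1.0):
        """Send a request and wait for the reply, failing after the timeout."""
        handler = self._responders.get(subject)
        if handler is None:
            raise LookupError(f"no responder on {subject}")
        return await asyncio.wait_for(handler(payload), timeout)


async def validate_preconditions(payload):
    """Hypothetical validator: checks that the required fields are present."""
    missing = [name for name in ("workflow_id", "tenant") if name not in payload]
    return {"ok": not missing, "missing": missing}


async def main():
    bus = InProcessBus()
    bus.respond("workflow.validate", validate_preconditions)
    # The requester only knows the subject, not who answers it.
    return await bus.request("workflow.validate", {"workflow_id": "wf-42"})


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The requester addresses a subject rather than a specific service, so the validator can be replaced or scaled without the caller changing, which is the loose coupling described above.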
As a SaaS platform, multi-tenancy is essential for our Workflow Automation Platform. NATS provides a robust security model that ensures true multi-tenant isolation without compromising performance or flexibility. Through the use of NATS accounts for tenant isolation and JWT-based authentication, we can guarantee that each tenant’s data and processing remain entirely separate while sharing the same infrastructure.
Harnessing the power of NATS goes beyond its individual features - it’s about how these capabilities integrate seamlessly. This synergy enables us to build a workflow automation platform that is both powerful and reliable while remaining surprisingly simple to operate and maintain.
NATS has become more than just a component of our Workflow Automation Platform - it’s a fundamental piece of our architecture that enables extensibility, performance, flexibility, and resilience. As NATS continues to evolve, we’re excited to leverage more of its capabilities and push the boundaries of what’s possible in workflow automation.
The success of our platform’s architecture demonstrates that the best solution isn’t always about adding more components - it’s about choosing the right ones and using them to their full potential. For us, NATS has undoubtedly been that choice.
In the second part of this series, we’ll take a deep dive into Shar and explore how it leverages NATS’ powerful features to create a one-of-a-kind workflow engine.
Juan P. Genovese “JP” is Head of Core Software at Vitrifi with over 25 years of experience delivering exceptional software solutions across industries such as banking, telecoms, pharmaceuticals, and government. A specialist in cloud-native computing, DevOps, and solutions architecture, he is known for solving complex challenges and building high-performing teams.
He is passionate about mastering new technologies, mentoring talent, and driving innovation to deliver impactful results.