Scalability vs. Ordering

Rohan Dcunha

Overview

Background: Monitoring Solution

We develop a monitoring solution for an enterprise application. The solution has several components, including the backend service, Grafana, Thanos, and Prometheus. This post focuses on the backend service.

Dilemma: Off the shelf or not

As our monolithic backend grew, horizontal scaling became critical. However, our application, which processes alert notifications from sources like Grafana, had a strict requirement: alerts must be processed in the exact order received. Processing an older 'Resolved' alert after a newer 'Firing' alert would leave the alert in an incorrect final status. To scale while maintaining strict ordering guarantees, we faced a crucial architectural choice:

  1. Distributed Queuing (Consistent Hash Exchange): Use a complex mechanism like a Consistent Hash Exchange in RabbitMQ (RMQ) to route all related messages to the same consumer instance.
  2. Application-Layer Versioning: Implement a versioning mechanism within our application logic to enforce order across multiple consumers.
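
To make the first option concrete, here is a toy consistent-hash ring in plain Python. It only illustrates the routing idea (the same key always maps to the same queue) and is not how the RabbitMQ plugin is implemented. The queue names, the `HashRing` class, and the choice of the alert identifier as routing key are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring mapping routing keys to queue names."""

    def __init__(self, queues, points_per_queue=100):
        # Each queue owns several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{queue}#{i}"), queue)
            for queue in queues
            for i in range(points_per_queue)
        )
        self._points = [point for point, _ in self._ring]

    def route(self, routing_key: str) -> str:
        # Pick the first ring point after the key's hash, wrapping around.
        index = bisect.bisect(self._points, _hash(routing_key)) % len(self._ring)
        return self._ring[index][1]


# Hypothetical queue names, one per consumer instance.
ring = HashRing(["alerts.q1", "alerts.q2", "alerts.q3"])

# Every message for the same alert lands on the same queue, so a single
# consumer sees that alert's Firing/Resolved transitions in publish order.
print(ring.route("alert-42") == ring.route("alert-42"))  # True
```

The catch is visible in the constructor: the set of queues is part of the routing state, so something has to keep it in sync as consumers come and go.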

Choice and Trade-off

The Consistent Hash Exchange was quickly dismissed. While effective, it introduced an operational dependency: a dedicated monitoring service would be required to dynamically manage queue bindings as consumer instances scaled up or down. This added complexity and maintenance burden.

We chose Application-Layer Versioning. The publisher now tags each message with an incremental version. If a consumer receives a message whose previous version has not yet been processed, it simply requeues the message. This solution was simpler to develop, kept the scaling logic self-contained within the application layer, and allowed us to achieve reliable horizontal scaling without compromising our essential ordering guarantee. The choice also aligns with our future roadmap of a cloud-native architecture with AWS.
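
The versioning check described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not our production code: versions are tracked per alert key and start at 1, `VersionStore` is an in-memory stand-in for whatever state the consumers share, and the "drop" branch for stale or duplicate deliveries is an addition for completeness. All names here are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    key: str      # hypothetical alert identifier
    version: int  # incremental version set by the publisher, starting at 1
    status: str   # e.g. "Firing" or "Resolved"


class VersionStore:
    """In-memory stand-in for state shared by all consumers (assumption)."""

    def __init__(self):
        self._last = {}

    def last_processed(self, key):
        return self._last.get(key, 0)

    def mark_processed(self, alert):
        self._last[alert.key] = alert.version


def handle(alert, store, apply):
    """Decide what to do with one delivery: 'ack', 'requeue', or 'drop'."""
    last = store.last_processed(alert.key)
    if alert.version <= last:
        return "drop"     # stale or duplicate delivery (assumed behaviour)
    if alert.version != last + 1:
        return "requeue"  # the previous version has not been processed yet
    apply(alert)
    store.mark_processed(alert)
    return "ack"


def drain(deliveries, store, apply):
    """Simulate a queue; loops forever if a version never arrives."""
    queue = deque(deliveries)
    while queue:
        alert = queue.popleft()
        if handle(alert, store, apply) == "requeue":
            queue.append(alert)


# 'Resolved' (v2) is delivered before 'Firing' (v1) but is applied after it.
statuses = []
drain(
    [Alert("cpu-high", 2, "Resolved"), Alert("cpu-high", 1, "Firing")],
    VersionStore(),
    lambda alert: statuses.append(alert.status),
)
print(statuses)  # ['Firing', 'Resolved']
```

With several consumer processes, the read-then-write in `handle` would need to be atomic in the shared store (for example a conditional update), which this single-process sketch does not show.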