Microservices choreography often feels like a jazz improvisation, while orchestration is more like a meticulously rehearsed symphony.
Let’s see both patterns in action. Imagine an e-commerce order placement.
Choreography Example:
- Customer Service: Receives the order request.
- Order Service: Creates an order record, publishes an OrderCreated event.
- Inventory Service: Subscribes to OrderCreated, decrements stock. If successful, publishes InventoryReserved.
- Payment Service: Subscribes to OrderCreated (or InventoryReserved), processes payment. If successful, publishes PaymentProcessed.
- Shipping Service: Subscribes to PaymentProcessed, creates a shipping label. Publishes OrderShipped.
- Notification Service: Subscribes to OrderShipped, sends confirmation email.
Each service reacts to events published by others, independently deciding its next action. There’s no central brain telling them what to do.
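The event chain above can be sketched with a tiny in-memory event bus. This is only an illustration under stated assumptions: `EventBus`, `wire_services`, and the payload fields (`order_id`, `sku`, `qty`) are hypothetical names, the bus delivers events synchronously in-process, and the payment handler is wired to InventoryReserved (one of the two options mentioned above). A real system would use a broker and separate processes.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []  # event names, in publication order

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, payload: dict) -> None:
        self.log.append(event_name)
        # Deliver synchronously to every subscriber of this event.
        for handler in list(self._handlers[event_name]):
            handler(payload)


def wire_services(bus: EventBus, stock: Dict[str, int], emails: List[str]) -> None:
    """Each 'service' only knows which event it reacts to and which it emits."""

    def reserve_inventory(order: dict) -> None:
        if stock.get(order["sku"], 0) >= order["qty"]:
            stock[order["sku"]] -= order["qty"]
            bus.publish("InventoryReserved", order)
        # Otherwise nothing is published and the flow stops in this sketch.

    def process_payment(order: dict) -> None:
        bus.publish("PaymentProcessed", order)  # payment assumed to succeed

    def create_shipment(order: dict) -> None:
        bus.publish("OrderShipped", order)

    def send_confirmation(order: dict) -> None:
        emails.append(order["order_id"])

    bus.subscribe("OrderCreated", reserve_inventory)
    bus.subscribe("InventoryReserved", process_payment)
    bus.subscribe("PaymentProcessed", create_shipment)
    bus.subscribe("OrderShipped", send_confirmation)


bus = EventBus()
stock = {"widget": 5}
emails: List[str] = []
wire_services(bus, stock, emails)

# The Order Service's only job here is to announce that an order exists.
bus.publish("OrderCreated", {"order_id": "o-1", "sku": "widget", "qty": 2})
```

Note that no function in this sketch describes the whole workflow; the sequence emerges from the subscriptions.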
Orchestration Example:
- Customer Service: Receives the order request.
- Order Orchestrator: Receives the request and takes control.
- Order Orchestrator: Calls Inventory Service to reserve stock.
- Order Orchestrator: If inventory is reserved, calls Payment Service to process payment.
- Order Orchestrator: If payment is processed, calls Shipping Service to create label.
- Order Orchestrator: If shipping is created, calls Notification Service to send confirmation.
Here, a single orchestrator service directs the flow, making explicit calls to other services.
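For contrast, a minimal sketch of the orchestrated flow might look like the following. The `OrderOrchestrator` class, the step names, and the convention that each service call returns `True` on success are all assumptions made for illustration; in practice each callable would wrap a REST or gRPC client.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical service call: takes the order, returns True on success.
ServiceCall = Callable[[dict], bool]


class OrderOrchestrator:
    def __init__(self, steps: List[Tuple[str, ServiceCall]]) -> None:
        self.steps = steps  # ordered (step name, service call) pairs

    def place_order(self, order: dict) -> Dict[str, object]:
        completed: List[str] = []
        for name, call in self.steps:
            # The orchestrator decides whether the next step runs at all.
            if not call(order):
                return {"status": "failed", "failed_step": name, "completed": completed}
            completed.append(name)
        return {"status": "completed", "failed_step": None, "completed": completed}


orchestrator = OrderOrchestrator([
    ("reserve_inventory", lambda order: order["qty"] <= 5),  # pretend 5 are in stock
    ("process_payment", lambda order: True),
    ("create_shipping_label", lambda order: True),
    ("send_confirmation", lambda order: True),
])

ok = orchestrator.place_order({"order_id": "o-1", "qty": 2})
rejected = orchestrator.place_order({"order_id": "o-2", "qty": 9})
```

Here the whole workflow is readable in one place, which is the main appeal of the pattern.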
The Core Problem They Solve
Both patterns address the challenge of coordinating multiple independent microservices to complete a business process. Without a clear strategy, these processes become brittle, difficult to debug, and prone to failures where one service completes its part but subsequent steps never happen.
How They Work Internally
Choreography relies on an event bus or message broker. Services publish events (e.g., OrderCreated, PaymentFailed) to topics or queues. Other services subscribe to these events and trigger their own logic based on what they receive. This creates a decentralized, reactive system.
Orchestration uses a central coordinating service. This orchestrator service knows the entire workflow. It uses direct synchronous calls (e.g., REST, gRPC) or asynchronous messages to tell other services what to do, often in a predefined sequence. It’s responsible for managing the state of the overall process.
The Levers You Control
In Choreography:
- Event Design: What information is included in each event? This dictates what downstream services can react to.
- Subscription Logic: Which events does each service listen to? This determines its participation in workflows.
- Idempotency: Ensuring that processing the same event multiple times has no unintended side effects, crucial for reliability.
- Event Broker Configuration: Choosing between Kafka, RabbitMQ, SQS, etc., and configuring topics/queues for optimal throughput and delivery guarantees.
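Event design and idempotency are easiest to see together. The sketch below assumes every event carries a unique `event_id` set by the publisher (an assumption, not something any particular broker guarantees) and that the consumer remembers which IDs it has already applied. `OrderCreated` fields and the `InventoryConsumer` class are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class OrderCreated:
    event_id: str  # unique per event; assumed to be assigned by the publisher
    order_id: str
    sku: str
    qty: int


class InventoryConsumer:
    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = stock
        # In-memory for the sketch. A real consumer would keep this in a
        # durable store and update it atomically with the stock change.
        self._seen: Set[str] = set()

    def handle(self, event: OrderCreated) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if event.event_id in self._seen:
            return False
        self.stock[event.sku] -= event.qty
        self._seen.add(event.event_id)
        return True


consumer = InventoryConsumer({"widget": 5})
event = OrderCreated(event_id="evt-1", order_id="o-1", sku="widget", qty=2)

first = consumer.handle(event)   # applied
second = consumer.handle(event)  # redelivery of the same event is ignored
```

With at-least-once delivery, redeliveries like the second call are expected, so the duplicate check is what keeps stock from being decremented twice.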
In Orchestration:
- Workflow Definition: How is the sequence of service calls defined? This could be through code, a workflow engine or managed service (like AWS Step Functions, Temporal, or Cadence), or a DSL.
- Error Handling and Retries: How does the orchestrator handle failures from called services? Does it retry, compensate, or fail the entire process?
- State Management: Where does the orchestrator store the current state of the business process?
- Service Contracts: The specific APIs and data formats the orchestrator expects from the services it calls.
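The retry and compensation decisions can be sketched as a small saga runner. Everything here is illustrative: `run_saga`, `call_with_retries`, the journal format, and the fixed attempt count are assumptions, and the journal list stands in for whatever durable state store the orchestrator would really use. Catching every `Exception` is a simplification; a real orchestrator would distinguish retryable from permanent errors.

```python
from typing import Callable, List, Tuple

Action = Callable[[], None]


def call_with_retries(action: Action, attempts: int) -> bool:
    """Try the action up to `attempts` times; True on the first success."""
    for _ in range(attempts):
        try:
            action()
            return True
        except Exception:
            continue  # simplified: treat every error as retryable
    return False


def run_saga(
    steps: List[Tuple[str, Action, Action]], attempts: int = 3
) -> Tuple[str, List[str]]:
    """Run (name, action, compensation) steps in order.

    On a step that still fails after retries, run the compensations of the
    already completed steps in reverse order. Returns (status, journal).
    """
    journal: List[str] = []
    done: List[Tuple[str, Action]] = []
    for name, action, compensate in steps:
        if call_with_retries(action, attempts):
            journal.append(f"done:{name}")
            done.append((name, compensate))
        else:
            journal.append(f"failed:{name}")
            for prev_name, undo in reversed(done):
                undo()
                journal.append(f"compensated:{prev_name}")
            return "compensated", journal
    return "completed", journal


calls = {"payment": 0}


def failing_payment() -> None:
    calls["payment"] += 1
    raise RuntimeError("payment gateway unavailable")


status, journal = run_saga([
    ("reserve_inventory", lambda: None, lambda: None),
    ("process_payment", failing_payment, lambda: None),
])
```

In this run the payment step is attempted three times, then the inventory reservation is compensated, and the journal records each transition.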
The Counterintuitive Truth About Failures
Orchestrators appear to offer better visibility into a failing process because there’s a single point of control, but they also introduce a single point of failure for the coordination itself. If the orchestrator service goes down and is not replicated or backed by durable state, every in-flight business process stalls until it recovers. In choreography, a failure in one service doesn’t necessarily stop others from processing their part of the workflow or reacting to other events, which can make the system more resilient overall. The trade-off is that debugging distributed event flows can be significantly harder.
The next conceptual hurdle is understanding how to implement reliable compensation mechanisms when using choreography to undo actions if a downstream service fails.