The Saga pattern is often presented as a simpler, more performant alternative to Two-Phase Commit (2PC) for distributed transactions, but its true power lies in its ability to embrace eventual consistency and handle failures gracefully without blocking resources.
Let’s see how this plays out in a common e-commerce scenario: an order placement.
Imagine a simple order service and a payment service.
# Order Service Configuration (Simplified)
spring:
application:
name: order-service
kafka:
bootstrap-servers: kafka-broker-1:9092,kafka-broker-2:9092
producer:
key-serializer: org.apache.kafka.common.serialization.StringSerializer
value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
consumer:
group-id: order-consumer-group
auto-offset-reset: earliest
key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
value-deserializer: org.springframework.kafka.support.deserializer.JsonDeserializer
properties:
spring.json.trusted.packages: "com.example.events"
# Payment Service Configuration (Simplified)
spring:
application:
name: payment-service
kafka:
bootstrap-servers: kafka-broker-1:9092,kafka-broker-2:9092
producer:
key-serializer: org.apache.kafka.common.serialization.StringSerializer
value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
consumer:
group-id: payment-consumer-group
auto-offset-reset: earliest
key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
value-deserializer: org.springframework.kafka.support.deserializer.JsonDeserializer
properties:
spring.json.trusted.packages: "com.example.events"
When an order is placed, the order-service might first create an order in a pending state and then publish an OrderCreatedEvent to Kafka.
// OrderService.java
@Service
public class OrderService {
@Autowired
private KafkaTemplate<String, Object> kafkaTemplate;
private static final String ORDER_CREATED_TOPIC = "order-created-topic";
public Order placeOrder(Order order) {
// 1. Create order in pending state
order.setStatus("PENDING");
// ... save order to database ...
// 2. Publish OrderCreatedEvent
kafkaTemplate.send(ORDER_CREATED_TOPIC, order.getId(), new OrderCreatedEvent(order.getId(), order.getAmount()));
return order;
}
}
The payment-service subscribes to this event. Upon receiving it, it attempts to process the payment and then publishes a PaymentProcessedEvent or PaymentFailedEvent.
// PaymentService.java
@Service
public class PaymentService {
@Autowired
private KafkaTemplate<String, Object> kafkaTemplate;
private static final String PAYMENT_EVENT_TOPIC = "payment-event-topic";
@KafkaListener(topics = "order-created-topic", groupId = "payment-consumer-group")
public void handleOrderCreated(OrderCreatedEvent event) {
// 1. Attempt to process payment
boolean success = processPayment(event.getOrderId(), event.getAmount());
if (success) {
// 2. Publish PaymentProcessedEvent
kafkaTemplate.send(PAYMENT_EVENT_TOPIC, event.getOrderId(), new PaymentProcessedEvent(event.getOrderId()));
} else {
// 3. Publish PaymentFailedEvent
kafkaTemplate.send(PAYMENT_EVENT_TOPIC, event.getOrderId(), new PaymentFailedEvent(event.getOrderId(), "Insufficient funds"));
}
}
private boolean processPayment(String orderId, BigDecimal amount) {
// ... actual payment processing logic ...
System.out.println("Processing payment for order: " + orderId + " amount: " + amount);
return true; // Simulate success
}
}
The order-service then listens for these payment events. If PaymentProcessedEvent is received, it marks the order as APPROVED. If PaymentFailedEvent is received, it must compensate for the initial order creation by initiating a rollback process, perhaps by publishing an OrderCancellationCommand.
// OrderService.java (continued)
@Service
public class OrderService {
// ... existing code ...
@KafkaListener(topics = "payment-event-topic", groupId = "order-consumer-group")
public void handlePaymentEvent(Object paymentEvent) {
if (paymentEvent instanceof PaymentProcessedEvent) {
PaymentProcessedEvent event = (PaymentProcessedEvent) paymentEvent;
// 1. Mark order as approved
updateOrderStatus(event.getOrderId(), "APPROVED");
} else if (paymentEvent instanceof PaymentFailedEvent) {
PaymentFailedEvent event = (PaymentFailedEvent) paymentEvent;
// 2. Initiate compensation (rollback)
initiateOrderRollback(event.getOrderId(), event.getReason());
}
}
private void updateOrderStatus(String orderId, String status) {
// ... find order by ID and update status ...
System.out.println("Order " + orderId + " status updated to: " + status);
}
private void initiateOrderRollback(String orderId, String reason) {
System.out.println("Payment failed for order " + orderId + ". Reason: " + reason + ". Initiating rollback.");
// Publish an event or command to cancel the order or refund if payment was partially captured
// For simplicity, let's assume we just mark it as FAILED and potentially clean up
updateOrderStatus(orderId, "FAILED");
// In a real scenario, you might publish an OrderCancellationCommand
// kafkaTemplate.send("order-command-topic", orderId, new OrderCancellationCommand(orderId));
}
}
This chain of events and corresponding compensating actions forms a Saga. If the payment fails, the order-service doesn’t try to "undo" the Kafka message; it executes a separate, defined compensation logic (e.g., marking the order as failed, potentially triggering a refund if a payment was already debited). The key is that each step in the Saga is idempotent, and compensation actions are also designed to be idempotent and reversible.
The most surprising thing about Sagas is that they don’t guarantee atomicity in the traditional ACID sense; instead, they guarantee semantic atomicity through a series of local transactions and compensating actions, allowing systems to remain available even if individual services fail temporarily.
To manage this, you need a way to track the state of each Saga instance. This is often done with a state machine, either within each service or in a dedicated orchestrator. When the order-service receives the PaymentFailedEvent, its state machine transitions from "Awaiting Payment Confirmation" to "Rollback Initiated," triggering the initiateOrderRollback method.
The exact levers you control are the events published, the commands received, and the compensation logic implemented for each step. For instance, if the payment service successfully debits money but then fails to publish the PaymentProcessedEvent (due to a Kafka outage), the order service will eventually time out waiting for a payment confirmation. In this case, the order service would need its own compensation logic to trigger a refund from the payment service, highlighting the need for robust error handling and potential outbox patterns to ensure event delivery.
The crucial insight often missed is that compensation actions themselves can fail. If the order service attempts to mark an order as FAILED but the database is unavailable, the Saga is now in an inconsistent state. This is why compensation logic often needs to be retried, or even trigger a human intervention process, turning a failed transaction into an operational alert.
The next concept you’ll grapple with is how to handle complex Sagas involving multiple services, and the trade-offs between choreography (event-driven, decentralized) and orchestration (centralized command and control).