Event-driven architectures are a lie. They don’t decouple services; they just move the coupling from direct API calls to shared event schemas.

Consider a simple e-commerce scenario: an OrderService publishes an OrderCreated event, and a NotificationService subscribes to it to send an email.

// OrderCreated Event
{
  "orderId": "123e4567-e89b-12d3-a456-426614174000",
  "customerId": "abc89012-f345-6789-0123-456789abcdef",
  "items": [
    {
      "productId": "xyz98765-a432-1098-7654-3210fedcba98",
      "quantity": 2,
      "price": 19.99
    }
  ],
  "timestamp": "2023-10-27T10:00:00Z"
}

The NotificationService needs orderId, customerId, and items to construct the email. If OrderService changes the items structure in a way the consumer does not expect (e.g., renames price, or adds sku while the consumer validates payloads strictly), NotificationService breaks. This is schema coupling. The "decoupling" is illusory; it’s just a different form of dependency.
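Whether a given change actually breaks the consumer depends on how it reads the payload. Here is a minimal Python sketch, assuming a reader that ignores fields it does not recognize. The function name build_email_lines, the email wording, and the sku value are hypothetical; only the field names come from the event above.

```python
import json

# Sample payload copied from the OrderCreated event above.
ORDER_CREATED = """
{
  "orderId": "123e4567-e89b-12d3-a456-426614174000",
  "customerId": "abc89012-f345-6789-0123-456789abcdef",
  "items": [
    {"productId": "xyz98765-a432-1098-7654-3210fedcba98", "quantity": 2, "price": 19.99}
  ],
  "timestamp": "2023-10-27T10:00:00Z"
}
"""


def build_email_lines(raw_event: str) -> list:
    """Hypothetical consumer logic: reads only the fields the email needs."""
    event = json.loads(raw_event)
    lines = [f"Order {event['orderId']} for customer {event['customerId']}"]
    for item in event["items"]:
        # Every field accessed here is part of the implicit contract.
        lines.append(f"{item['quantity']} x {item['productId']} @ {item['price']}")
    return lines


# Additive change: an extra "sku" field is ignored by this reader.
with_sku = json.loads(ORDER_CREATED)
with_sku["items"][0]["sku"] = "SKU-001"  # made-up value for illustration
assert build_email_lines(json.dumps(with_sku)) == build_email_lines(ORDER_CREATED)

# Breaking change: renaming "price" removes a field the reader depends on.
renamed = json.loads(ORDER_CREATED)
renamed["items"][0]["unitPrice"] = renamed["items"][0].pop("price")
try:
    build_email_lines(json.dumps(renamed))
except KeyError as missing:
    print(f"consumer broke on missing field: {missing}")
```

The point of the sketch is that the consumer never imports anything from OrderService, yet every key it indexes is a dependency on the producer's schema.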

Say the system you’re building runs on a Kafka cluster, with OrderService producing events to an orders topic and NotificationService consuming from it.

What problem does this solve? The promise is that services can react to events without knowing about each other directly. OrderService doesn’t need to know NotificationService exists. It just says, "Hey, an order was created!" and the event bus (Kafka) handles routing it to anyone interested. This allows OrderService to evolve independently of NotificationService, as long as the event contract is maintained. It also allows new services to easily subscribe to existing events without modifying the producer.

How does it work internally? Kafka is a distributed commit log. Producers write messages (events) to topics, which are partitioned for scalability and fault tolerance. Consumers read messages from partitions, maintaining their own offset (their position in the log). When OrderService publishes an OrderCreated event, it’s appended to a partition of the orders topic. NotificationService (as a consumer group) polls Kafka for new messages on that topic. When it receives the OrderCreated event, it processes it, commits its offset, and moves on. If NotificationService restarts, it resumes from its last committed offset, so no events are skipped as long as offsets are committed only after processing succeeds.
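The offset mechanics can be modeled in a few lines of plain Python. This is a toy in-memory model, not the Kafka client API: the PartitionLog and GroupConsumer classes and the order-1, order-2, order-3 IDs are invented for illustration.

```python
class PartitionLog:
    """Toy stand-in for one partition: an append-only list of events."""

    def __init__(self):
        self.records = []

    def append(self, event):
        self.records.append(event)
        return len(self.records) - 1  # offset of the new record


class GroupConsumer:
    """Toy consumer that tracks a committed offset, as a consumer group would."""

    def __init__(self, log):
        self.log = log
        self.committed = 0  # next offset to read after a restart

    def poll(self):
        # Return (offset, event) pairs at or after the committed offset.
        return list(enumerate(self.log.records))[self.committed:]

    def commit(self, offset):
        # Committing N means "everything up to and including N is done".
        self.committed = offset + 1


log = PartitionLog()
for order_id in ("order-1", "order-2", "order-3"):
    log.append({"type": "OrderCreated", "orderId": order_id})

consumer = GroupConsumer(log)
handled = []

# First run: process two events but "crash" before committing the second.
batch = consumer.poll()
handled.append(batch[0][1]["orderId"])
consumer.commit(batch[0][0])
handled.append(batch[1][1]["orderId"])  # processed, never committed

# Restart: polling resumes from the last committed offset.
for offset, event in consumer.poll():
    handled.append(event["orderId"])
    consumer.commit(offset)

print(handled)
```

In this model the second order is handled twice: nothing is lost, but the consumer has to tolerate duplicates.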

The exact levers you control:

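  • Topic Partitioning: The number of partitions for a topic dictates parallelism. More partitions mean more consumers can read concurrently. If your orders topic has 4 partitions, you can have up to 4 consumers in the order-notifications consumer group processing events in parallel; a fifth would sit idle. The partition count can be increased but not decreased, and increasing it changes which partition a given key maps to. Assuming a broker at localhost:9092: kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic orders --partitions 8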
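  • Replication Factor: Determines how many copies of each partition are stored across brokers. A replication factor of 3 means each partition exists on 3 different Kafka brokers, providing high availability. If one broker fails, others still have the data. It is set when the topic is created (again assuming a broker at localhost:9092): kafka-topics.sh --bootstrap-server localhost:9092 --create --topic orders --partitions 4 --replication-factor 3. kafka-topics.sh --alter cannot change it afterwards; for an existing topic you need a partition reassignment with kafka-reassign-partitions.sh.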
  • Consumer Group: A logical group of consumers that share a topic. Kafka ensures that each partition is consumed by only one consumer within a given consumer group. This is how you scale consumption. If you have 4 partitions and 2 consumers in the order-notifications group, each consumer will handle 2 partitions. If you add a third consumer, one consumer will now handle 2 partitions, and the other two will handle 1 each.
  • Message Key: When producing an event, providing a key (e.g., orderId) ensures all messages for that key go to the same partition, as long as the partition count stays the same. This is crucial for ordered processing of events related to a specific entity. If you produce order events with orderId as the key, all events for orderId "123e4567-e89b-12d3-a456-426614174000" will land in the same partition, allowing a single consumer to process them in order.
  • enable.auto.commit: In Kafka consumers, this setting controls whether offsets are committed automatically. enable.auto.commit=true (default) means offsets are committed periodically in the background. If a consumer crashes after processing an event but before the auto-commit, that event might be reprocessed upon restart (at-least-once delivery). Setting enable.auto.commit=false lets you commit offsets explicitly after successful processing, which gives you tighter control but is still at-least-once: a crash between processing and commit causes a redelivery. Exactly-once processing additionally requires idempotent consumers or Kafka transactions, and is harder to achieve end-to-end.
  • isolation.level: For consumers reading from Kafka, this setting (read_uncommitted vs. read_committed) controls whether they see transactional writes. read_committed ensures consumers only see messages from committed transactions, preventing them from seeing "dirty" data during multi-message transactions.
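The partition arithmetic behind the Consumer Group and Message Key levers can be sketched in plain Python. This is an illustration under stated assumptions, not Kafka's implementation: CRC32 stands in for the murmur2 hash used by Kafka's default partitioner, the assign function is a simplified even split, and the consumer names c1 through c5 are made up.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32); Kafka's default partitioner uses murmur2 instead.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(num_partitions: int, consumers: list) -> dict:
    """Spread partitions over the consumers of one group as evenly as possible."""
    base, extra = divmod(num_partitions, len(consumers))
    assignment, next_partition = {}, 0
    for index, consumer in enumerate(consumers):
        count = base + (1 if index < extra else 0)
        assignment[consumer] = list(range(next_partition, next_partition + count))
        next_partition += count
    return assignment


order_id = "123e4567-e89b-12d3-a456-426614174000"
# Same key and same partition count: always the same partition.
assert partition_for(order_id, 4) == partition_for(order_id, 4)

print(assign(4, ["c1", "c2"]))        # {'c1': [0, 1], 'c2': [2, 3]}
print(assign(4, ["c1", "c2", "c3"]))  # {'c1': [0, 1], 'c2': [2], 'c3': [3]}
print(assign(4, ["c1", "c2", "c3", "c4", "c5"])["c5"])  # []
```

Because the partition is the hash modulo the partition count, changing the count can move a key to a different partition, which is why per-key ordering only holds while the count is stable.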

The "decoupling" provided by event-driven systems isn’t about removing dependencies; it’s about transforming them into shared, evolving contracts. The true challenge lies in managing the schema evolution of these shared contracts and ensuring consumers can gracefully handle changes. The real trick is realizing that a schema registry, such as Confluent Schema Registry with Avro or Protobuf, becomes the de facto central authority, not Kafka itself. If your producer and consumer don’t agree on a compatible schema version, deserialization fails, and the whole "decoupling" falls apart.
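What "compatible" means can be sketched with a toy check. It is loosely modeled on the reader/writer rule used for Avro records (a reader can decode a record if every field it expects is either written or has a default) and ignores type changes. The can_read function and the schema dictionaries are hypothetical, not a registry API.

```python
def can_read(reader_fields: dict, writer_fields: dict) -> bool:
    """True if every field the reader expects is either written or has a default.

    Each schema is a dict of field name -> True when the field has a default.
    """
    return all(
        name in writer_fields or has_default
        for name, has_default in reader_fields.items()
    )


item_v1 = {"productId": False, "quantity": False, "price": False}
item_v2_add_sku = {**item_v1, "sku": False}  # producer adds a required sku
item_v2_rename = {"productId": False, "quantity": False, "unitPrice": False}

# Producer upgrades first: can the old consumer still read new events?
assert can_read(reader_fields=item_v1, writer_fields=item_v2_add_sku)
assert not can_read(reader_fields=item_v1, writer_fields=item_v2_rename)

# Consumer upgrades first: can it read events still written with v1?
assert not can_read(reader_fields=item_v2_add_sku, writer_fields=item_v1)
assert can_read(reader_fields={**item_v1, "sku": True}, writer_fields=item_v1)
```

The asymmetry is the useful part: the same change can be safe in one rollout order and breaking in the other, which is why the contract needs an owner.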

The next problem you’ll hit is handling poison pills: messages that a consumer repeatedly fails to process, blocking every message behind them on that partition.
