The Kafka outbox pattern is surprisingly difficult to get right, despite its simple promise: ensuring that when a database transaction commits, a corresponding event is reliably published to Kafka.
Let’s see it in action. Imagine we have a users table and we want to publish a UserCreated event whenever a new user is inserted.
-- Our user table
CREATE TABLE users (
    id UUID PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Our outbox table
CREATE TABLE outbox_events (
    id UUID PRIMARY KEY,
    aggregate_type VARCHAR(50) NOT NULL,
    aggregate_id UUID NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    published_at TIMESTAMP WITH TIME ZONE NULL
);
Now, when a user is created, we do two things within a single database transaction:
-- Start transaction
BEGIN;

-- Insert into users table
INSERT INTO users (id, email)
    VALUES ('a1b2c3d4-e5f6-7890-1234-567890abcdef', 'test@example.com');

-- Insert into outbox table
INSERT INTO outbox_events (id, aggregate_type, aggregate_id, event_type, payload)
    VALUES (gen_random_uuid(), 'User', 'a1b2c3d4-e5f6-7890-1234-567890abcdef', 'UserCreated', '{"email": "test@example.com"}');

-- Commit transaction
COMMIT;
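The second half of the pattern is a separate relay process that watches the outbox_events table. In its simplest form, a custom application or a Kafka Connect connector polls for rows where published_at is NULL, publishes them to Kafka, and sets published_at once the broker has acknowledged the write. If the relay crashes between the publish and the update, the row is sent again on the next pass, so delivery is at-least-once rather than exactly-once.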
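A polling relay of this kind can be sketched in a few lines. This is a minimal illustration, not a production publisher: SQLite stands in for PostgreSQL so it runs without a server, `send` is a hypothetical callable that wraps whatever Kafka client you use and blocks until the broker acknowledges the record, and the `<aggregate>-events` topic naming is an assumption.

```python
import json
import sqlite3
from typing import Callable

# SQLite stands in for PostgreSQL so the sketch runs without a server.
SCHEMA = """
CREATE TABLE outbox_events (
    id TEXT PRIMARY KEY,
    aggregate_type TEXT NOT NULL,
    aggregate_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    published_at TEXT NULL
)
"""

# Hypothetical signature: send(topic, key, value) blocks until the broker
# acknowledges the record and raises if delivery fails.
Send = Callable[[str, str, bytes], None]


def publish_pending(conn: sqlite3.Connection, send: Send, batch_size: int = 100) -> int:
    """Publish unpublished outbox rows, oldest first; return how many were sent."""
    rows = conn.execute(
        "SELECT id, aggregate_type, aggregate_id, event_type, payload "
        "FROM outbox_events WHERE published_at IS NULL "
        "ORDER BY created_at LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for event_id, aggregate_type, aggregate_id, event_type, payload in rows:
        topic = f"{aggregate_type.lower()}-events"  # assumed naming scheme
        value = json.dumps(
            {"id": event_id, "type": event_type, "payload": json.loads(payload)}
        ).encode("utf-8")
        send(topic, aggregate_id, value)  # an exception here leaves the row pending
        conn.execute(
            "UPDATE outbox_events SET published_at = CURRENT_TIMESTAMP WHERE id = ?",
            (event_id,),
        )
        conn.commit()  # mark each event as soon as it is acknowledged
        sent += 1
    return sent
```

Marking the row only after `send` returns is what makes the relay at-least-once: a crash between the two steps causes a resend, never a silent loss. If several relay instances run against PostgreSQL, selecting the batch with `FOR UPDATE SKIP LOCKED` is a common way to keep them from picking up the same rows.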
The core problem this pattern solves is the dual write: a database commit and a Kafka publish cannot be made atomic with each other. Without the outbox, you might commit a user to the database but fail to publish the event (losing the event), or publish the event but fail to commit the user (announcing a user that does not exist). The outbox pattern sidesteps this by leaning on the database’s ACID properties: the inserts into users and outbox_events share one transaction, so either both rows are committed or neither is. The separate publisher process then picks up these "committed but not yet published" events and retries until Kafka accepts them, so a committed change always produces an event eventually.
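The all-or-nothing behaviour of the write side is easy to demonstrate from application code. The sketch below is illustrative only: it uses SQLite in place of PostgreSQL, and `create_user` is a hypothetical helper name, but the structure (two inserts inside one transaction that rolls back on any error) is the part that carries over.

```python
import json
import sqlite3
import uuid

# SQLite stand-in for the two tables; only the columns the sketch needs.
SCHEMA = """
CREATE TABLE users (
    id TEXT PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);
CREATE TABLE outbox_events (
    id TEXT PRIMARY KEY,
    aggregate_type TEXT NOT NULL,
    aggregate_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    published_at TEXT NULL
);
"""


def create_user(conn: sqlite3.Connection, email: str) -> str:
    """Insert the user and its UserCreated event in one transaction."""
    user_id = str(uuid.uuid4())
    # The connection context manager commits if the block succeeds and
    # rolls back if any statement raises, so the rows live or die together.
    with conn:
        conn.execute(
            "INSERT INTO users (id, email) VALUES (?, ?)", (user_id, email)
        )
        conn.execute(
            "INSERT INTO outbox_events "
            "(id, aggregate_type, aggregate_id, event_type, payload) "
            "VALUES (?, 'User', ?, 'UserCreated', ?)",
            (str(uuid.uuid4()), user_id, json.dumps({"email": email})),
        )
    return user_id
```

Note that no Kafka client appears anywhere in this function. The request path only touches the database, and publishing is left entirely to the relay.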
A common implementation uses a Change Data Capture (CDC) mechanism, and Debezium is a popular choice. Debezium reads the database’s transaction log (e.g., PostgreSQL’s Write-Ahead Log, or WAL) and captures row-level changes. When a row is inserted into outbox_events, Debezium sees the insert in the log and, typically through its outbox event router transformation, routes it to a Kafka topic. The application never talks to Kafka inside its own transaction, which removes a failure point from the request path.
The CDC approach, specifically using Debezium, is often preferred because it doesn’t require polling the outbox_events table. It reads the transaction log instead, which puts less query load on the database and delivers events in near real time. Debezium can be configured to capture only specific tables and columns, and any further row-level filtering happens on the captured change events (for example in a transformation), not through a query such as published_at IS NULL. Debezium does not write back to outbox_events after publishing; it records how far it has read in the log as connector offsets, so the table itself is never updated by the connector.
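For orientation, a connector registration for this table might look roughly like the following. Treat it as a sketch under assumptions: the property names follow the Debezium 2.x PostgreSQL connector and outbox event router as I understand them and should be checked against the version you run, while the host, credentials, database name, topic prefix, and the `build_connector_request` helper are all placeholders invented for the example.

```python
import json


def build_connector_request(name: str = "users-outbox") -> dict:
    """Build a request body for Kafka Connect's POST /connectors endpoint."""
    config = {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        # Placeholder connection details; replace with your own.
        "database.hostname": "localhost",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "change-me",
        "database.dbname": "app",
        "topic.prefix": "app",
        # Capture only the outbox table, not the business tables.
        "table.include.list": "public.outbox_events",
        # Route each outbox row to a topic with the outbox event router.
        "transforms": "outbox",
        "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
        # Map the router's expected fields onto this article's column names.
        "transforms.outbox.table.field.event.id": "id",
        "transforms.outbox.table.field.event.key": "aggregate_id",
        "transforms.outbox.table.field.event.payload": "payload",
        "transforms.outbox.route.by.field": "aggregate_type",
        "transforms.outbox.route.topic.replacement": "outbox.event.${routedByValue}",
        # Carry the event type along as a message header.
        "transforms.outbox.table.fields.additional.placement": "event_type:header:eventType",
    }
    return {"name": name, "config": config}


if __name__ == "__main__":
    # Print the JSON you would send to the Kafka Connect REST API.
    print(json.dumps(build_connector_request(), indent=2))
```

With this routing, a row whose aggregate_type is User would be expected to land on a topic named outbox.event.User, keyed by aggregate_id so that all events for one user stay in one partition.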
The most surprising part is how published_at is handled. Many assume the CDC connector updates the outbox_events table after successfully publishing to Kafka. In reality, the connector only reads: it tracks how far it has progressed through the transaction log via its stored offsets, so no write-back is needed and published_at can stay NULL or be dropped from the table entirely. The column matters for a polling publisher, which sets it only after Kafka acknowledges the message. In both designs a restart can replay events that were sent but not yet recorded as done, so consumers should treat the outbox id as a deduplication key.
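Since a restart can deliver the same event more than once, the consuming side usually has to tolerate duplicates. One common approach, sketched here with SQLite and hypothetical table and function names (`processed_events`, `handle_once`), is to record each event id in the same transaction as the event’s side effect and skip ids that were already recorded.

```python
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE processed_events (
    event_id TEXT PRIMARY KEY
);
CREATE TABLE welcome_emails (
    email TEXT NOT NULL
);
"""


def handle_once(
    conn: sqlite3.Connection,
    event_id: str,
    apply: Callable[[sqlite3.Connection], None],
) -> bool:
    """Run apply() at most once per event id; return False for a duplicate."""
    # One transaction covers both the dedup record and the side effect,
    # so a failure in apply() also rolls back the "processed" mark.
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
        if cur.rowcount == 0:
            return False  # id already recorded: this is a redelivery
        apply(conn)
    return True
```

On PostgreSQL the equivalent of `INSERT OR IGNORE` would be `INSERT ... ON CONFLICT DO NOTHING`. The event id here is the outbox row’s id, which is why it is worth carrying through to Kafka as a header or payload field.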
The next hurdle you’ll encounter is managing event schemas and ensuring backward compatibility as your event payloads evolve.