Kafka and Debezium are a powerhouse for capturing database changes and streaming them into Kafka.

Here’s Debezium in action, watching a PostgreSQL table and publishing its changes to Kafka.

First, we need a PostgreSQL database with logical replication enabled. This is the core mechanism Debezium uses to tap into the database’s write-ahead log (WAL).

-- In postgresql.conf
wal_level = logical
max_replication_slots = 1
max_wal_senders = 1

Restart your PostgreSQL server for these changes to take effect.

Next, create a replication slot. This slot is how PostgreSQL keeps track of which WAL entries have been consumed by Debezium. If Debezium stops for a long time, the slot will prevent the WAL files from being deleted, which can fill up your disk.

-- In psql connected to your database
SELECT * FROM pg_create_logical_replication_slot('debezium_slot', 'pgoutput');

Now, let’s set up Debezium itself. We’ll use the Kafka Connect framework, which is designed to run connectors like Debezium. You need a running Kafka cluster and ZooKeeper (or KRaft).

Here’s a sample Debezium PostgreSQL connector configuration. This is a JSON payload you’d POST to your Kafka Connect REST API (usually http://localhost:8083/connectors).

{
  "name": "my-postgres-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": 1,
    "database.hostname": "localhost",
    "database.port": 5432,
    "database.user": "debezium_user",
    "database.password": "db_password",
    "database.dbname": "my_database",
    "database.server.name": "postgres-server-1",
    "plugin.name": "pgoutput",
    "replication.slot.name": "debezium_slot",
    "table.include.list": "public.users",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
  }
}

The database.server.name is crucial: it prefixes all Kafka topics Debezium creates. So, changes for the public.users table will go to a topic named postgres-server-1.public.users.

Once the connector is running, any INSERT, UPDATE, or DELETE operation on the public.users table in your PostgreSQL database will be published as a message to the postgres-server-1.public.users Kafka topic.

Each message contains the before and after state of the row, along with metadata about the operation. For an INSERT or UPDATE, the after field will contain the new state. For a DELETE, the before field will hold the state of the row before deletion, and after will be null.

The plugin.name must match the output plugin you chose when creating the replication slot. pgoutput is the standard for newer PostgreSQL versions.

This setup allows you to build real-time data pipelines, audit trails, or synchronize data across different systems by simply consuming from Kafka topics.

What most people don’t realize is that Debezium doesn’t just send the new row data; it sends a structured event object. This object includes an op field indicating the type of change (c for create/insert, u for update, d for delete, r for snapshot/read), and crucially, the source field contains metadata about the original database event, like the LSN (Log Sequence Number) and timestamp, which is invaluable for reliable, ordered processing.

The next step is to consume these change events from Kafka and process them, perhaps using Kafka Streams or a custom application.

Want structured learning?

Take the full Kafka course →