MQTT and Kafka are both popular messaging technologies, but they serve different purposes and excel in different scenarios. When you need to bridge IoT devices to a robust event streaming platform like Kafka, understanding their interplay is key.

Imagine you have thousands of temperature sensors in a sprawling factory. Each sensor is a tiny, resource-constrained device that sends a small reading every few seconds. On the other side, you have an analytics engine that must ingest the combined stream from every sensor, detect anomalies, and trigger alerts. MQTT suits the first half of that problem and Kafka the second, which is why the two are so often used together.

Here’s a simplified example of the payload a single sensor sends:

{
  "device_id": "sensor-12345",
  "timestamp": "2023-10-27T10:30:00Z",
  "temperature": 22.5,
  "unit": "Celsius"
}

The sensor publishes this JSON payload to an MQTT topic, say factory/zoneA/sensor-12345/reading.
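
To make the payload and topic layout concrete, here is a minimal Python sketch that builds both using only the standard library. The helper names build_reading and reading_topic are made up for this illustration, and actually publishing the bytes would need an MQTT client library, which is left out here.

```python
import json
from datetime import datetime, timezone


def build_reading(device_id: str, temperature: float, when: datetime) -> bytes:
    """Serialize one reading as compact UTF-8 JSON in the shape shown above."""
    payload = {
        "device_id": device_id,
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "temperature": temperature,
        "unit": "Celsius",
    }
    # Compact separators keep the message small for constrained devices.
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def reading_topic(zone: str, device_id: str) -> str:
    """Build a topic name in the factory/<zone>/<device>/reading layout."""
    return f"factory/{zone}/{device_id}/reading"


when = datetime(2023, 10, 27, 10, 30, tzinfo=timezone.utc)
topic = reading_topic("zoneA", "sensor-12345")
body = build_reading("sensor-12345", 22.5, when)
print(topic)
print(body.decode("utf-8"))
```

Putting the zone and device ID into the topic name, rather than only into the payload, is what later lets a subscriber select readings with wildcards.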

Now, let’s see how this data flows into Kafka.

The Core Problem: Bridging Protocols and Scale

The fundamental challenge is that IoT devices often speak MQTT because it is lightweight and its publish-subscribe model copes well with low-bandwidth, unreliable networks. Kafka, on the other hand, is built for high-throughput, persistent event streaming and is typically used for backend data pipelines and microservices. An MQTT client can’t talk to Kafka directly, because Kafka brokers speak their own binary protocol rather than MQTT. You need a bridge.

How the Bridge Works: MQTT Broker + Kafka Connect

The most common and robust way to achieve this is by using an MQTT broker (like Mosquitto, EMQX, or HiveMQ) in conjunction with Kafka Connect, a framework for reliably streaming data into and out of Kafka.

  1. MQTT Broker: This is the central hub for your IoT devices. Devices connect to the broker and publish their messages to specific topics.
  2. Kafka Connect MQTT Source Connector: This is the magic piece. A Kafka Connect connector is configured to subscribe to specific topics on the MQTT broker. When a message arrives on one of these subscribed topics, the connector reads it and publishes it as a record to a designated Kafka topic.
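
Conceptually, the connector's per-message job is small: take an MQTT topic and payload, and wrap them in a Kafka record. The sketch below shows that mapping in plain Python. KafkaRecord and to_kafka_record are hypothetical names, and using the MQTT topic as the record key is an assumption made for illustration; check what your connector actually puts in the key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KafkaRecord:
    """Hypothetical stand-in for the record a source connector emits."""
    topic: str
    key: bytes
    value: bytes


def to_kafka_record(mqtt_topic: str, payload: bytes, kafka_topic: str) -> KafkaRecord:
    """Wrap one MQTT message as a Kafka record without touching the payload."""
    # Assumption: key by MQTT topic, so that with key-based partitioning all
    # readings from one device land in the same partition and stay in order.
    return KafkaRecord(topic=kafka_topic, key=mqtt_topic.encode("utf-8"), value=payload)


record = to_kafka_record(
    "factory/zoneA/sensor-12345/reading",
    b'{"temperature": 22.5}',
    "iot-sensor-readings",
)
print(record.topic, record.key.decode("utf-8"))
```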

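Here’s a conceptual look at the configuration for an MQTT source connector in Kafka Connect. It is shown in properties-file form, as used in standalone mode; the same keys can also be sent as a JSON payload to the Connect REST API. Exact property names vary between connectors and versions, so treat this as a template and check your connector’s documentation.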

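name=mqtt-source-connector
connector.class=io.confluent.connect.mqtt.MqttSourceConnector
tasks.max=1

# Property names below follow the Confluent MQTT source connector.
# Other MQTT connectors use different names.

# MQTT broker connection details
mqtt.server.uri=tcp://mqtt.example.com:1883
mqtt.username=iot_user
mqtt.password=supersecret

# MQTT topic filters to subscribe to. Wildcards are allowed.
mqtt.topics=factory/+/+/reading

# Kafka topic that receives the records
kafka.topic=iot-sensor-readings

# The Kafka cluster for the data itself is set in the Connect worker
# configuration (bootstrap.servers), not here. The Confluent connector
# needs the cluster address only for its license topic.
confluent.topic.bootstrap.servers=kafka.example.com:9092

# Converters (optional; these override the worker defaults).
# This connector emits the payload as raw bytes, so a byte-array value
# converter passes the JSON through unchanged.
# key.converter=org.apache.kafka.connect.storage.StringConverter
# value.converter=org.apache.kafka.connect.converters.ByteArrayConverter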

When the connector receives a message on factory/zoneA/sensor-12345/reading, it wraps the payload in a Kafka record, applies any transformations you have configured, and publishes it to a Kafka topic such as iot-sensor-readings. The payload itself is essentially unchanged, but it is now part of Kafka’s distributed log.
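
If you run Kafka Connect in distributed mode, you would submit the configuration as JSON instead of a properties file. The following Python sketch converts simple key=value text into the body expected by the REST API's POST /connectors endpoint, which takes a "name" and a "config" object. The parser handles only plain key=value lines and # comments, not the full Java properties syntax, and http://localhost:8083 is an assumed worker address. The request is built but deliberately not sent.

```python
import json
import urllib.request

PROPERTIES = """\
name=mqtt-source-connector
connector.class=io.confluent.connect.mqtt.MqttSourceConnector
tasks.max=1
# Topics to subscribe to. Can use wildcards.
mqtt.topics=factory/+/+/reading
"""


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comment lines."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def build_create_request(connect_url: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request."""
    config = dict(config)  # copy so the caller's dict is left untouched
    name = config.pop("name")
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request("http://localhost:8083", parse_properties(PROPERTIES))
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually submit it: urllib.request.urlopen(request)
```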

The Mental Model: From Pub/Sub to Log

Think of MQTT as a highly efficient, largely ephemeral pub/sub system for edge devices. The broker delivers messages to current subscribers and, apart from retained messages and messages queued for offline clients with persistent sessions, does not store them long-term. Kafka, on the other hand, is a durable, append-only log. Once a message is written to a Kafka topic, it stays there until its retention limit is reached, whether or not anyone has consumed it. This durability is crucial for analytics, replayability, and fault tolerance.

The bridge acts as a translation layer. It takes transient MQTT messages and makes them persistent and replayable in Kafka, with ordering preserved within each partition. This allows your downstream systems (analytics engines, databases, other microservices) to consume these IoT events reliably, even if they go offline temporarily and have to catch up later.
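
The difference between the two models fits in a few lines of Python. This is a toy, in-memory sketch, not real MQTT or Kafka: EphemeralTopic behaves roughly like QoS 0 delivery with no retained messages or persistent sessions, and AppendOnlyLog like a single Kafka partition. Both class names are invented for this example.

```python
class EphemeralTopic:
    """Delivers each message to whoever is subscribed right now, then forgets it."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(message)


class AppendOnlyLog:
    """Keeps every message at a numbered offset so readers can start anywhere."""

    def __init__(self):
        self.records = []

    def append(self, message) -> int:
        self.records.append(message)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset: int) -> list:
        return self.records[offset:]


pubsub = EphemeralTopic()
log = AppendOnlyLog()
pubsub.subscribe(log.append)  # the "bridge": every message is copied into the log

late_subscriber = []
pubsub.publish("r1")
pubsub.publish("r2")
pubsub.subscribe(late_subscriber.append)  # joins after r1 and r2 were sent
pubsub.publish("r3")

print(late_subscriber)   # only what arrived after subscribing
print(log.read_from(0))  # the full history, replayable from any offset
```

The late subscriber sees only r3, while a reader of the log can still start from offset 0 and get all three readings.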

The primary levers you control are:

  • MQTT Topic Subscription: What data from the MQTT broker do you want to ingest into Kafka? This is defined by mqtt.topics in the connector. Using wildcards (+ for a single level, # for multiple levels) is powerful here.
  • Kafka Topic Mapping: Where do the ingested messages go in Kafka? This can be one fixed topic or, where the connector or a transform supports it, a pattern-based mapping.
  • Data Format and Transformation: How is the data serialized? Is it JSON, Avro, Protobuf? Kafka Connect converters handle the serialization. You can also perform lightweight transformations (e.g., adding metadata, filtering) within Kafka Connect using Single Message Transforms (SMTs).
  • MQTT Broker Connection: Credentials, host, port, and TLS settings for connecting to your MQTT broker.
  • Kafka Broker Connection: Bootstrap servers and security settings for your Kafka cluster.
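
Since the subscription filter decides what reaches Kafka, it helps to be precise about how the wildcards match. The function below is a small sketch of MQTT topic-filter matching written for this article (topic_matches is not a library function). It covers + and # but, as a simplification, ignores the special treatment of topics that start with $.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a topic filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level, and it
            # matches the parent level plus everything beneath it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # the filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # a literal level that does not match
    # No "#" was seen, so both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


checks = [
    ("factory/+/+/reading", "factory/zoneA/sensor-12345/reading"),
    ("factory/+/+/reading", "factory/zoneA/sensor-12345/status"),
    ("factory/+/+/reading", "factory/zoneA/reading"),
    ("factory/#", "factory/zoneB/sensor-7/reading"),
    ("factory/#", "factory"),
]
for topic_filter, topic in checks:
    print(topic_filter, topic, topic_matches(topic_filter, topic))
```

Note that + always consumes exactly one level, so factory/+/+/reading does not match a topic with only one level between factory and reading.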

The One Detail Most People Miss

Many assume the MQTT source connector simply dumps raw MQTT payloads into Kafka and that nothing can be lost along the way. In reality, delivery depends on state held in two places. First, an MQTT broker keeps no replayable log and has no notion of offsets, so a restarted connector cannot re-read the stream from the beginning. What it receives after a restart depends on the MQTT session: with QoS 1 or 2, a stable client ID, and a persistent session (clean session disabled), the broker queues messages published while the connector was offline and delivers them on reconnect; with QoS 0 or a clean session, those messages are gone. Second, there is a window inside the connector itself. If the connector fails after reading a message from MQTT and acknowledging it, but before the record is successfully written to Kafka, that message could be lost. If it fails after the write but before the acknowledgment, the message may be redelivered and appear twice. End-to-end delivery is therefore only as strong as the weakest of the MQTT QoS level, the session configuration, and the connector’s acknowledgment behavior. Check how your connector handles acknowledgments, and run Kafka Connect in distributed mode so that tasks are reassigned when a worker fails.
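
The failure window is easier to see in a toy simulation. The sketch below is not how any particular connector is implemented; FlakySink, run_bridge, and deliver_with_restart are invented names, the deque stands in for messages the MQTT broker is holding for the connector, and removing an item from it stands in for acknowledging that message. The only thing that changes between the two runs is whether the acknowledgment happens before or after the write.

```python
from collections import deque


class FlakySink:
    """Stand-in for a Kafka producer that fails on chosen write attempts."""

    def __init__(self, fail_on_attempts):
        self.fail_on_attempts = set(fail_on_attempts)
        self.attempts = 0
        self.records = []

    def write(self, message):
        self.attempts += 1
        if self.attempts in self.fail_on_attempts:
            raise ConnectionError("simulated write failure")
        self.records.append(message)


def run_bridge(queue: deque, sink: FlakySink, ack_first: bool) -> None:
    """Move queued messages into the sink until the queue is empty or a write fails."""
    while queue:
        message = queue[0]
        if ack_first:
            queue.popleft()      # acknowledged: the broker forgets the message
            sink.write(message)  # a failure here loses it for good
        else:
            sink.write(message)  # a failure here leaves it queued
            queue.popleft()      # acknowledge only after the write succeeded


def deliver_with_restart(messages, ack_first: bool) -> list:
    """Run the bridge, let it crash on the second write, then restart it once."""
    queue = deque(messages)
    sink = FlakySink(fail_on_attempts={2})
    for _ in range(2):  # the first run, then one restart after the crash
        try:
            run_bridge(queue, sink, ack_first)
        except ConnectionError:
            pass  # simulated crash; the loop restarts the bridge
    return sink.records


print(deliver_with_restart(["m1", "m2", "m3"], ack_first=True))
print(deliver_with_restart(["m1", "m2", "m3"], ack_first=False))
```

With ack_first=True, m2 never reaches the sink. With ack_first=False, all three arrive, at the cost that a crash between the write and the acknowledgment (not modeled here) would deliver a message twice.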

This setup allows you to leverage the simplicity and efficiency of MQTT for device communication and the power, scalability, and durability of Kafka for backend processing and analytics.

The next logical step is often to explore how to send data from Kafka back to IoT devices using an MQTT sink connector.
