MQTT QoS 2 achieves exactly-once message delivery by using a four-step handshake between a sender and a receiver. The handshake runs separately on each hop (publisher to broker, then broker to subscriber), ensuring the message is handed over precisely one time on each.
Let’s see this in action. Imagine a simple scenario: a temperature sensor publishing readings to a broker, and a dashboard subscribing to those readings.
// Publisher (client A) sending a temperature update.
// Illustrative notation only: real MQTT packets are binary, not JSON.
{
  "topic": "sensors/temperature/livingroom",
  "payload": "22.5",
  "qos": 2,
  "message_id": 12345
}
// Hop 1: client A (sender) -> broker (receiver)
//   A -> broker: PUB (qos=2, message_id=12345)
//   broker -> A: PubRec
//   A -> broker: PubRel
//   broker -> A: PubComp
// Hop 2: broker (sender) -> subscriber client B (receiver),
// assuming B subscribed with qos 2.
//   The same four packets are exchanged again, with the broker
//   choosing its own message_id for this hop.
The core problem QoS 2 solves is preventing both message loss and duplicate messages in unreliable networks. TCP delivers bytes reliably and in order while a connection is up, but if the connection drops, the sender cannot tell whether a message that was in flight actually arrived. It then has two bad options: not re-send and risk losing the message (the QoS 0 behavior), or re-send and risk delivering it twice (the QoS 1 behavior). QoS 2 explicitly addresses both edge cases.
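To make the duplicate problem concrete, here is a toy model of acknowledge-and-retry delivery with no deduplication on the receiving side. It is only a sketch: `deliver_at_least_once` and its parameter are hypothetical names invented for this illustration, and no real MQTT traffic is involved.

```python
def deliver_at_least_once(message, ack_lost_on_first_try):
    """Toy model: the sender retries until it sees an ack; the receiver
    processes every copy it gets (hypothetical helper, not an MQTT API)."""
    processed = []
    attempts = 0
    acked = False
    while not acked:
        attempts += 1
        processed.append(message)  # receiver handles each copy it sees
        ack_lost = ack_lost_on_first_try and attempts == 1
        acked = not ack_lost       # a lost ack forces a retransmission
    return processed


print(deliver_at_least_once("22.5", ack_lost_on_first_try=False))
print(deliver_at_least_once("22.5", ack_lost_on_first_try=True))
```

When the first acknowledgment is lost, the receiver ends up processing the reading twice, which is exactly the case QoS 2 is designed to rule out.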
Internally, QoS 2 uses a state machine and a unique message_id for each publish request. On each hop there is a sender (the publisher, or the broker when forwarding) and a receiver (the broker, or the subscriber). The four steps are:
- Publish (PUB): The sender sends the message to the receiver with qos=2 and a unique message_id, and keeps a copy until it is acknowledged. This is the initial delivery attempt.
- Publish Received (PubRec): The receiver gets the message and, once it has stored the message_id, sends a PubRec back to the sender. This confirms the receiver has the message. If the sender doesn’t get a PubRec (for example, because the connection dropped), it re-sends the message after reconnecting; the receiver recognizes the message_id and does not deliver the message twice.
- Publish Release (PubRel): On receiving the PubRec, the sender can discard its copy of the message and sends a PubRel containing the same message_id. This is the sender’s promise that it will not re-send that message.
- Publish Complete (PubComp): The receiver gets the PubRel, forgets the stored message_id, and sends a PubComp back to the sender, confirming the entire transaction is complete. The sender can now reuse the message_id.
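The receiving side of these four steps can be sketched as a small state machine. This is a minimal illustration, not a real MQTT implementation: the `Qos2Receiver` class and its method names are invented here, and it assumes the receiver hands the message to the application when the PUB first arrives (an implementation could instead wait until PubRel).

```python
class Qos2Receiver:
    """Toy receiver side of the QoS 2 handshake (hypothetical class)."""

    def __init__(self):
        self.pending_ids = set()  # message_ids seen in a PUB, not yet released
        self.delivered = []       # payloads handed to the application

    def on_pub(self, message_id, payload):
        # Deliver only the first time this message_id is seen. A re-sent PUB
        # with the same id is acknowledged again but not delivered again.
        if message_id not in self.pending_ids:
            self.pending_ids.add(message_id)
            self.delivered.append(payload)
        return ("PubRec", message_id)

    def on_pubrel(self, message_id):
        # The sender will not re-send this message, so the id can be forgotten.
        self.pending_ids.discard(message_id)
        return ("PubComp", message_id)


receiver = Qos2Receiver()
print(receiver.on_pub(12345, "22.5"))   # first copy: delivered
print(receiver.on_pub(12345, "22.5"))   # retransmission: acknowledged only
print(receiver.on_pubrel(12345))
print(receiver.delivered)
```

The set of pending message_ids is what makes deduplication possible: until the PubRel arrives, any PUB carrying a known id is treated as a retransmission rather than a new message.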
The beauty of this is that if the connection drops at any point, both parties have a way to recover, provided the session state survives (a persistent session, with in-flight state kept on both sides). If the publisher loses its connection after sending the PUB but before receiving PubRec, it re-sends the PUB, marked as a duplicate, upon reconnect. The broker, having already recorded that message_id, does not forward the message a second time; it simply re-sends the PubRec. The same holds on the broker-to-subscriber hop: if the subscriber disconnects before its PubRec reaches the broker, the broker re-sends the message when the session resumes, and the subscriber answers with PubRec without handing the message to the application again. If the connection drops after PubRec but before PubComp, the sender re-sends PubRel, and the receiver answers with PubComp even if it has already released that message_id. If the broker itself restarts, the guarantee holds only if it persisted this state; a broker that has lost its in-flight records cannot tell a retransmission from a new message.
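The recovery behavior can be simulated with a sender that retries whenever a reply goes missing. This is a sketch under simplifying assumptions: `ToyReceiver`, `send_qos2`, and the `lost_replies` parameter are hypothetical names, a lost reply stands in for a dropped connection, and reconnection itself is not modeled.

```python
class ToyReceiver:
    """Minimal QoS 2 receiver that deduplicates by message_id (hypothetical)."""

    def __init__(self):
        self.pending_ids = set()
        self.delivered = []

    def handle(self, packet, message_id, payload=None):
        if packet == "PUB":
            if message_id not in self.pending_ids:
                self.pending_ids.add(message_id)
                self.delivered.append(payload)
            return "PubRec"
        if packet == "PubRel":
            # Answer PubComp even if the id was already released.
            self.pending_ids.discard(message_id)
            return "PubComp"
        raise ValueError(f"unexpected packet: {packet}")


def send_qos2(receiver, message_id, payload, lost_replies=()):
    """Run one exchange; each name in lost_replies is dropped once in transit."""
    remaining_losses = list(lost_replies)
    sent = []

    def send_until_acked(packet, packet_payload=None):
        while True:
            sent.append(packet)
            reply = receiver.handle(packet, message_id, packet_payload)
            if reply in remaining_losses:
                remaining_losses.remove(reply)  # reply lost: re-send the packet
                continue
            return reply

    send_until_acked("PUB", payload)
    send_until_acked("PubRel")
    return sent


receiver = ToyReceiver()
sent = send_qos2(receiver, 12345, "22.5", lost_replies=["PubRec", "PubComp"])
print(sent)                # ['PUB', 'PUB', 'PubRel', 'PubRel']
print(receiver.delivered)  # ['22.5']
```

Even though both the PUB and the PubRel are sent twice, the application sees the reading once, because the receiver keys its decisions on the message_id rather than on how many packets arrive.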
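The message_id is crucial here; it’s the identifier that allows both parties to track the state of a specific message exchange. Without it, the system wouldn’t know which message was being acknowledged. It is a 16-bit value that only needs to be unique among the messages currently in flight in one direction of one session, so the broker typically uses a different message_id when it forwards the message to a subscriber.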
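One way a sender might manage these identifiers is sketched below. MQTT packet identifiers are 16-bit and non-zero, so the valid range is 1 to 65535; everything else here (the `MessageIdAllocator` class, its method names, and the round-robin strategy) is an assumption made for illustration rather than how any particular client library does it.

```python
class MessageIdAllocator:
    """Hands out message_ids in 1..65535 that are not currently in flight."""

    MAX_ID = 65535

    def __init__(self):
        self.in_flight = set()
        self.next_id = 1

    def acquire(self):
        if len(self.in_flight) >= self.MAX_ID:
            raise RuntimeError("all message_ids are in flight")
        # Skip ids that still belong to an unfinished exchange.
        while self.next_id in self.in_flight:
            self.next_id = self.next_id % self.MAX_ID + 1
        message_id = self.next_id
        self.in_flight.add(message_id)
        self.next_id = self.next_id % self.MAX_ID + 1  # wraps 65535 -> 1
        return message_id

    def release(self, message_id):
        # Call once the exchange is complete (PubComp received for QoS 2).
        self.in_flight.discard(message_id)


allocator = MessageIdAllocator()
first = allocator.acquire()
second = allocator.acquire()
allocator.release(first)
print(first, second, allocator.acquire())
```

Releasing an id only after PubComp matters: reusing it earlier could make a new message look like a retransmission of the old one.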
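Many implementations rely on the underlying TCP connection for reliability. However, TCP only guarantees that bytes arrive in order while the connection lasts. It doesn’t tell the sender whether the peer’s MQTT layer took ownership of a message before the connection dropped. PubRec closes that gap by confirming that the receiver has the message, and the PubRel/PubComp exchange closes the remaining one: it tells the receiver that the sender will never re-send this message, so the stored message_id can safely be forgotten. Note that neither acknowledgment proves that the subscriber’s application code has finished its work with the message; when a client library hands the message to the application relative to these acknowledgments is an implementation detail.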
The next concept you’ll likely encounter is how to handle scenarios where a client might disconnect during one of these handshakes and how the broker manages lingering state for clients that might reconnect.