MQTT Keep-Alive is how your broker knows your clients are still alive and kicking, even when they’re not actively sending data.

Here’s a basic publisher and subscriber pair using the Python paho-mqtt client:

```python
# Publisher (client)
import paho.mqtt.client as mqtt
import time

broker_address = "your_broker_ip"
port = 1883
client_id = "publisher_client"

def on_connect(client, userdata, flags, rc):
    if rc == 0:
        print("Connected to MQTT Broker!")
    else:
        print(f"Failed to connect, return code {rc}")

def on_publish(client, userdata, mid):
    print(f"Message Published with MID: {mid}")

# paho-mqtt 2.x requires a callback API version; VERSION1 keeps the 1.x-style
# callback signatures used here. On paho-mqtt 1.x, omit the first argument.
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION1, client_id=client_id)
client.on_connect = on_connect
client.on_publish = on_publish

client.connect(broker_address, port)
client.loop_start()  # Start the network loop in a background thread

for i in range(10):
    msg = f"hello-{i}"
    client.publish("test/topic", msg)
    time.sleep(2)

client.disconnect()  # Send DISCONNECT while the network loop is still running
client.loop_stop()   # Then stop the background network loop
```

```python
# Subscriber (client)
import paho.mqtt.client as mqtt

broker_address = "your_broker_ip"
port = 1883
client_id = "subscriber_client"

def on_connect(client, userdata, flags, rc):
    if rc == 0:
        print("Connected to MQTT Broker!")
        client.subscribe("test/topic")  # (Re)subscribe on every successful connect
    else:
        print(f"Failed to connect, return code {rc}")

def on_message(client, userdata, msg):
    print(f"Received message: {msg.payload.decode()} on topic {msg.topic}")

# VERSION1 keeps 1.x-style callbacks; omit this argument on paho-mqtt 1.x.
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION1, client_id=client_id)
client.on_connect = on_connect
client.on_message = on_message

client.connect(broker_address, port)

client.loop_forever()  # Block, processing network traffic and callbacks
```

When a client connects, it sends a keep-alive interval, in seconds, in its CONNECT packet. This is the longest the client promises to stay silent. If it has nothing else to send within that interval, it sends a PINGREQ packet, and the broker replies with a PINGRESP. If the client does not receive the PINGRESP within a reasonable time, it treats the connection as broken and closes it. The specification leaves the exact wait to the implementation. The broker uses the same value to watch the client: if it receives no packet at all (PINGREQ, PUBLISH, or anything else) within one and a half times the keep-alive interval, it assumes the client is dead and disconnects it. A keep-alive of 0 turns the mechanism off.
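To make the client-side rules concrete, here is a minimal sketch of the decision a client might make on each timer tick. It is not paho-mqtt's implementation. The function name, the `Action` values, and the `ping_timeout` parameter are hypothetical. `ping_timeout` stands in for the "reasonable time" that the specification leaves to each implementation.

```python
from enum import Enum


class Action(Enum):
    NOTHING = "nothing"
    SEND_PINGREQ = "send_pingreq"
    DISCONNECT = "disconnect"


def client_keepalive_action(now, last_sent, ping_sent_at, keepalive, ping_timeout):
    """Decide what a client should do at time `now` (all times in seconds).

    last_sent    -- when the client last sent any control packet
    ping_sent_at -- when the outstanding PINGREQ was sent, or None if none is pending
    keepalive    -- the Keep Alive value from the client's CONNECT packet
    ping_timeout -- how long to wait for PINGRESP (assumed; the spec leaves it open)
    """
    if keepalive == 0:
        return Action.NOTHING  # Keep-alive is disabled.
    if ping_sent_at is not None:
        # A PINGREQ is in flight: give up if the PINGRESP is overdue.
        if now - ping_sent_at >= ping_timeout:
            return Action.DISCONNECT
        return Action.NOTHING
    if now - last_sent >= keepalive:
        # Nothing sent for a full interval, so a PINGREQ is due.
        return Action.SEND_PINGREQ
    return Action.NOTHING
```

In this sketch, the caller sets `ping_sent_at` when it sends a PINGREQ and resets it to `None` when the PINGRESP arrives.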

The primary problem this solves is detecting dead or lost clients. A client that crashes or loses its network link often leaves no trace on the broker's side: no FIN or RST packet arrives, so the TCP connection just looks idle. Without keep-alive, the broker could hold resources for such a client indefinitely, or until someone cleans up manually. Keep-alive lets the broker disconnect these clients proactively.
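A broker-side sweep for dead clients can be sketched the same way. This illustrates the rule and is not how Mosquitto is written. `expired_clients` and its two dictionaries are hypothetical, and the sketch assumes the broker records when it last received a packet from each client.

```python
def expired_clients(last_heard, keepalives, now):
    """Return the IDs of clients a broker should disconnect at time `now`.

    last_heard -- dict of client_id -> time (s) the broker last received any packet
    keepalives -- dict of client_id -> Keep Alive (s) from that client's CONNECT
    Applies the MQTT rule of allowing 1.5x the Keep Alive before disconnecting.
    A Keep Alive of 0 disables the check for that client.
    """
    expired = []
    for client_id, last in last_heard.items():
        keepalive = keepalives[client_id]
        if keepalive == 0:
            continue  # Keep-alive disabled: never time this client out.
        if now - last > 1.5 * keepalive:
            expired.append(client_id)
    return sorted(expired)
```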

This is how you set the keepalive in paho-mqtt:

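```python
# Publisher with keepalive
import paho.mqtt.client as mqtt

broker_address = "your_broker_ip"
port = 1883
client_id = "publisher_with_keepalive"
keepalive_interval = 60  # seconds (also paho-mqtt's default)

# VERSION1 keeps 1.x-style callbacks; omit this argument on paho-mqtt 1.x.
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION1, client_id=client_id)
client.connect(broker_address, port, keepalive=keepalive_interval)
# ... rest of your client code
```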

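On the broker side, there is no separate idle timeout to line up with the client's setting. The broker takes its deadline from the keep-alive value the client sent in CONNECT. As the MQTT specification requires, it disconnects a client it has heard nothing from for 1.5 times that interval.

What Mosquitto does let you set in mosquitto.conf is an upper bound, max_keepalive. In recent versions:

- MQTT 5 clients that request a longer keep-alive are told to use max_keepalive instead.
- MQTT 3.1.1 and 3.1 clients that request a longer keep-alive are rejected.

The default, which is also the maximum, is 65535 seconds.

```
# mosquitto.conf
# Cap client keep-alive at 60 seconds (default and maximum: 65535)
max_keepalive 60
```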

If you set keepalive_interval to 60 seconds on the client, the client sends a PINGREQ whenever it has been silent for 60 seconds. The broker disconnects it if 90 seconds (1.5 × 60) pass without any packet from it. This provides a fairly quick detection loop: if the client vanishes, for example because of a network outage, the broker notices within about 90 seconds. In the other direction, if the broker (or the path to it) stops answering PINGREQs, the client gives up and disconnects.
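Note that PINGREQs only fill silent gaps, so a client that is busy publishing rarely needs them. The sketch below lists when a client would ping, given the times it sends publishes. `pingreq_times` is a hypothetical helper. It assumes that only outgoing packets reset the timer, which is the minimum the specification requires. Client libraries may ping more often: paho-mqtt, for example, also pings when it has received nothing for a full interval.

```python
def pingreq_times(publish_times, keepalive, horizon):
    """Return the times (s) before `horizon` at which a client sends PINGREQ.

    Assumes CONNECT is sent at t=0, the client pings only after `keepalive`
    seconds without sending anything, and each PINGREQ resets that timer.
    """
    if keepalive <= 0:
        return []  # Keep-alive disabled: no PINGREQs at all.
    pings = []
    last_sent = 0  # CONNECT goes out at t=0
    sends = sorted(t for t in publish_times if t <= horizon)
    for t in sends + [horizon]:
        # Fill every silent gap longer than `keepalive` with PINGREQs.
        while last_sent + keepalive < t:
            last_sent += keepalive
            pings.append(last_sent)
        last_sent = max(last_sent, t)
    return pings


# The publisher example sends at t = 0, 2, ..., 18 seconds.
publish = [2 * i for i in range(10)]
print(pingreq_times(publish, keepalive=60, horizon=200))
```

While the publisher from the first example is sending every 2 seconds, no pings are needed under this model. Once it goes quiet, a PINGREQ follows every 60 seconds.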

On the client side, the usual way to detect that the connection has dropped is the on_disconnect callback. It is invoked whenever the client is disconnected from the broker, whether intentionally, because of an error, or because of a keep-alive failure.

```python
# Client with disconnect handling
import paho.mqtt.client as mqtt

broker_address = "your_broker_ip"
port = 1883
client_id = "disconnect_handler"
keepalive_interval = 30  # seconds

def on_connect(client, userdata, flags, rc):
    if rc == 0:
        print("Connected to MQTT Broker!")
    else:
        print(f"Failed to connect: {mqtt.connack_string(rc)}")

def on_disconnect(client, userdata, rc):
    if rc != 0:
        # Unexpected drop: network error, keep-alive timeout, broker kick, etc.
        # loop_forever() reconnects automatically, so there is no need to
        # call client.reconnect() here.
        print(f"Unexpected disconnection: {mqtt.error_string(rc)}. Reconnecting...")
    else:
        print("Clean disconnection.")

# VERSION1 keeps 1.x-style callbacks; omit this argument on paho-mqtt 1.x.
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION1, client_id=client_id)
client.on_connect = on_connect
client.on_disconnect = on_disconnect
# Wait 1 s before the first reconnect attempt, doubling up to 60 s.
client.reconnect_delay_set(min_delay=1, max_delay=60)

try:
    client.connect(broker_address, port, keepalive=keepalive_interval)
    client.loop_forever()
except KeyboardInterrupt:
    print("Exiting.")
    client.disconnect()
```
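paho-mqtt's `reconnect_delay_set(min_delay, max_delay)` controls the backoff its network loop uses between reconnection attempts. The delay starts at `min_delay`, doubles after each failure, and is capped at `max_delay`. If you write your own reconnection loop instead, a hypothetical helper like this one reproduces that schedule. It is a sketch of the documented behavior, not paho's code.

```python
def reconnect_delays(min_delay=1, max_delay=120, attempts=10):
    """Return the delay (s) before each reconnection attempt.

    Starts at min_delay, doubles after every failed attempt,
    and never exceeds max_delay.
    """
    delays = []
    delay = min_delay
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, max_delay)
    return delays


print(reconnect_delays(1, 120, 9))
```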

The rc passed to on_disconnect tells you why the connection ended. 0 means a clean disconnect, for example because the client called disconnect(). Any non-zero value means the connection ended abnormally. That is where keep-alive failures show up, alongside ordinary network errors and the broker closing the connection.

These values are paho-mqtt error codes, and mqtt.error_string(rc) turns one into a readable message. Don't confuse them with the CONNACK return codes passed to on_connect, where rc=5 means the connection was refused because the client is not authorized.

The broker’s logs are invaluable for debugging keep-alive issues. If a client is unexpectedly disconnected, check the broker’s log for timeout or dropped-connection messages about that client. For Mosquitto, the log is usually /var/log/mosquitto/mosquitto.log, or wherever log_dest points in your configuration.

Typical entries look like this, though the exact wording varies between Mosquitto versions:

- A new connection: New connection from <client_ip>:<client_port> on port <listener_port>.
- A keep-alive timeout: Client <client_id> has exceeded timeout, disconnecting.
- Other abrupt drops: Socket error on client <client_id>, disconnecting.
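A short filter helps pull keep-alive timeouts out of a large log. This sketch assumes the `Client <client_id> has exceeded timeout, disconnecting.` wording and client IDs without spaces. Compare the pattern against a line from your own broker's log and adjust it if needed. `timed_out_clients` is a hypothetical helper name.

```python
import re

# Assumed wording of Mosquitto's keep-alive timeout notice; verify against your log.
TIMEOUT_RE = re.compile(r"Client (\S+) has exceeded timeout, disconnecting\.")


def timed_out_clients(lines):
    """Return the client IDs the broker disconnected for a keep-alive timeout."""
    ids = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            ids.append(match.group(1))
    return ids
```

To use it, run `with open("/var/log/mosquitto/mosquitto.log") as f: print(timed_out_clients(f))`, adjusting the path to match your log_dest.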

The most common mistake is setting the client’s keepalive_interval too high, especially on unreliable networks. Take a keepalive_interval of 300 seconds (5 minutes). A connection that silently dies may go unnoticed by the broker for up to 450 seconds (1.5 × 300), and the client may not notice until its next PINGREQ goes unanswered. Meanwhile, the broker keeps holding resources for a dead client. A shorter interval, like 30 or 60 seconds, detects failures much faster, at the cost of a little more ping traffic from idle clients.
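The trade-off is easy to put in numbers. This sketch uses a hypothetical helper, `keepalive_tradeoff`. It assumes an otherwise idle client that sends one PINGREQ per interval, and a broker that follows the 1.5× rule.

```python
def keepalive_tradeoff(keepalive):
    """For an otherwise idle client, return (broker detection window in s,
    PINGREQs per day). The detection window uses the MQTT 1.5x rule."""
    detection = 1.5 * keepalive
    pings_per_day = 86_400 // keepalive
    return detection, pings_per_day


for k in (30, 60, 300):
    detection, pings = keepalive_tradeoff(k)
    print(f"keepalive={k:>3}s  broker notices within {detection:.0f}s  {pings} PINGREQs/day")
```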

Another common pitfall is network infrastructure (firewalls, NAT gateways, load balancers) that drops TCP connections it considers idle. Such a device can cut the connection during a quiet period before the next PINGREQ is due. It often does so without notifying either end, so both sides only find out once keep-alive fails. In such cases, you have two options:

- Set the client’s keepalive_interval comfortably below the device's idle timeout, so that PINGREQs keep the connection active.
- Raise the device's idle timeout above the keepalive_interval.
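Here is a quick sanity check for the firewall case, as a sketch. `keepalive_is_safe` is a hypothetical helper. It assumes the longest silent gap on an idle connection is about one keep-alive interval. The 0.8 safety margin is an arbitrary choice, not a standard value.

```python
def keepalive_is_safe(keepalive, idle_timeout, margin=0.8):
    """Return True if an idle client pings comfortably before a middlebox
    with the given idle timeout (s) would drop the connection.

    Assumes the longest silent gap is about `keepalive` seconds; `margin`
    leaves room for latency and timer jitter.
    """
    if keepalive <= 0:
        return False  # Keep-alive disabled: nothing keeps an idle connection open.
    return keepalive <= margin * idle_timeout
```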

The relationship between the client’s keep-alive and the broker’s timeout is fixed by the protocol rather than configured separately. The client pings after keepalive_interval seconds of silence, and the broker allows 1.5 times that before disconnecting. So a client with a 60-second interval will ping every minute when idle and will be dropped after 90 seconds of silence. The extra half interval absorbs network latency and a slightly late ping. In MQTT 5, the broker can override the client's request by returning a Server Keep Alive value in CONNACK, which the client must then use. Mosquitto's max_keepalive relies on this mechanism for MQTT 5 clients.
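The negotiation fits in a few lines. This is a sketch of the MQTT 5 rule, not any broker's code. `negotiated_keepalive` is a hypothetical name, and `None` stands for a CONNACK that carries no Server Keep Alive property.

```python
def negotiated_keepalive(client_keepalive, server_keep_alive=None):
    """Return (keepalive the client must use, broker disconnect timeout) in seconds.

    If the CONNACK carries a Server Keep Alive, the client must use it instead of
    the value it requested. The broker then allows 1.5x that value of silence.
    A keepalive of 0 disables the mechanism, so there is no timeout (None).
    """
    keepalive = client_keepalive if server_keep_alive is None else server_keep_alive
    timeout = None if keepalive == 0 else 1.5 * keepalive
    return keepalive, timeout
```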

If you’re using a managed MQTT service, it may enforce its own limits on the keep-alive value (a minimum, a maximum, or both). The client-side interval may be the only thing you can configure. Always check the documentation for your specific broker.

The next thing you’ll likely encounter is how to implement robust reconnection logic for clients that do disconnect unexpectedly, especially if you’re aiming for a highly available system.
