MQTT Production Architecture: Scale to High Throughput (2026)

MQTT brokers can handle millions of concurrent connections, but a production system needs careful architecture to achieve high throughput.

Here’s a look at an MQTT production architecture designed for high throughput, using EMQX as an example:

+---------------------+       +---------------------+       +---------------------+
|     Load Balancer   | ----> |   EMQX Cluster Node 1 | ----> |    Database/Cache   |
+---------------------+       +---------------------+       +---------------------+
                                       ^
                                       |
+---------------------+       +---------------------+
|     Load Balancer   | ----> |   EMQX Cluster Node 2 | ----> |    Database/Cache   |
+---------------------+       +---------------------+       +---------------------+
                                       ^
                                       |
+---------------------+       +---------------------+
|     Load Balancer   | ----> |   EMQX Cluster Node 3 | ----> |    Database/Cache   |
+---------------------+       +---------------------+       +---------------------+

The Problem: A single MQTT broker can become a bottleneck under heavy load. As the number of concurrent connections, messages published, and messages delivered increases, the broker’s CPU, memory, and network I/O can be exhausted. This leads to dropped connections, delayed messages, and an overall unreliable system.

How it Works Internally:

Distributed Architecture: High throughput is achieved by distributing the load across multiple EMQX nodes. Each node is a full-fledged broker, capable of handling a subset of the total connections and message traffic.
Clustering: EMQX nodes form a cluster. This allows them to share connection state, topic subscriptions, and route messages efficiently. When a client connects to any node, that node becomes its "home" node. If a message is published to a topic that other clients subscribed to on different nodes, the home node routes the message to the appropriate node in the cluster. This inter-node communication is crucial for scalability.
Load Balancing: External load balancers (like HAProxy, Nginx, or cloud provider LBs) distribute incoming client connections across the EMQX cluster nodes. This ensures no single node is overwhelmed by new connections.
Data Persistence & State Management: While EMQX itself is primarily an in-memory broker for speed, production systems often integrate with external databases (like PostgreSQL, Cassandra, Redis) for:
- Session Persistence: Storing client session state (subscriptions, QoS levels) so clients can reconnect seamlessly.
- Message Persistence (for QoS 1 & 2): Storing messages that need to be delivered reliably.
- Authentication/Authorization: Storing user credentials and access control lists.
Publish/Subscribe Routing:
- When a client publishes a message, its home node receives it.
- EMQX checks its internal subscription tables to see which clients (potentially on other nodes) are subscribed to this topic.
- The message is forwarded to the relevant nodes for delivery.
- If persistence is enabled for the message, it’s also sent to the backend database.

Levers You Control:

Number of Nodes: More nodes mean more capacity. Scale horizontally by adding more EMQX instances.
Node Resources: The CPU, RAM, and network bandwidth of each EMQX node directly impact its individual capacity. Larger instances can handle more connections and higher message rates.
Load Balancer Configuration: How you distribute connections (e.g., round-robin, least connections) and handle health checks is critical.
Backend Database Performance: If your database becomes a bottleneck for session management or message persistence, it will limit your overall throughput.
MQTT QoS Levels: Higher QoS levels (QoS 1 and 2) require more acknowledgments and potentially persistence, increasing the load on the broker.
Topic Structure: Deeply nested topics can sometimes impact routing efficiency, though EMQX is generally very good at handling this.
Client Behavior: The number of clients, their connection patterns (e.g., bursty publishing), and their subscription patterns all influence system load.

The most surprising truth about scaling MQTT is that the inter-node communication within the cluster often becomes the limiting factor before individual node capacity is fully reached, especially for very high message volumes. This is because every message published to a topic with subscribers on other nodes requires network hops between EMQX instances. Optimizing this internal routing and ensuring high-speed, low-latency network connectivity between cluster nodes is paramount.

The next step in scaling is often optimizing the integration with your backend systems for authentication, authorization, and persistent storage.