EMQX isn’t just a message broker; it’s a distributed system that scales horizontally by forming a cluster, and its core challenge is keeping that cluster in sync and performing under load.
Let’s see EMQX in action. Imagine we’re setting up a basic cluster for a fleet of IoT devices.
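# On node 1 (named explicitly so the other nodes can reach it as emqx@192.168.1.1)
EMQX_NODE_NAME=emqx@192.168.1.1 \
emqx start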
# On node 2
EMQX_NODE_NAME=emqx@192.168.1.2 \
EMQX_DISCOVERY_STRATEGY=static \
EMQX_STATIC_DISCOVERY_NODES="emqx@192.168.1.1" \
emqx start
# On node 3
EMQX_NODE_NAME=emqx@192.168.1.3 \
EMQX_DISCOVERY_STRATEGY=static \
EMQX_STATIC_DISCOVERY_NODES="emqx@192.168.1.1,emqx@192.168.1.2" \
emqx start
This emqx start command, when run with specific environment variables, tells EMQX nodes how to find each other and form a cluster. EMQX_NODE_NAME is crucial; it’s the unique identifier for each node within the cluster. EMQX_DISCOVERY_STRATEGY=static means we’re manually telling each node which other nodes to look for, using EMQX_STATIC_DISCOVERY_NODES. Node 1, already running, is the initial point of contact. Node 2 is told to look for node 1. Node 3 is told to look for both node 1 and node 2. Once connected, they’ll share connection tables, topic subscriptions, and cluster state.
The problem EMQX solves is efficiently routing millions of concurrent messages between potentially millions of devices. It does this by distributing the load across multiple nodes. Each node can handle thousands of concurrent connections and publish/subscribe operations. When a message arrives, the responsible node routes it to all subscribed clients, whether they are on the same node or a different one in the cluster. This inter-node communication happens over the Erlang distribution protocol, which is highly optimized for fault tolerance and low latency.
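To make the routing step concrete, here is a minimal Python sketch of the idea: match a published topic against subscription filters, then work out which nodes need a copy of the message. It is a toy model under stated assumptions, not EMQX's actual implementation. The `RouteTable` class and its methods are hypothetical names, and the matcher ignores the special handling of topics that begin with `$`.

```python
from collections import defaultdict


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name.

    Simplified: "+" matches exactly one level, "#" matches the remainder.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches everything from here on,
            # including the parent level itself ("a/#" matches "a").
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class RouteTable:
    """Hypothetical stand-in for a cluster-wide table: topic filter -> nodes."""

    def __init__(self) -> None:
        self._routes = defaultdict(set)

    def add_route(self, topic_filter: str, node: str) -> None:
        # Record that `node` hosts at least one subscriber for this filter.
        self._routes[topic_filter].add(node)

    def nodes_for(self, topic: str) -> set:
        # Collect every node that has a matching subscription.
        matched = set()
        for topic_filter, nodes in self._routes.items():
            if topic_matches(topic_filter, topic):
                matched |= nodes
        return matched


routes = RouteTable()
routes.add_route("sensors/+/temp", "emqx@192.168.1.2")
routes.add_route("sensors/#", "emqx@192.168.1.3")
print(routes.nodes_for("sensors/dev1/temp"))
```

In this model the node that receives a publish forwards the message once to each node returned by `nodes_for`, and each of those nodes then delivers it to its own local subscribers.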
Here’s a breakdown of the key configuration levers you control:
- emqx.conf: This is your main configuration file. You’ll find settings for listener ports (MQTT, Web, etc.), authentication, authorization, persistence, and much more.
  - listeners.tcp.default = 1883: This sets the default MQTT TCP listener port. You can add more, like listeners.tcp.internal = 1884 for internal cluster communication if needed.
  - allow_anonymous = true: For quick testing, this allows clients to connect without credentials. Never use this in production.
  - authentication.password_hash_salt: A critical security setting. This salt is used for hashing passwords stored in files or databases, making them more secure.
- Clustering: As shown above, static discovery is one way. For dynamic environments, you’d use strategies like k8s (Kubernetes), dns, or cloud provider integrations (e.g., aws_ec2).
  - EMQX_DISCOVERY_STRATEGY=k8s
  - EMQX_K8S_APP_NAME=emqx-broker (for Kubernetes discovery)
- Persistence: EMQX can store messages for offline clients or for durable subscriptions.
  - persistence.shared_client_session = true
  - persistence.shared_client_state = true
  - persistence.wal_sync_interval = 1000 (synchronize write-ahead log every 1000ms)
- Resource Limits: Controlling how much memory, CPU, and disk EMQX can use is vital for stability.
  - mnesia.max_fragment_size = 1024 (maximum size of Mnesia table fragments, affects memory usage)
  - vm.args file: You can tune Erlang VM settings here, like +P 1048576 (maximum number of processes).
The most surprising thing about EMQX’s clustering is how it manages data consistency across nodes. Rather than relying on a central coordinator, nodes replicate cluster membership and routing state among themselves through the embedded Erlang database (Mnesia, extended by Mria in EMQX 5.x), and state such as shared subscriptions is eventually consistent. This means that while the cluster converges on a consistent state, there can be brief windows in which different nodes hold slightly different views of the cluster topology or subscription information. EMQX’s internal mechanisms are designed to reconcile these discrepancies quickly, which makes the cluster resilient.
The shared_subscription feature in EMQX allows multiple clients to subscribe to the same topic pattern (e.g., $share/group1/topic/+/data) and have messages distributed among them. This isn’t about replicating messages; it’s about load balancing message delivery for a specific group of consumers. The magic is that only one client in the shared subscription group receives a particular message, even if multiple clients are subscribed to that shared topic.
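The load-balancing behaviour can be sketched in a few lines of Python. This is an illustrative model only: the `SharedSubscriptionDispatcher` class is hypothetical, and round-robin is assumed here as one simple dispatch strategy (EMQX offers several, and which one applies depends on your configuration).

```python
from collections import defaultdict


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT filter match: "+" is one level, "#" is the remainder."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class SharedSubscriptionDispatcher:
    """Hypothetical model: pick ONE member per shared group for each message."""

    def __init__(self) -> None:
        self._groups = defaultdict(list)  # (group, filter) -> [client ids]
        self._cursors = defaultdict(int)  # (group, filter) -> next index

    def subscribe(self, client_id: str, subscription: str) -> None:
        # Expected form: "$share/<group>/<topic filter>"
        prefix, group, topic_filter = subscription.split("/", 2)
        if prefix != "$share":
            raise ValueError("not a shared subscription: " + subscription)
        self._groups[(group, topic_filter)].append(client_id)

    def dispatch(self, topic: str) -> list:
        """Return the clients that receive this message, one per matching group."""
        receivers = []
        for key, members in self._groups.items():
            _group, topic_filter = key
            if topic_matches(topic_filter, topic):
                # Round-robin: advance the group's cursor on every delivery.
                index = self._cursors[key] % len(members)
                self._cursors[key] += 1
                receivers.append(members[index])
        return receivers


dispatcher = SharedSubscriptionDispatcher()
dispatcher.subscribe("worker-1", "$share/group1/topic/+/data")
dispatcher.subscribe("worker-2", "$share/group1/topic/+/data")
for _ in range(3):
    print(dispatcher.dispatch("topic/device42/data"))
```

Each call to `dispatch` returns a single client for group1, alternating between the two workers, which is the difference from an ordinary subscription where both would receive every message.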
When you’ve got your cluster running smoothly and handling traffic, the next challenge you’ll face is monitoring and understanding the health and performance of that distributed system.