Galera, MaxScale, and replication are three distinct, yet often conflated, approaches to achieving high availability (HA) in MariaDB environments.
Let’s dive into how these components work together, or independently, to keep your database humming.
Galera Cluster: Virtually Synchronous Replication for True HA
Galera is a multi-master, virtually synchronous replication solution. When you write to any node in a Galera cluster, the transaction's write-set is replicated to every node and certified before the client receives confirmation; the other nodes then apply it shortly afterwards.
Here’s a simplified view of a Galera write operation:
- Client writes to Node A.
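- Node A executes the transaction locally. At commit time it broadcasts the transaction's write-set to all nodes (Node B, Node C, etc.) in a single, globally agreed order.
- Every node, including Node A, runs the same certification check on the write-set (ensuring it does not conflict with write-sets ordered before it).
- Because the check is deterministic, all nodes reach the same verdict without exchanging further messages.
- If certification passes, Node A commits and acknowledges the commit to the client, while Nodes B, C, etc. queue the write-set and apply it locally.
- If certification fails, Node A rolls the transaction back and returns an error to the client, and the other nodes discard the write-set.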
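The certification step is easier to see in code. The following is a toy model only, not Galera's implementation: `WriteSet`, `Certifier`, and the `last_seen` field are hypothetical names, and row keys are plain strings. It shows why nodes that receive the same ordered stream of write-sets reach the same accept/reject decisions without talking to each other.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteSet:
    """Hypothetical stand-in for a replicated transaction."""
    origin: str        # node that executed the transaction
    keys: frozenset    # row keys the transaction modified
    last_seen: int     # seqno of the last write-set applied when it started


@dataclass
class Certifier:
    """Toy certification index: key -> seqno of the last write-set that changed it."""
    seqno: int = 0
    index: dict = field(default_factory=dict)

    def certify(self, ws: WriteSet) -> bool:
        self.seqno += 1  # every delivered write-set gets the next global seqno
        # Conflict: some key was changed by a write-set the transaction never saw.
        if any(self.index.get(key, 0) > ws.last_seen for key in ws.keys):
            return False
        for key in ws.keys:
            self.index[key] = self.seqno
        return True


# The same totally ordered stream is delivered to every node.
stream = [
    WriteSet("A", frozenset({"accounts:1"}), last_seen=0),
    WriteSet("B", frozenset({"accounts:1"}), last_seen=0),  # concurrent, same row
    WriteSet("C", frozenset({"accounts:2"}), last_seen=0),
]
nodes = {name: Certifier() for name in "ABC"}
verdicts = {name: [node.certify(ws) for ws in stream] for name, node in nodes.items()}
print(verdicts["A"])
```

Each node certifies independently, yet all three accept the first and third write-sets and reject the second, which touched a row that an earlier write-set had already changed.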
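Because every node applies the same ordered stream of certified write-sets, all nodes converge on the same state. Other nodes may apply a write-set slightly after the originating node commits it, so a read on another node can briefly be stale unless you enable causality checks with wsrep_sync_wait. Split-brain is prevented by quorum: only a partition that holds a majority of the nodes keeps accepting writes, which is why an odd number of nodes (at least three) is recommended.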
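Galera guards against split-brain with a majority (quorum) rule. The sketch below assumes every node carries the same vote weight, and the function names are hypothetical. It only illustrates the arithmetic, not Galera's membership protocol.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """A partition may keep accepting writes only with a strict majority."""
    return reachable_nodes * 2 > cluster_size


def tolerated_failures(cluster_size: int) -> int:
    """Number of nodes that can be lost while a majority still remains."""
    return (cluster_size - 1) // 2


for size in (2, 3, 4, 5):
    print(size, "nodes tolerate", tolerated_failures(size), "failure(s)")
```

A two-node cluster tolerates no failures and a four-node cluster tolerates no more than a three-node one, which is the usual argument for running an odd number of nodes.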
Key Galera Components:
- wsrep_on: Enables Galera replication on the node (it is OFF by default in MariaDB 10.1 and later).
- wsrep_provider: Specifies the Galera library (libgalera_smm.so).
- wsrep_cluster_address: The Galera cluster's endpoint, typically a comma-separated list of node addresses (e.g., gcomm://192.168.1.101:4567,192.168.1.102:4567,192.168.1.103:4567). Galera's group communication uses port 4567 by default, not the SQL port 3306.
- wsrep_cluster_name: A unique name for the cluster.
- wsrep_node_address: The IP address of the current node.
- wsrep_sst_method: The State Snapshot Transfer method used when a new node joins or an old one rejoins (e.g., rsync, mariabackup).
Example Galera Configuration Snippet (my.cnf):
[mysqld]
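# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
# Group communication uses port 4567 by default (not the SQL port 3306)
wsrep_cluster_address=gcomm://192.168.1.101:4567,192.168.1.102:4567,192.168.1.103:4567
wsrep_cluster_name=my_galera_cluster
wsrep_node_address=192.168.1.101
wsrep_sst_method=mariabackup
# Settings Galera requires
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2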
MariaDB Replication: Asynchronous or Semi-Synchronous
Traditional MariaDB replication is asynchronous by default. A primary (master) server logs its changes to a binary log, and one or more replica (slave) servers read this log and apply the changes.
Asynchronous Replication Flow:
- Primary writes to its binary log.
- Replica connects to the primary and requests the binary log events it has not yet received.
- Replica copies those events into its relay log and then applies them to its own data.
The "asynchronous" part means the primary doesn’t wait for the replica to acknowledge applying the changes. This is fast but can lead to data loss if the primary fails before the changes are replicated.
Semi-Synchronous Replication:
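MariaDB also supports semi-synchronous replication. In this mode, the primary waits for at least one replica to acknowledge receiving the transaction before confirming the commit to the client. This significantly reduces the risk of data loss compared to asynchronous replication. If no replica acknowledges within rpl_semi_sync_master_timeout, the primary falls back to asynchronous replication until a replica catches up.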
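The difference between the two modes shows up at the moment the primary crashes. The following is a deliberately simplified model with hypothetical names (`simulate_crash`, `lost_transactions`). It ignores the semi-sync timeout fallback and assumes a single replica.

```python
def simulate_crash(mode: str, txns: list[str], shipped: int) -> tuple[list[str], list[str]]:
    """Return (acknowledged_to_client, present_on_replica) when the primary dies.

    `shipped` is how many transactions reached the replica before the crash.
    """
    if mode not in ("async", "semisync"):
        raise ValueError(f"unknown mode: {mode}")
    on_replica = txns[:shipped]
    acknowledged: list[str] = []
    for position, txn in enumerate(txns):
        reached_replica = position < shipped
        if mode == "semisync" and not reached_replica:
            break  # primary is still waiting for the replica's ACK
        acknowledged.append(txn)
    return acknowledged, on_replica


def lost_transactions(mode: str, txns: list[str], shipped: int) -> list[str]:
    """Transactions the client believes are committed but the replica never saw."""
    acknowledged, on_replica = simulate_crash(mode, txns, shipped)
    return [txn for txn in acknowledged if txn not in on_replica]


txns = ["t1", "t2", "t3"]
print("async   :", lost_transactions("async", txns, shipped=1))
print("semisync:", lost_transactions("semisync", txns, shipped=1))
```

In this model, asynchronous mode acknowledges transactions the replica never received, while semi-synchronous mode only acknowledges what at least one replica already holds.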
Key Replication Components:
- log_bin: Enables binary logging on the primary.
- server_id: A unique ID for each server in a replication topology.
- relay_log: The file where replicas store events read from the primary's binary log.
- read_only: Set to 1 on replicas to prevent accidental writes.
- GTIDs: (Recommended) MariaDB 10.0 and later always record Global Transaction Identifiers, so there is no gtid_mode switch as in MySQL. Setting gtid_strict_mode=ON adds stricter ordering checks, and MASTER_USE_GTID on the replica makes it replicate by GTID.
Example Primary Configuration (my.cnf):
[mysqld]
server_id=1
log_bin=mysql-bin
# GTIDs are always on in MariaDB 10.0+; gtid_mode and enforce_gtid_consistency are MySQL-only
gtid_strict_mode=ON
binlog_format=ROW
Example Replica Configuration (my.cnf):
[mysqld]
server_id=2
relay_log=mysql-relay-bin
read_only=1
# GTIDs are always on in MariaDB 10.0+; gtid_mode and enforce_gtid_consistency are MySQL-only
gtid_strict_mode=ON
binlog_format=ROW
To set up replication:
- On the primary:
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%' IDENTIFIED BY 'password';
- On the replica:
CHANGE MASTER TO MASTER_HOST='<primary_ip>', MASTER_USER='repl_user', MASTER_PASSWORD='password', MASTER_PORT=3306, MASTER_USE_GTID=slave_pos;
START SLAVE;
MaxScale: The Intelligent Proxy
MaxScale is a highly versatile database proxy. It sits between your applications and your MariaDB servers, offering features like load balancing, read/write splitting, query caching, and high availability management.
MaxScale doesn’t replicate data itself; it intelligently directs traffic to your backend MariaDB instances. This makes it invaluable for managing Galera clusters or traditional replication setups.
MaxScale for Galera:
MaxScale can monitor a Galera cluster and automatically route read queries to available nodes and write queries to a designated primary (or any node, depending on configuration). If a node in the Galera cluster fails, MaxScale can detect this and stop sending traffic to it.
MaxScale for Replication:
For traditional replication, MaxScale can:
- Load balance read traffic across multiple replicas.
- Direct all write traffic to the single primary server.
- Monitor the primary and replicas and automatically fail over to a new primary if the current one becomes unavailable (handled by the MariaDB Monitor, mariadbmon, with auto_failover enabled).
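The routing decision behind read/write splitting can be sketched in a few lines. This is a minimal illustration, not how MaxScale's readwritesplit router is implemented: the real router uses a full query classifier and tracks session state, whereas the hypothetical `ReadWriteSplitter` class below only looks at statement prefixes and rotates through replicas in order.

```python
import itertools


class ReadWriteSplitter:
    """Toy router: reads go to replicas in rotation, everything else to the primary."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None
        self.in_transaction = False

    def route(self, sql: str) -> str:
        stmt = sql.strip().rstrip(";").strip().upper()
        if stmt.startswith(("BEGIN", "START TRANSACTION")):
            self.in_transaction = True
            return self.primary
        if stmt.startswith(("COMMIT", "ROLLBACK")):
            self.in_transaction = False
            return self.primary
        is_read = stmt.startswith(("SELECT", "SHOW")) and not stmt.endswith("FOR UPDATE")
        # Reads inside an open transaction stay on the primary for consistency.
        if is_read and not self.in_transaction and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReadWriteSplitter("server1", ["server2", "server3"])
for statement in ("SELECT 1", "INSERT INTO t VALUES (1)", "SELECT * FROM t"):
    print(statement, "->", router.route(statement))
```

Keeping reads that occur inside a transaction on the primary is a design choice made here so that a transaction sees its own writes.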
MaxScale Configuration (maxscale.cnf):
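# The [server1], [server2], and [server3] sections (type=server) and a
# listener for the service are omitted for brevity; MaxScale requires both.
[galeramon]
type=monitor
module=galeramon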
user=maxscale_user
password=maxscale_password
servers=server1,server2,server3
[readwritesplit]
type=service
router=readwritesplit
servers=server1,server2,server3
user=app_user
password=app_password
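In this example:
- galeramon monitors three servers for Galera status.
- readwritesplit is a service that uses the readwritesplit router. It directs reads to any of server1, server2, server3 and writes to the single node that galeramon has labelled as the primary (by default, the node with the lowest wsrep_local_index).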
The Power of Combining:
- Galera + MaxScale: Provides virtually synchronous multi-master HA with intelligent traffic management, failover, and read scaling across all nodes.
- Replication + MaxScale: Offers robust read scaling, automatic failover for writes, and resilience against primary server failure, while keeping the familiar primary/replica topology of traditional replication.
The most surprising thing about MaxScale is that it can add HA to a plain primary/replica setup that has none of its own. Given a single primary and one or more replicas using GTID-based replication, the MariaDB Monitor (mariadbmon) with auto_failover enabled can automatically promote a replica to become the new primary if the original one dies, and then redirect the remaining replicas to follow the new primary. With auto_rejoin enabled, the old primary can rejoin as a replica when it comes back.
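One way to picture the promotion step is "pick the replica that has replicated the most". The sketch below is not MaxScale's actual selection logic, which weighs more criteria. `parse_gtid` and `pick_new_primary` are hypothetical helpers, and the input is assumed to be each replica's GTID position in MariaDB's domain-server-sequence format, within a single GTID domain.

```python
def parse_gtid(gtid: str) -> tuple[int, int, int]:
    """Split a MariaDB GTID such as '0-1-120' into (domain, server_id, sequence)."""
    domain, server_id, sequence = (int(part) for part in gtid.strip().split("-"))
    return domain, server_id, sequence


def pick_new_primary(replica_positions: dict[str, str]) -> str:
    """Return the replica with the highest sequence number (ties: first by name)."""
    if not replica_positions:
        raise ValueError("no replicas available for promotion")
    parsed = {name: parse_gtid(pos) for name, pos in replica_positions.items()}
    if len({domain for domain, _, _ in parsed.values()}) != 1:
        raise ValueError("this sketch only handles a single GTID domain")
    # max() keeps the first maximal element, so sorting names makes ties deterministic.
    return max(sorted(parsed), key=lambda name: parsed[name][2])


positions = {"server2": "0-1-120", "server3": "0-1-118"}
print(pick_new_primary(positions))
```

Promoting the most advanced replica minimizes the number of transactions the other replicas would have to fetch from, or lose relative to, the new primary.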
The next step is understanding how to effectively monitor the health and performance of these combined HA setups, often involving dedicated monitoring tools and custom alert configurations.