By default, a MariaDB replica applies transactions with a single SQL thread, one at a time, which quickly becomes a bottleneck under heavy write load. Parallel replication, introduced in the MariaDB 10.0 series, lets the replica apply transactions concurrently using multiple worker threads, which can significantly reduce replica lag.
Let’s see it in action. Imagine a primary server with a simple INSERT statement running every second:
-- Primary Server
CREATE TABLE test (id INT PRIMARY KEY AUTO_INCREMENT, val VARCHAR(255));
INSERT INTO test (val) VALUES (UUID());
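On a replica without parallel replication enabled, Seconds_Behind_Master starts climbing once the primary commits faster than a single applier thread can replay the changes. With parallel replication, the replica has a much better chance of keeping up.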
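The once-per-second INSERT described above could be driven from the primary itself. The following is a minimal sketch using the event scheduler, assuming the test table exists in the current database and the account has the required privileges; the event name insert_every_second is a hypothetical choice for illustration.

```sql
-- On primary: turn on the event scheduler (needs a privileged account)
SET GLOBAL event_scheduler = ON;

-- Hypothetical event name; fires once per second and inserts one row
CREATE EVENT insert_every_second
  ON SCHEDULE EVERY 1 SECOND
  DO INSERT INTO test (val) VALUES (UUID());

-- Remove the event when the experiment is finished
-- DROP EVENT insert_every_second;
```

One row per second will not make a healthy replica lag; the event is only a stand-in for a real write workload.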
The core idea is to identify transactions that can be applied independently and hand them to different worker threads. How well this works in MariaDB depends on the binlog_format and on the slave_parallel_threads setting (slave_parallel_workers is accepted as an alias).
Here’s how it works internally:
- Transaction Grouping: The replica reads events from the binary log and decides which transactions may be applied together. In MariaDB this decision is governed by slave_parallel_mode; in conservative mode, for example, transactions that were group-committed together on the primary are eligible to run in parallel. The binlog_format then influences how well the applied events parallelize:
  - ROW-based replication (binlog_format=ROW): This is generally the most effective format for parallel replication. Row events describe the exact rows changed, so workers replay them deterministically and conflicts between transactions are limited to the rows actually touched.
  - STATEMENT-based replication (binlog_format=STATEMENT): This format is more problematic for parallel replication. If a statement affects multiple rows or has side effects, replaying it next to other statements is more likely to conflict. For example, INSERT ... SELECT or UPDATE ... LIMIT can be tricky.
  - MIXED-based replication (binlog_format=MIXED): This format uses STATEMENT for statements that are safe to replicate as statements and ROW for statements that are not. It offers a compromise but can still lead to some transactions being serialized.
- Worker Threads: Once transactions are scheduled, the replica uses a pool of worker threads to apply them. The number of worker threads is controlled by the slave_parallel_threads setting (slave_parallel_workers is accepted as an alias).
- Coordination: The replica SQL thread acts as a coordinator and assigns transactions to available workers. To preserve data consistency, transactions that depend on each other must be applied in the correct order. For example, if transaction B modifies a row that transaction A just inserted, B must be applied after A. This is where slave_parallel_mode comes into play: in its in-order modes, workers apply transactions in parallel but commit them in the same order as the primary.
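The settings involved in the steps above can be inspected on a running replica. This is a small sketch using standard system-variable queries; which variables are listed depends on the server version.

```sql
-- On replica: list the parallel replication variables this server knows about
SHOW GLOBAL VARIABLES LIKE 'slave_parallel%';

-- Size of the worker pool; 0 means parallel replication is disabled
SELECT @@global.slave_parallel_threads AS worker_threads;
```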
To enable and configure parallel replication, you primarily adjust two settings on the replica server.
First, ensure your binlog_format is set to ROW or MIXED on the primary and the replica. ROW is generally preferred for optimal parallelization.
# On primary and replica
[mariadb]
binlog_format = ROW
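The format can also be checked and changed at runtime. The sketch below assumes a privileged account. Note that SET GLOBAL only affects sessions that connect afterwards, so the my.cnf entry is still needed for the setting to survive a restart.

```sql
-- Check the format currently used for new sessions
SELECT @@global.binlog_format;

-- Switch to row-based logging for sessions opened from now on
SET GLOBAL binlog_format = 'ROW';
```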
Next, configure the number of parallel worker threads on the replica with slave_parallel_threads (slave_parallel_workers is accepted as an alias). A common starting point is the number of CPU cores available on the replica, or slightly fewer.
# On replica
[mariadb]
slave_parallel_threads = 4
You also need to tell the replica how to parallelize and order commits. In MariaDB 10.1 and later this is controlled by slave_parallel_mode. Its in-order modes, such as conservative and optimistic, apply transactions in parallel but commit them in the same order as on the primary.
# On replica
[mariadb]
slave_parallel_mode = conservative
After making these changes, restart the MariaDB replica server so the my.cnf values take effect. Alternatively, apply the same values at runtime. The worker count can only be changed while the replica threads are stopped, and the connection settings from your earlier CHANGE MASTER TO are kept, so there is no need to reissue that statement.
-- On replica
STOP SLAVE;
SET GLOBAL slave_parallel_threads = 4; -- Match the value set in my.cnf
SET GLOBAL slave_parallel_mode = 'conservative'; -- Match the value set in my.cnf
START SLAVE;
You can verify that parallel replication is active by checking the replica status:
SHOW SLAVE STATUS\G
Look for Slave_IO_Running: Yes and Slave_SQL_Running: Yes. On MariaDB 10.1 and later, the output also includes a Parallel_Mode field showing the active mode. The worker count is not part of this output, so check it separately with SELECT @@slave_parallel_threads, which should return a value greater than 0. Seconds_Behind_Master should now be significantly lower, or zero, under load.
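Beyond the status output, the worker threads themselves can be observed in the process list. This is a sketch that assumes the replication threads run under the internal 'system user' account; the exact COMMAND and STATE strings vary between versions.

```sql
-- On replica: list replication threads (IO thread, SQL thread, and workers)
SELECT ID, COMMAND, STATE, TIME
FROM information_schema.PROCESSLIST
WHERE USER = 'system user';
```

With parallel replication active, the result should contain more rows than the two you would see for a single IO thread and a single SQL thread.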
The most impactful tuning parameter is often slave_parallel_threads. While setting it to the number of CPU cores is a good start, you might need to experiment. Too many workers can lead to contention and context-switching overhead, actually increasing lag. Too few, and you’re not fully utilizing your hardware. Monitor SHOW SLAVE STATUS and the replica’s CPU usage to find the sweet spot. Note that in MariaDB, slave_parallel_workers is an alias for slave_parallel_threads, so the two names refer to the same setting.
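One way to run that experiment is to change the worker count while replication is stopped and then compare lag under the same workload. In this sketch the value 8 is an arbitrary trial value, not a recommendation.

```sql
-- On replica: the worker count can only be changed while replication is stopped
STOP SLAVE;
SET GLOBAL slave_parallel_threads = 8;  -- arbitrary trial value
START SLAVE;

-- Let the workload run for a while, then compare Seconds_Behind_Master
SHOW SLAVE STATUS\G
```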
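A common pitfall is misunderstanding what slave_parallel_mode guarantees. In the in-order modes (conservative, optimistic), transactions commit on the replica in the same order as on the primary. Out-of-order application only happens when it is explicitly set up, for example through multiple GTID domain IDs. In that case transactions can become visible on the replica in a different order than on the primary, even if Seconds_Behind_Master looks good. If a resulting conflict breaks replication, the replica stops with an error, and tracking down the cause is a painful debugging session.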
When you fix lag issues with parallel replication, the next problem you’ll likely encounter is dealing with locking contention between parallel threads, especially on tables with high write concurrency.
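A rough first signal of such contention is the retry counter on the replica. The sketch below assumes that a steadily rising Slave_retried_transactions value points to conflicts between workers; this is a heuristic, not a definitive diagnosis.

```sql
-- On replica: how many transactions had to be retried since server start
SHOW GLOBAL STATUS LIKE 'Slave_retried_transactions';

-- How many times a transaction is retried before replication stops with an error
SHOW GLOBAL VARIABLES LIKE 'slave_transaction_retries';
```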