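Statement-based replication (SBR) logs the SQL text of each change. It was the default binary log format in older MariaDB and MySQL releases (recent MariaDB releases default to MIXED), and it can lead to data inconsistencies when statements have non-deterministic outcomes.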

Here’s a simple replication setup:

master.cnf on the master server:

[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = STATEMENT
auto_increment_increment = 2
auto_increment_offset = 1

slave.cnf on the slave server:

[mysqld]
server-id = 2
relay_log = /var/log/mysql/mysql-relay-bin.log
read_only = 1

On the master, create a table and insert data:

CREATE TABLE test_replication (
    id INT AUTO_INCREMENT PRIMARY KEY,
    value VARCHAR(50)
);

INSERT INTO test_replication (value) VALUES ('initial');

Now, on the slave, you’d typically run:

-- Connect to slave
CHANGE MASTER TO
    MASTER_HOST='<master_ip>',
    MASTER_USER='repl_user',
    MASTER_PASSWORD='repl_password',
    MASTER_LOG_FILE='mysql-bin.000001', -- This will be the first log file after enabling binlog
    MASTER_LOG_POS=107; -- This position is determined by SHOW MASTER STATUS on the master
START SLAVE;
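
The CHANGE MASTER TO statement above assumes that repl_user already exists on the master and that you know the current binary log coordinates. A minimal sketch of those two steps follows. The '%' host wildcard is an assumption made for brevity; in practice you would likely restrict it to the slave's address.

```sql
-- On the master: create the account the slave connects with.
-- The '%' host is an illustrative assumption; narrow it where possible.
CREATE USER 'repl_user'@'%' IDENTIFIED BY 'repl_password';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';

-- Still on the master: read the File and Position columns and use them
-- for MASTER_LOG_FILE and MASTER_LOG_POS on the slave.
SHOW MASTER STATUS;

-- On the slave, after START SLAVE: Slave_IO_Running and
-- Slave_SQL_Running should both report Yes.
-- (\G is the command-line client's vertical output terminator.)
SHOW SLAVE STATUS\G
```

If either thread is not running, the Last_IO_Error and Last_SQL_Error columns of the same output are usually the first place to look.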

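If the master has auto_increment_increment = 2 and auto_increment_offset = 1, the ids will be 1, 3, 5, etc. A slave without these settings that generated ids on its own would produce 1, 2, 3, etc., and the two servers would disagree. For a plain single-row INSERT, statement-based replication avoids this by writing the generated id to the binary log just ahead of the statement, so the slave reuses it. That protection breaks down when ids are generated indirectly, for example inside a trigger or stored function, or when rows are also written directly on the slave. In those cases the replayed INSERT statement can generate a different id on the slave than it did on the master.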
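
To check whether the two servers agree on these settings, the queries below can be run on each of them and the results compared. This is only a quick inspection sketch, not a fix for a mismatch.

```sql
-- Run on the master and on the slave, then compare the results.
-- GLOBAL shows the server-wide values, e.g. those set in the .cnf file.
SHOW GLOBAL VARIABLES LIKE 'auto_increment_%';

-- The values in effect for the current session, which may have been
-- overridden with SET.
SELECT @@auto_increment_increment, @@auto_increment_offset;
```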

Row-based replication (RBR) solves this by logging the actual rows that were changed, not the SQL statement that caused the change.

Let’s switch the master to RBR:

master.cnf (modified):

[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
auto_increment_increment = 2
auto_increment_offset = 1

Restart the master. Now, when INSERT INTO test_replication (value) VALUES ('new data'); is run, the binary log will contain the new row with id = 3 (due to the auto-increment settings). The slave receives this row data and applies it directly, ensuring the id is indeed 3.
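
A simple way to confirm this, assuming replication is already running and the table from earlier exists on both servers, is to check the format and then compare the table contents on each side:

```sql
-- On the master, after the restart: confirm the new format is active.
SHOW GLOBAL VARIABLES LIKE 'binlog_format';

INSERT INTO test_replication (value) VALUES ('new data');
SELECT id, value FROM test_replication ORDER BY id;

-- On the slave: the same query should return the same rows and ids.
SELECT id, value FROM test_replication ORDER BY id;
```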

The core problem RBR addresses is non-determinism. This includes:

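  • Functions such as UUID() and SYSDATE(), which return a different value on every server. NOW() is a special case: the binary log records the statement's timestamp, so it replays identically under statement-based replication, and RAND() is handled in simple cases by logging its seed.
  • AUTO_INCREMENT columns whose values are generated indirectly (for example inside a trigger or stored function), or when inserts happen concurrently on master and slave.
  • Statements that can affect a different set of rows on each server (e.g., DELETE FROM table WHERE col > value, where the condition might match different rows on master and slave due to subtle data differences or concurrent operations, or any UPDATE or DELETE that uses LIMIT without ORDER BY).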
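
Two statements of this kind, written against the test_replication table from earlier, are sketched below. The DELETE with LIMIT is my own illustrative example rather than one taken from the text above.

```sql
-- UUID() returns a different value on every call, so replaying the
-- statement text on the slave would store a different string.
INSERT INTO test_replication (value) VALUES (UUID());

-- Without ORDER BY, the server is free to pick any five matching rows,
-- and the slave may pick different ones.
DELETE FROM test_replication WHERE id > 10 LIMIT 5;

-- With binlog_format = STATEMENT and binary logging enabled, the server
-- typically flags such statements as unsafe; the warning for the
-- preceding statement can be read here.
SHOW WARNINGS;
```

Under binlog_format = ROW, both statements are logged as the concrete rows they touched on the master, so the slave ends up with the same result.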

The mental model for RBR is that the master serializes the state changes (the actual row data modifications) and sends those serialized changes to the slave. The slave then deserializes and applies them. As long as the slave's data matched the master's beforehand, this guarantees that if a statement executed on the master resulted in a specific set of rows being inserted, updated, or deleted, the exact same set of rows will be modified on the slave.

The "event" in RBR is a set of row changes. Each event has a header (timestamp, server ID, event type) and a body containing the actual row data. For INSERT statements, the body contains the new row’s values. For UPDATE statements, it contains both the old and new row values. For DELETE statements, it contains the old row values. This explicit logging of before and after states is what provides the determinism.
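
How much of each row goes into those before and after images is configurable on servers that provide the binlog_row_image variable (an assumption about your version; older releases always log full rows). A short inspection sketch:

```sql
-- FULL logs every column in both images; MINIMAL logs only the columns
-- needed to identify the row and the columns that changed.
SHOW GLOBAL VARIABLES LIKE 'binlog_row_image';

-- Under ROW format, this UPDATE is logged as a before image (the old
-- row) and an after image (the new row) rather than as SQL text.
UPDATE test_replication SET value = 'changed' WHERE id = 1;
```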

One thing many people don’t know is that even with binlog_format=ROW, the master still writes statement events for DDL such as CREATE TABLE or DROP TABLE. These statements change table definitions rather than rows, so there is no row image to log and the SQL text is recorded instead. Choosing between statement and row events on a per-statement basis is what binlog_format=MIXED does: the master logs the statement text by default and switches to row events when it detects a statement that is unsafe to replay.
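
You can see this mix of event types by inspecting the binary log on the master. In the sketch below, ddl_demo is a hypothetical table and 'mysql-bin.000002' is a placeholder file name; substitute the newest file that SHOW BINARY LOGS reports.

```sql
-- List the binary log files the master currently has.
SHOW BINARY LOGS;

-- Run one DDL statement and one DML statement.
CREATE TABLE ddl_demo (id INT PRIMARY KEY);  -- hypothetical table
INSERT INTO test_replication (value) VALUES ('row event');

-- Inspect the logged events (placeholder file name, see above).
SHOW BINLOG EVENTS IN 'mysql-bin.000002';
```

With binlog_format = ROW, the CREATE TABLE should appear as a query event carrying its SQL text, while the INSERT should appear as a table map event followed by a write rows event. The exact event type names vary between server versions.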

The next concept to explore is how to handle replication lag and ensure data consistency in complex scenarios.
