The MariaDB Spider engine can distribute data across multiple database servers, but it doesn’t magically balance the load or prevent hotspots; you have to engineer that yourself.
Let’s see it in action. Imagine we have two MariaDB instances, db1 on 192.168.1.10 and db2 on 192.168.1.11, both running MariaDB 10.5. We want to shard a users table.
First, install the Spider plugin if it’s not already enabled. Only the node that hosts the Spider table (db1 in this walkthrough) strictly needs it; installing it on db2 as well lets either server act as the entry point later.
-- On db1 and db2
INSTALL SONAME 'ha_spider'; -- the plugin library is ha_spider
SET GLOBAL spider_max_connections = 1000; -- limit on Spider's connections to remote servers
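Before going further, it can help to confirm that the plugin actually loaded. The queries below are a minimal check, assuming you run them on each server where you installed Spider; they rely only on the standard information_schema.PLUGINS table and SHOW VARIABLES.

```sql
-- Should return one or more rows with PLUGIN_STATUS = 'ACTIVE' once Spider is installed
SELECT PLUGIN_NAME, PLUGIN_STATUS, PLUGIN_LIBRARY
FROM information_schema.PLUGINS
WHERE PLUGIN_NAME LIKE 'SPIDER%';

-- Lists the Spider system variables this server version actually exposes
SHOW GLOBAL VARIABLES LIKE 'spider%';
```

If the second query returns nothing, the engine is not loaded and any SET GLOBAL spider_... statement will fail.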
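Now, on our "primary" node (let’s say db1), we define the remote nodes. Spider also needs credentials and a target database for each backend; the spider user, its password, and the backend_db database below are placeholders for an account and schema you create on every backend. On db1, backend_db has to be a different database from the one that will hold the Spider table, because both tables are named users. Note that PORT takes an integer, not a quoted string.
-- On db1
CREATE SERVER server_db1 FOREIGN DATA WRAPPER mysql
OPTIONS (HOST '192.168.1.10', PORT 3306, DATABASE 'backend_db', USER 'spider', PASSWORD 'change_me');
CREATE SERVER server_db2 FOREIGN DATA WRAPPER mysql
OPTIONS (HOST '192.168.1.11', PORT 3306, DATABASE 'backend_db', USER 'spider', PASSWORD 'change_me');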
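Server definitions are stored in the mysql.servers system table, so a quick sanity check is to read them back. This sketch assumes the two server names used above and an account allowed to read the mysql schema.

```sql
-- On db1: confirm both remote definitions exist and point at the intended hosts
SELECT Server_name, Host, Db, Username, Port, Wrapper
FROM mysql.servers
WHERE Server_name IN ('server_db1', 'server_db2');
```

A typo in a host or port shows up here immediately, whereas Spider would only report it once a query tries to reach that backend.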
Next, we create the tables. A Spider table stores no rows itself; it is a proxy in front of ordinary tables on the backends. Spider does not create those backend tables for you, so first create a plain InnoDB users table in the backend database on both db1 and db2 (on db1, use a different database from the one that will hold the Spider table, since both tables are named users).
-- On db1 and db2, in the backend database
CREATE TABLE users (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
username VARCHAR(50) NOT NULL,
email VARCHAR(100),
PRIMARY KEY (id)
) ENGINE=InnoDB;
Now, we tell Spider where the actual data lives and how to partition it. This is the core of sharding. Spider has no sharding syntax of its own: it builds on MariaDB’s standard partitioning, and connection parameters inside the table and partition COMMENT strings say which server backs each partition. We’ll shard by id using a simple modulo, which is what HASH partitioning on an integer column gives us.
-- On db1
CREATE TABLE users (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
username VARCHAR(50) NOT NULL,
email VARCHAR(100),
PRIMARY KEY (id)
) ENGINE=SPIDER
COMMENT='wrapper "mysql", table "users"'
PARTITION BY HASH (id) (
PARTITION p0 COMMENT = 'srv "server_db1"',
PARTITION p1 COMMENT = 'srv "server_db2"'
);
In the table comment, wrapper "mysql" selects the MariaDB/MySQL protocol and table "users" names the backend table. PARTITION BY HASH (id) makes id the shard key, and each partition’s srv parameter names the CREATE SERVER entry that stores that partition’s rows. Nothing here balances data automatically; placement is fully determined by the partition function.
If you SHOW CREATE TABLE users; on db1, you’ll see something like:
-- On db1
CREATE TABLE `users` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(50) NOT NULL,
`email` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=SPIDER DEFAULT CHARSET=utf8mb4 COMMENT='wrapper "mysql", table "users"'
PARTITION BY HASH (`id`)
(PARTITION `p0` COMMENT = 'srv "server_db1"' ENGINE = SPIDER,
PARTITION `p1` COMMENT = 'srv "server_db2"' ENGINE = SPIDER)
Notice how the comments on p0 and p1 map each partition to a specific server. When you insert a row, say INSERT INTO users (username, email) VALUES ('alice', 'alice@example.com');, the row is assigned an id first, and the partitioning layer then computes MOD(id, 2) to pick the partition. An even id falls into p0 and the row goes to server_db1. An odd id falls into p1 and the row goes to server_db2.
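One way to check where rows ended up, and whether one shard is turning into a hotspot, is to count rows per partition. This is a sketch that assumes the two-partition HASH (id) layout with partitions p0 and p1; on a large table these counts touch every backend, so treat them as an occasional diagnostic.

```sql
-- On db1: rows per computed partition number (0 maps to p0, 1 maps to p1)
SELECT MOD(id, 2) AS partition_no, COUNT(*) AS row_count
FROM users
GROUP BY partition_no;

-- On db1: rows in a single partition, using explicit partition selection
SELECT COUNT(*) AS rows_in_p0
FROM users PARTITION (p0);
```

With sequential AUTO_INCREMENT ids the two counts should stay close to each other; a large gap usually means ids are being supplied by the application in a skewed pattern.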
The payoff comes under high write throughput, because inserts are spread across both backends instead of landing on one server. Reads are a different story. A query like SELECT * FROM users WHERE id = 12345; is efficient because Spider knows exactly which node to query based on id. A query like SELECT * FROM users WHERE username = 'bob'; is not: Spider doesn’t know which partition bob is in, so it has to query all nodes. This is a common pitfall: sharding on a column that your targeted lookups rarely filter on.
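You can see the difference between the two queries without running them by asking the optimizer which partitions it would touch. This is a small sketch using MariaDB’s EXPLAIN PARTITIONS, assuming the two-partition HASH (id) layout with partitions p0 and p1.

```sql
-- On db1: lookup on the shard key. 12345 is odd, so MOD(12345, 2) = 1
-- and the partitions column should list only p1.
EXPLAIN PARTITIONS SELECT * FROM users WHERE id = 12345;

-- On db1: lookup on a non-key column. No pruning is possible,
-- so the partitions column should list both p0 and p1.
EXPLAIN PARTITIONS SELECT * FROM users WHERE username = 'bob';
```

Every partition listed is a backend that has to be contacted, so the length of that list is a rough proxy for how much fan-out a query causes.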
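This setup allows you to scale horizontally. As your data grows, you can define more server_dbX nodes with CREATE SERVER and add partitions on the Spider table that point at them. Rebalancing, though, is a manual and often complex process: with HASH partitioning you change the partition count with ALTER TABLE users ADD PARTITION or COALESCE PARTITION (REORGANIZE PARTITION is for RANGE and LIST partitions), and because the modulo changes, most existing rows end up belonging to a different node.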
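To get a feel for how disruptive a rebalance would be, you can estimate the number of rows whose partition changes before touching the schema. The query below is a sketch for one hypothetical case, growing from 2 to 3 HASH partitions on id; adjust the two moduli for other layouts.

```sql
-- On db1: a row moves if its partition number under 2 partitions
-- differs from its partition number under 3 partitions
SELECT COUNT(*) AS total_rows,
       SUM(MOD(id, 2) <> MOD(id, 3)) AS rows_that_move
FROM users;
```

For evenly spread ids, about two thirds of the rows change partition in this case, which is why adding a node to a plain modulo scheme is closer to a data migration than a configuration change.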
The most surprising thing is how Spider uses the COMMENT clause in PARTITION definitions. It’s not just descriptive; it’s the actual mechanism Spider uses to associate a partition with a specific remote server defined by a CREATE SERVER statement. When you add a new server, you’re essentially adding a new target that a partition comment can name.
The next logical step is to explore how to handle cross-node JOINs, which Spider supports but with significant performance implications.