The MariaDB CONNECT engine lets you query external data sources as if they were local tables. For sources that have a query language of their own, such as a remote database, it can also translate parts of your query into that language so the source does the work.
Let’s see it in action. Imagine you have a CSV file named users.csv in your local filesystem:
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
3,Charlie,charlie@example.com
And you want to query it from MariaDB. First, you need to enable the CONNECT engine. It ships as a separate plugin, so if it’s not already present you’d typically install it via your package manager or compile it from source. Once the plugin library is in place, you load it either from your MariaDB configuration file (my.cnf or my.ini, for example with plugin_load_add = ha_connect) or by running:
INSTALL PLUGIN CONNECT SONAME 'ha_connect.so';
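As an alternative to the statement above (not in addition to it), MariaDB also accepts a form that omits the library suffix, which helps when the same script has to run on both Linux and Windows. The sketch below loads the plugin that way and then checks that the engine is registered; it assumes your account has the privileges needed to install plugins.

```sql
-- Load the plugin without hard-coding the library suffix (.so or .dll).
-- Use this INSTEAD of INSTALL PLUGIN; installing twice raises an error.
INSTALL SONAME 'ha_connect';

-- Confirm that the engine is registered and active.
SELECT PLUGIN_NAME, PLUGIN_STATUS
FROM information_schema.PLUGINS
WHERE PLUGIN_NAME = 'CONNECT';
```

If the second query returns a row whose PLUGIN_STATUS is ACTIVE, ENGINE=CONNECT can be used in CREATE TABLE statements.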
Now, you can create a table that points to your CSV file:
CREATE TABLE users_csv (
id INT,
name VARCHAR(100),
email VARCHAR(100)
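) ENGINE=CONNECT TABLE_TYPE=CSV FILE_NAME='users.csv' HEADER=1 SEP_CHAR=',';
Notice TABLE_TYPE=CSV and FILE_NAME='users.csv'. For file-based table types, FILE_NAME (not CONNECTION) tells CONNECT where the data lives; a relative path is resolved against the directory of the current database, so use an absolute path if the file sits elsewhere. HEADER=1 tells CONNECT to skip the first line, which holds the column names, and SEP_CHAR=',' states the field separator explicitly.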
With that set up, you can query users_csv just like any other MariaDB table:
SELECT id, name FROM users_csv WHERE id > 1;
This query will return:
+----+---------+
| id | name    |
+----+---------+
|  2 | Bob     |
|  3 | Charlie |
+----+---------+
CONNECT doesn’t just read CSVs. It supports many table types for different data sources: JSON, XML, ODBC, other MySQL/MariaDB servers, and more. For example, to connect to a remote MySQL database, you’d use the MYSQL table type:
CREATE TABLE remote_orders (
order_id INT,
customer_name VARCHAR(100),
order_date DATE
) ENGINE=CONNECT TABLE_TYPE=MYSQL CONNECTION='mysql://user:password@remote_host:3306/database_name/orders';
The CONNECTION string here is a URL-style string specifying the protocol, credentials, host, port, database, and, as the last path segment, the remote table (orders is just an example name for this walkthrough). Once created, SELECT * FROM remote_orders WHERE order_date > '2023-01-01'; would run against that table on remote_host.
The real power comes from CONNECT’s ability to push down operations where the source supports it. For remote databases (the MYSQL, ODBC, and JDBC table types), CONNECT can translate the WHERE clause of your query into the SQL of the remote server, so the filtering happens there and only matching rows cross the network. A flat file is different: users.csv has no query engine of its own, so unless the table is indexed, CONNECT still has to scan the file to find the rows that match. The performance gain from pushdown therefore depends heavily on the table type.
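One way to see what actually reaches the remote server is to turn on its general query log while you run a query through the CONNECT table. The following is a sketch under a few assumptions: you have administrative access to the remote server, it is acceptable to enable logging there briefly, and the remote_orders table from above exists. Whether a given condition is pushed down can vary with the CONNECT version and the table type, so treat the log as the source of truth.

```sql
-- Run on the REMOTE server (needs privileges to change global variables).
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';

-- Run on the LOCAL MariaDB server, against the CONNECT table.
SELECT order_id, customer_name
FROM remote_orders
WHERE order_date > '2023-01-01';

-- Back on the REMOTE server: inspect the statements that arrived.
SELECT event_time, argument
FROM mysql.general_log
ORDER BY event_time DESC
LIMIT 10;

-- Turn logging off again when done.
SET GLOBAL general_log = 'OFF';
```

If the logged statement carries the order_date condition, the filter ran remotely; if it selects every row, the filtering happened on the local server.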
The TABLE_TYPE parameter tells CONNECT how to interact with the source and which other options apply. File-based types such as CSV, JSON, and XML locate their data with FILE_NAME, while types that talk to another server, such as MYSQL or ODBC, use a CONNECTION string. Each type also accepts its own extra options and supports different query capabilities.
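To illustrate how the options change with the table type, here is a minimal sketch of the same user data exposed from a JSON file. The file users.json and its layout are assumptions made for this example: a top-level array of objects whose keys match the column names. With that layout, CONNECT can match columns to keys by name, so no per-column path is spelled out.

```sql
-- Hypothetical file users.json in the database directory, containing:
--   [
--     {"id": 1, "name": "Alice", "email": "alice@example.com"},
--     {"id": 2, "name": "Bob",   "email": "bob@example.com"}
--   ]
CREATE TABLE users_json (
  id INT,
  name VARCHAR(100),
  email VARCHAR(100)
) ENGINE=CONNECT TABLE_TYPE=JSON FILE_NAME='users.json';

-- Queried exactly like the CSV-backed table.
SELECT id, name FROM users_json WHERE id > 1;
```

Nested or differently named keys would need a per-column path option, which is beyond this sketch.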
When you create a table using the CONNECT engine, you’re not creating a new storage mechanism. You’re creating a virtual table that acts as a proxy to an external data source. The actual data remains where it is. MariaDB then uses the CONNECT engine to interact with that source, fetching data on demand or pushing queries down to the source for processing.
One of the most subtle yet powerful aspects is how CONNECT handles data type conversions. When you define your virtual table’s schema in MariaDB, CONNECT attempts to map those types to the native types of the source. For example, if you define a DATETIME column in MariaDB that maps to a string in a JSON file, CONNECT will try to parse that string into a DATETIME value. Conversely, when fetching data from a remote MySQL database, it will convert that database’s INT to your MariaDB INT. This automatic conversion can save a lot of manual data wrangling, but it’s also where unexpected issues can arise if the source data doesn’t conform to the expected format.
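When dates in a file are not written in the format CONNECT expects, the column-level DATE_FORMAT option lets you describe the layout instead of relying on guesswork. The sketch below assumes a hypothetical file signups.csv whose dates are written day-first; the file name, table name, and columns are made up for illustration.

```sql
-- Hypothetical file signups.csv in the database directory:
--   id,name,signup_date
--   1,Alice,25/12/2023
--   2,Bob,03/01/2024
CREATE TABLE signups_csv (
  id INT,
  name VARCHAR(100),
  -- Describe how the date is written in the file (day/month/year).
  signup_date DATE DATE_FORMAT='DD/MM/YYYY'
) ENGINE=CONNECT TABLE_TYPE=CSV FILE_NAME='signups.csv' HEADER=1 SEP_CHAR=',';

-- The column now behaves as a regular DATE in comparisons.
SELECT id, name, signup_date
FROM signups_csv
WHERE signup_date >= '2024-01-01';
```

Declaring the format up front makes the conversion explicit, which is usually easier to debug than a column that silently yields unexpected dates.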
The next logical step is to explore how to combine data from multiple external sources, or even external and internal tables, using JOINs.
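As a small preview, and assuming the users_csv and remote_orders tables defined earlier both exist, a join across the two sources is written like any other join. Treat this as a sketch: how much of the work is pushed to each source, and therefore how it performs, depends on the table types involved.

```sql
-- Combine rows from the CSV file with rows from the remote server.
SELECT u.name, u.email, o.order_id, o.order_date
FROM users_csv AS u
JOIN remote_orders AS o
  ON o.customer_name = u.name
WHERE o.order_date > '2023-01-01';
```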