MariaDB’s utf8mb4 character set isn’t just a "better UTF-8"; it’s the only UTF-8 character set in MariaDB that can store the full Unicode range, including emojis and the rarer CJK characters that fall outside the Basic Multilingual Plane.
Let’s see it in action. Imagine you have a simple table:
CREATE TABLE messages (
    id INT AUTO_INCREMENT PRIMARY KEY,
    content VARCHAR(255)
);
If your server and table aren’t configured for utf8mb4, inserting an emoji like "🚀" might result in a broken character or an error.
-- With incorrect configuration, this might fail or store '???'
INSERT INTO messages (content) VALUES ('Launching the rocket 🚀!');
But if utf8mb4 is set up right, that emoji is stored perfectly.
SELECT content FROM messages WHERE id = 1;
-- Output: Launching the rocket 🚀!
The problem utf8mb4 solves is a historical limitation of MariaDB’s older utf8 character set, which is really utf8mb3. The original UTF-8 design allowed up to six bytes per character (the current standard, RFC 3629, caps it at four), but the MySQL/MariaDB utf8 implementation stopped at three bytes. Three bytes cover the Basic Multilingual Plane (U+0000 to U+FFFF), which includes most scripts in everyday use and the common CJK (Chinese, Japanese, Korean) characters, but they exclude every character that needs four bytes: most notably emojis, but also the rarer CJK ideographs in the supplementary planes. utf8mb4 supports the full four-byte range, aligning with the modern UTF-8 standard.
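The three-byte versus four-byte boundary can be checked without a database. The following is a minimal Python sketch, not MariaDB code: Python’s UTF-8 encoder stands in for the byte counts, and the helper names `utf8_byte_length` and `fits_in_utf8mb3` are made up for this illustration.

```python
# Illustrative only: no MariaDB connection is involved. Python's UTF-8
# encoder is used to show how many bytes each character needs.

def utf8_byte_length(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most three UTF-8 bytes,
    i.e. every code point lies in the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    # ASCII letter, accented Latin letter, common CJK ideograph,
    # emoji, and a supplementary-plane CJK ideograph (U+20000).
    for ch in ["A", "é", "漢", "🚀", "\U00020000"]:
        print(f"U+{ord(ch):04X} needs {utf8_byte_length(ch)} byte(s)")

    print(fits_in_utf8mb3("Launching the rocket!"))     # True
    print(fits_in_utf8mb3("Launching the rocket 🚀!"))  # False
```

Any string for which `fits_in_utf8mb3` returns False contains at least one character that a utf8mb3 column cannot hold.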
To use utf8mb4 correctly, you need to configure it at four levels: the server, the database, the table, and the client connection.
1. Server Configuration (my.cnf or my.ini)
This is the most crucial step. You need to tell the MariaDB server itself to default to utf8mb4.
- Diagnosis: Check the current server default character set.
    mysql -u root -p -e "SHOW VARIABLES LIKE 'character_set_server';"
  If this shows utf8 or latin1, it’s not configured for utf8mb4.
- Fix: Edit your MariaDB configuration file (e.g., /etc/mysql/my.cnf, /etc/my.cnf, or C:\ProgramData\MySQL\MySQL Server X.Y\my.ini). Add or modify the following lines under the [mysqld] section:
    [mysqld]
    character-set-server=utf8mb4
    collation-server=utf8mb4_unicode_ci
  Restart your MariaDB server for these changes to take effect.
    sudo systemctl restart mariadb  # Or equivalent for your OS
- Why it works: character-set-server sets the default character set for the server when it starts. collation-server defines the rules for string comparison and sorting. utf8mb4_unicode_ci is a widely compatible, case-insensitive collation for utf8mb4.
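As a rough way to confirm the configuration file really contains both settings before restarting, the check can be sketched in Python. This is an illustration under stated assumptions, not an official tool: the helper name `check_mysqld_charset` is hypothetical, and the sketch assumes a simple option file without `!include` or `!includedir` directives, which Python’s `configparser` does not understand.

```python
import configparser

# Hypothetical helper: checks my.cnf-style text for the two [mysqld]
# settings discussed above. Assumes a simple file with no !include or
# !includedir directives.
EXPECTED = {
    "character-set-server": "utf8mb4",
    "collation-server": "utf8mb4_unicode_ci",
}


def check_mysqld_charset(config_text: str) -> dict:
    """Return {option: found_value_or_None} for every expected setting
    that is missing or different. An empty dict means both are set."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # option files allow bare flags
        strict=False,         # tolerate repeated options or sections
        interpolation=None,   # treat '%' literally
    )
    parser.read_string(config_text)

    found = {}
    if parser.has_section("mysqld"):
        for key, value in parser.items("mysqld"):
            # Option names may be written with '-' or '_'.
            found[key.replace("_", "-")] = value

    return {
        option: found.get(option)
        for option, wanted in EXPECTED.items()
        if found.get(option) != wanted
    }


if __name__ == "__main__":
    sample = "[mysqld]\ncharacter-set-server=latin1\n"
    print(check_mysqld_charset(sample))
```

The file is only one input: the SHOW VARIABLES query from the Diagnosis step remains the authoritative check of what the running server actually uses.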
2. Database Configuration
Once the server defaults are set, new databases will inherit them. However, existing databases might retain older settings.
- Diagnosis: Check the character set and collation for your specific database.
    SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
    FROM INFORMATION_SCHEMA.SCHEMATA
    WHERE SCHEMA_NAME = 'your_database_name';
  If it’s not utf8mb4 and utf8mb4_unicode_ci, you need to alter it.
- Fix: Alter the database to use utf8mb4.
    ALTER DATABASE your_database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
- Why it works: This explicitly sets the default character set and collation for any new tables created within this database. It doesn’t automatically change existing tables.
3. Table Configuration
Even if the server and database are set to utf8mb4, tables created before these changes, or tables created with explicit different settings, will need to be altered.
- Diagnosis: Check the character set and collation for your table.
    SELECT TABLE_NAME, ENGINE, ROW_FORMAT, TABLE_COLLATION
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA = 'your_database_name' AND TABLE_NAME = 'your_table_name';
  Also check the column-level settings if TABLE_COLLATION is already a utf8mb4 collation but you still suspect issues, since individual columns can override the table default.
- Fix: Alter the table to use utf8mb4.
    ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
  This command not only changes the table’s default character set and collation but also converts all character-based columns (like VARCHAR, TEXT, CHAR) to the new encoding.
- Why it works: CONVERT TO ensures that the table’s metadata and all its string data are updated to use the utf8mb4 encoding and the specified collation, resolving any internal representation issues.
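When several tables need the same conversion, the statements can be generated rather than typed by hand. The following Python sketch only builds the SQL text and does not execute anything; the function names and the table name `users` are hypothetical, and the output should be reviewed before being run against a real database.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_convert_statements(tables, charset="utf8mb4",
                             collation="utf8mb4_unicode_ci"):
    """Yield one ALTER TABLE ... CONVERT TO statement per table name."""
    for table in tables:
        yield (
            f"ALTER TABLE {quote_identifier(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )


if __name__ == "__main__":
    # 'messages' is the table from the example above; 'users' is made up.
    for statement in build_convert_statements(["messages", "users"]):
        print(statement)
```

The list of table names could come from the INFORMATION_SCHEMA.TABLES query in the Diagnosis step, filtered to tables whose TABLE_COLLATION is not yet a utf8mb4 collation.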
4. Connection Configuration
Finally, your client connection needs to speak utf8mb4 to the server.
- Diagnosis: Check the client character set.
    SHOW VARIABLES LIKE 'character_set_client';
    SHOW VARIABLES LIKE 'character_set_connection';
    SHOW VARIABLES LIKE 'character_set_results';
  These should ideally match utf8mb4.
- Fix: When connecting, specify the client character set. For the mysql command-line client:
    mysql -u your_user -p --default-character-set=utf8mb4 your_database_name
  For application connections (e.g., PHP, Python), find the connection string or configuration option and set it to utf8mb4. For example, in PHP PDO:
    $dsn = "mysql:host=localhost;dbname=your_db;charset=utf8mb4";
    $pdo = new PDO($dsn, $user, $password);
- Why it works: This ensures that data sent to the server and data received from the server is correctly interpreted as utf8mb4, preventing misinterpretations at the application boundary.
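The kind of misinterpretation a mismatched connection causes can be reproduced in a few lines of Python. This is an analogy rather than a trace of what the server does: Python’s cp1252 codec is used here as a stand-in for a connection that was declared as latin1 while the client actually sends UTF-8.

```python
ROCKET = "🚀"

# The client sends the emoji as UTF-8: four bytes.
utf8_bytes = ROCKET.encode("utf-8")

# If the receiving side believes those bytes are single-byte characters
# (cp1252 is an assumption standing in for a latin1 connection), each
# byte becomes a separate, wrong character.
misread = utf8_bytes.decode("cp1252")

# When both sides agree on the encoding, the round trip is lossless.
correct = utf8_bytes.decode("utf-8")

print(utf8_bytes.hex(" "))  # f0 9f 9a 80
print(len(misread))         # 4
print(correct == ROCKET)    # True
```

One emoji turning into four unrelated characters is the typical symptom of a connection character set that does not match the bytes the application sends.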
When you’ve got all these in place, you can confidently store emojis and a much wider range of international characters. The next hurdle you might encounter is dealing with older, non-utf8mb4 dump files that need careful import handling.