MariaDB’s EXPLAIN command is your window into how the database plans to execute your SQL queries, and the JSON output format is like getting an X-ray with all the diagnostic data laid bare.
Let’s see it in action. Imagine this simple query against a users table with id, name, and email columns:
SELECT name, email
FROM users
WHERE id = 12345;
Now, let’s ask MariaDB how it plans to execute this:
EXPLAIN FORMAT=JSON SELECT name, email FROM users WHERE id = 12345;
The output will be a single JSON object, potentially very large and nested. At its heart, it describes a plan, a sequence of operations. For our simple query, it might look something like this (simplified for clarity):
{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "users",
      "access_type": "const",
      "possible_keys": ["PRIMARY"],
      "key": "PRIMARY",
      "key_length": "8",
      "used_key_parts": ["id"],
      "ref": ["const"],
      "rows": 1,
      "filtered": 100
    }
  }
}
This tells us MariaDB identified the users table as the target. access_type: "const" means it will read at most one row, which it can do because id is the primary key and is compared with a constant (ref: ["const"]). key: "PRIMARY" confirms it’s using the primary index, and used_key_parts shows which index columns the lookup relies on. rows: 1 is the estimated number of rows it will examine, and filtered: 100 is the estimated percentage of those rows that survive the table’s conditions. Nothing else is reported here, because a single-row primary-key lookup needs no further filtering, sorting, or temporary storage. Newer MariaDB releases also report a cost value, an abstract measure of how expensive the optimizer expects the operation to be.
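To make the field-by-field reading concrete, here is a minimal Python sketch that parses a plan and pulls out the fields just discussed. The sample is a simplified, hand-written plan modeled on MariaDB’s JSON layout, and the helper name summarize_single_table is hypothetical; it assumes a single table sits directly under query_block, as it does for this query.

```python
import json

# Simplified, hand-written sample in MariaDB's JSON shape; real output may carry more fields.
PLAN_JSON = """
{
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "users",
      "access_type": "const",
      "possible_keys": ["PRIMARY"],
      "key": "PRIMARY",
      "key_length": "8",
      "used_key_parts": ["id"],
      "ref": ["const"],
      "rows": 1,
      "filtered": 100
    }
  }
}
"""


def summarize_single_table(plan):
    """Pull the main access fields out of a single-table plan (hypothetical helper)."""
    table = plan["query_block"]["table"]
    return {
        "table": table["table_name"],
        "access_type": table["access_type"],
        # "key" is absent when no index is used, so default to None.
        "key": table.get("key"),
        "rows": table["rows"],
        "filtered": table["filtered"],
    }


summary = summarize_single_table(json.loads(PLAN_JSON))
print(summary)
```

Using .get() for key is deliberate: a full table scan simply omits the field rather than setting it to null.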
The real power of EXPLAIN FORMAT=JSON comes when queries get more complex: joins, subqueries, sorting, grouping. The JSON output will detail each step, each table involved, the join order, the type of join, which indexes are considered and which are chosen, how many rows are estimated at each stage, and what filtering happens.
Consider a query with a join:
SELECT u.name, o.order_date
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.email = 'test@example.com';
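The EXPLAIN FORMAT=JSON output for this will be much more elaborate. The joined tables are listed in join order inside a nested_loop array, and any subqueries appear as their own nested query_block entries. Each table entry describes how that specific table is accessed. MariaDB executes joins as nested loops; when it reads a table through a join buffer, that table is wrapped in a block-nl-join object whose join_type names the algorithm, such as BNL or its hashed variant BNLH. The possible_keys and key fields become crucial for understanding whether the join is using appropriate indexes, for example whether orders is reached through an index on user_id. The rows and filtered values at each stage help pinpoint where the query might be doing unnecessary work.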
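The join order can be read straight off the parsed plan. The Python sketch below assumes MariaDB’s layout, where a join lists its tables in a nested_loop array and a table read through a join buffer is wrapped in a block-nl-join object carrying a join_type. The sample plan, including the index name idx_email and all row counts, is invented for illustration, as is the helper name join_order.

```python
import json

# Invented sample: users found via an index on email, orders scanned through a join buffer.
JOIN_PLAN = """
{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "u",
          "access_type": "ref",
          "possible_keys": ["idx_email"],
          "key": "idx_email",
          "rows": 1,
          "filtered": 100
        }
      },
      {
        "block-nl-join": {
          "table": {
            "table_name": "o",
            "access_type": "ALL",
            "rows": 1000,
            "filtered": 100
          },
          "buffer_type": "flat",
          "join_type": "BNL"
        }
      }
    ]
  }
}
"""


def join_order(query_block):
    """Return (table, access_type, key, join_buffer_algorithm) tuples in join order."""
    if "nested_loop" in query_block:
        entries = query_block["nested_loop"]
    else:
        # A single-table plan puts "table" directly under query_block.
        entries = [{"table": query_block["table"]}]
    result = []
    for entry in entries:
        if "block-nl-join" in entry:
            # Read through a join buffer; join_type names the algorithm (e.g. BNL).
            wrapper = entry["block-nl-join"]
            table, algorithm = wrapper["table"], wrapper.get("join_type")
        else:
            table, algorithm = entry["table"], None
        result.append((table["table_name"], table["access_type"], table.get("key"), algorithm))
    return result


plan = json.loads(JOIN_PLAN)
for step in join_order(plan["query_block"]):
    print(step)
```

In this made-up plan, the missing key and the BNL join buffer on o are the signal that orders has no usable index on user_id for this join.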
The most surprising thing about EXPLAIN FORMAT=JSON is that the rows and filtered values are estimates derived from table and index statistics, not counts from actually running the query. If those statistics are stale or inaccurate, a plan can look great in EXPLAIN and still perform poorly in practice. This is why ANALYZE TABLE users; is a common companion to EXPLAIN: it refreshes the statistics the optimizer relies on.
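Because these are estimates, an error in one table compounds through a join. In a nested-loop join, the rows figure for an inner table is an estimate per lookup, so a rough picture of the row combinations after each table is the running product of rows × filtered / 100. The sketch below applies that arithmetic; the numbers are hypothetical and the function estimated_row_flow is an illustration, not a value MariaDB reports.

```python
def estimated_row_flow(steps):
    """Cumulative estimated row combinations after each table of a nested-loop join.

    steps: list of (table_name, rows, filtered_percent) in join order.
    Inner-table rows are per-lookup estimates, so the figures multiply.
    """
    flow = []
    running = 1.0
    for name, rows, filtered in steps:
        running *= rows * filtered / 100
        flow.append((name, running))
    return flow


# Hypothetical estimates: 1 matching user, about 5 orders per user.
print(estimated_row_flow([("u", 1, 100.0), ("o", 5, 100.0)]))
# If stale statistics put the first table at 10 rows instead, every later figure grows 10x.
print(estimated_row_flow([("u", 10, 100.0), ("o", 5, 100.0)]))
```

The second call shows why refreshing statistics matters most for the first tables in the join order: their misestimates multiply into everything that follows.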
To build a mental model, think of the EXPLAIN output as a tree of operations. The JSON nesting mirrors that tree: each SELECT in the statement (the outer query, each subquery, each part of a UNION) is a query_block node with its own select_id, and the tables it reads sit beneath it, listed in join order when there is more than one. So select_id tells you which SELECT a table belongs to, not the order of steps within one SELECT. You can trace the flow from the first table accessed, through each join, and up into any sorting (filesort) or temporary_table step that wraps them. Where MariaDB reports cost figures, they are the optimizer’s estimates for that part of the tree.
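The tree view translates directly into a recursive traversal. This Python sketch walks a parsed plan, remembers the select_id of the nearest enclosing query_block, and records every table node it meets, so tables inside subqueries are attributed to their own SELECT. It assumes MariaDB’s key names (query_block, select_id, subqueries, table, table_name); the sample plan and the helper tables_by_select are hypothetical.

```python
import json

# Hand-written sample: an outer SELECT with one subquery; values are invented.
PLAN_WITH_SUBQUERY = """
{
  "query_block": {
    "select_id": 1,
    "table": {"table_name": "users", "access_type": "ALL", "rows": 1000, "filtered": 100},
    "subqueries": [
      {
        "query_block": {
          "select_id": 2,
          "table": {"table_name": "orders", "access_type": "ref", "key": "idx_user_id", "rows": 5, "filtered": 100}
        }
      }
    ]
  }
}
"""


def tables_by_select(node, select_id=None, found=None):
    """Walk the plan tree and collect (select_id, table_name) for every table node."""
    if found is None:
        found = []
    if isinstance(node, dict):
        # Entering a query block changes which SELECT the nested tables belong to.
        if "select_id" in node:
            select_id = node["select_id"]
        for key, value in node.items():
            if key == "table" and isinstance(value, dict) and "table_name" in value:
                found.append((select_id, value["table_name"]))
            tables_by_select(value, select_id, found)
    elif isinstance(node, list):
        for item in node:
            tables_by_select(item, select_id, found)
    return found


result = tables_by_select(json.loads(PLAN_WITH_SUBQUERY))
print(result)
```

Walking every dict and list generically, rather than hard-coding paths, keeps the traversal working when tables are wrapped in extra layers such as nested_loop or temporary_table.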
The main levers you control are:
- Indexing: Creating or dropping indexes after checking which candidates appear in possible_keys and which one the optimizer actually chose in key.
- Query Rewriting: Restructuring your SQL to guide MariaDB towards a better plan (e.g., ensuring join conditions are on indexed columns, pushing filters down).
- Server Configuration: Though less direct, settings related to optimizer behavior, such as optimizer_switch, can influence the chosen plan.
- Statistics: Keeping table statistics up to date with ANALYZE TABLE.
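For the indexing lever, a quick first pass is to list every table read with a full scan (access_type "ALL") alongside its possible_keys. A missing possible_keys means no index was even a candidate, which points to a new index; a populated one means the optimizer preferred a scan, which points back at statistics or the query shape. This Python sketch assumes MariaDB’s key names; the plan dict and the helpers iter_tables and full_scans are hypothetical.

```python
# Invented plan in MariaDB's JSON shape: orders (alias o) has no usable index on user_id.
plan = {
    "query_block": {
        "select_id": 1,
        "nested_loop": [
            {"table": {"table_name": "u", "access_type": "ref", "key": "idx_email", "rows": 1}},
            {"block-nl-join": {
                "table": {"table_name": "o", "access_type": "ALL", "rows": 1000},
                "join_type": "BNL",
            }},
        ],
    }
}


def iter_tables(node):
    """Yield every table entry (a dict with table_name) anywhere in the plan."""
    if isinstance(node, dict):
        table = node.get("table")
        if isinstance(table, dict) and "table_name" in table:
            yield table
        for value in node.values():
            yield from iter_tables(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_tables(item)


def full_scans(plan):
    """Return (table_name, possible_keys) for tables read with access_type ALL."""
    return [
        (t["table_name"], t.get("possible_keys"))
        for t in iter_tables(plan)
        if t.get("access_type") == "ALL"
    ]


print(full_scans(plan))
```

Here the result names o with no candidate keys, so an index on orders.user_id would be the first thing to try.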
One thing most people don’t know is how much diagnostic detail hides in the extra information. Traditional tabular EXPLAIN packs it into the Extra column as strings such as "Using index," "Using index condition," "Using where," "Using temporary," "Using filesort," and "Using join buffer (flat, BNL join)," and many more; the JSON format breaks the same information out into dedicated fields such as using_index, index_condition, attached_condition, temporary_table, filesort, and block-nl-join. Each of these is a crucial diagnostic clue. For instance, a temporary table or a filesort often indicates that an operation couldn’t be performed using an index and required an intermediate in-memory or on-disk structure, which is typically a performance bottleneck, especially for large datasets. You’d then look at the surrounding table entries to see why a temporary table or filesort was needed; often it’s because a GROUP BY or ORDER BY clause couldn’t leverage an index.
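To spot these clues in a large plan, you can scan the parsed JSON for the keys that stand in for the familiar Extra strings and report where each one sits. The mapping in DIAGNOSTIC_KEYS is an approximate correspondence assumed for this sketch, the sample plan (a GROUP BY with an ORDER BY on an aggregate, simplified) is hand-written, and find_diagnostics is a hypothetical helper.

```python
import json

# Hand-written, simplified sample: a filesort wrapping a temporary table over a full scan.
PLAN = """
{
  "query_block": {
    "select_id": 1,
    "filesort": {
      "temporary_table": {
        "table": {"table_name": "orders", "access_type": "ALL", "rows": 1000, "filtered": 100}
      }
    }
  }
}
"""

# Assumed, approximate mapping from JSON keys to the tabular Extra strings.
DIAGNOSTIC_KEYS = {
    "filesort": "Using filesort",
    "temporary_table": "Using temporary",
    "block-nl-join": "Using join buffer",
    "using_index": "Using index",
    "index_condition": "Using index condition",
    "attached_condition": "Using where",
}


def find_diagnostics(node, path="$"):
    """Return (json_path, extra_string) for each diagnostic key found in the plan."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            # Skip boolean flags that are present but explicitly false.
            if key in DIAGNOSTIC_KEYS and value is not False:
                hits.append((child, DIAGNOSTIC_KEYS[key]))
            hits.extend(find_diagnostics(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits.extend(find_diagnostics(item, f"{path}[{i}]"))
    return hits


hits = find_diagnostics(json.loads(PLAN))
for where, meaning in hits:
    print(where, "->", meaning)
```

Reporting the path, not just the flag, shows which table sits underneath each filesort or temporary table, which is exactly where you would look for the missing index.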
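The next step after understanding a query plan is often the optimizer trace: after SET optimizer_trace='enabled=on'; and running the query, INFORMATION_SCHEMA.OPTIMIZER_TRACE shows why the optimizer chose this plan over the alternatives it considered.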