New Relic’s database monitoring tells you that a query is slow, but the real magic is in understanding why it’s slow and what you can do about it, which often has nothing to do with the query itself.
Let’s see it in action. Imagine you’re looking at a New Relic transaction trace for a web request that’s taking too long. You drill into the "Database" segment and see a specific SQL query flagged as slow.
SELECT * FROM users WHERE last_login < '2023-10-27 00:00:00' ORDER BY registration_date DESC LIMIT 100;
This query might look innocent, but if it’s running for 5 seconds, that’s a problem. New Relic shows you this, but it doesn’t tell you about the disk I/O contention on the database server, or that the application is holding an unnecessary lock on the users table while this query executes.
The Mental Model: What’s Really Happening
When New Relic flags a slow query, it’s a symptom. The root cause can be anywhere in the stack.
- The Query Itself: The most obvious place to start. Is the query inefficient? Is it missing an index? Is it returning too much data?
- Database Load: Is the database server overloaded? High CPU, memory pressure, or disk I/O can slow down all queries, not just this one.
- Application Behavior: Is the application holding locks for too long? Is it making many small, inefficient queries instead of one larger, optimized one? Is it fetching more data than it needs?
- Network Latency: While less common for individual query slowness, if the database is on a separate server, network issues can add up.
- Database Configuration: Suboptimal database settings (e.g., buffer pool size, query cache settings) can cripple performance.
Seeing the Gears Turn
Let’s say New Relic points to that users query.
Scenario 1: Missing Index
- Diagnosis: You’d run
EXPLAINon the query within your database client (e.g.,psql,mysql,sqlcmd).
If theEXPLAIN SELECT * FROM users WHERE last_login < '2023-10-27 00:00:00' ORDER BY registration_date DESC LIMIT 100;EXPLAINplan shows a "full table scan" or a scan of a large portion of the table, especially for theWHEREclause, that’s your culprit. - Fix: Add an index that covers both
last_loginandregistration_date.CREATE INDEX idx_users_login_reg ON users (last_login, registration_date); - Why it works: The index allows the database to quickly locate rows where
last_loginmatches the condition without scanning every row. TheORDER BYclause is also optimized because the index is ordered byregistration_date(after filtering bylast_login).
Scenario 2: Application Fetching Too Much Data
- Diagnosis: Look at the
SELECT *in the query. Is the application actually using all those columns? Check the transaction trace in New Relic. Does the application segment after the database call show it processing many columns from theuserstable? - Fix: Modify the query to only select the columns the application needs.
SELECT user_id, username, email FROM users WHERE last_login < '2023-10-27 00:00:00' ORDER BY registration_date DESC LIMIT 100; - Why it works: Less data needs to be read from disk, transferred over the network, and processed by the application. This is especially impactful if the
userstable has large columns likeprofile_picture_blobor extensivenotes.
Scenario 3: Database Server Overload
- Diagnosis: In New Relic, look at the "Databases" overview page, not just the transaction trace. Check the "Server" metrics for the database host. Are CPU utilization consistently above 80%? Is
iowaithigh? Or, use OS-level tools liketop,htop,iostaton the database server itself. - Fix: This is often a scaling problem. You might need to:
- Scale up: Increase CPU, RAM, or disk speed on the existing server.
- Scale out: Implement read replicas for read-heavy workloads.
- Optimize other queries: Find and fix other slow queries that are contributing to the overall load.
- Why it works: By reducing the load on the server, it has more resources to dedicate to executing the current query efficiently.
Scenario 4: Application Holding Excessive Locks
- Diagnosis: This is trickier. You need to correlate database activity with application behavior. In New Relic, look for the slow query and check the application segments that run concurrently or immediately before it. Are there long-running application operations that might be holding locks? Some databases expose lock wait statistics (
pg_locksin PostgreSQL,SHOW ENGINE INNODB STATUSin MySQL). - Fix: Refactor the application code to acquire locks for the shortest duration possible, or use less restrictive lock types. For example, instead of a long transaction that modifies user data, commit smaller, independent transactions.
// Bad: Long transaction holding lock startTransaction(); updateUser(userId, data); // Holds lock on user // ... do other stuff for 10 seconds ... commitTransaction(); // Good: Shorter transaction updateUser(userId, data); // Lock acquired and released quickly // ... do other stuff for 10 seconds ... - Why it works: The slow query is no longer blocked by an application holding a lock. It can acquire the necessary locks and proceed.
Scenario 5: Suboptimal Database Configuration
- Diagnosis: Review your database’s configuration file (e.g.,
postgresql.conf,my.cnf). Look at parameters likeshared_buffers(PostgreSQL) orinnodb_buffer_pool_size(MySQL). Are they set to a reasonable percentage of your server’s RAM? For example,shared_buffersshould typically be 25% of total RAM. - Fix: Adjust the configuration parameters. For instance, on a 64GB RAM server for PostgreSQL:
After changing, you’ll need to restart the database service.# postgresql.conf shared_buffers = 16GB - Why it works: A larger buffer pool allows the database to cache more data and indexes in memory, drastically reducing the need for slow disk reads.
The most overlooked aspect is how your application interacts with the database beyond just the SQL itself. A perfectly optimized query can still be slow if the application is frequently acquiring row locks and holding them for extended periods during unrelated operations, effectively blocking that query from even starting its work efficiently.
The next thing you’ll likely encounter is New Relic flagging an increase in the number of database calls for a transaction, even if individual calls aren’t slow, leading to overall transaction degradation.