New Relic’s database monitoring tells you that a query is slow, but the real magic is in understanding why it’s slow and what you can do about it, which often has nothing to do with the query itself.

Let’s see it in action. Imagine you’re looking at a New Relic transaction trace for a web request that’s taking too long. You drill into the "Database" segment and see a specific SQL query flagged as slow.

SELECT * FROM users WHERE last_login < '2023-10-27 00:00:00' ORDER BY registration_date DESC LIMIT 100;

This query might look innocent, but if it’s running for 5 seconds, that’s a problem. New Relic shows you this, but it doesn’t tell you about the disk I/O contention on the database server, or that the application is holding an unnecessary lock on the users table while this query executes.

The Mental Model: What’s Really Happening

When New Relic flags a slow query, it’s a symptom. The root cause can be anywhere in the stack.

  1. The Query Itself: The most obvious place to start. Is the query inefficient? Is it missing an index? Is it returning too much data?
  2. Database Load: Is the database server overloaded? High CPU, memory pressure, or disk I/O can slow down all queries, not just this one.
  3. Application Behavior: Is the application holding locks for too long? Is it making many small, inefficient queries instead of one larger, optimized one? Is it fetching more data than it needs?
  4. Network Latency: While less common for individual query slowness, if the database is on a separate server, network issues can add up.
  5. Database Configuration: Suboptimal database settings (e.g., buffer pool size, query cache settings) can cripple performance.

Seeing the Gears Turn

Let’s say New Relic points to that users query.

Scenario 1: Missing Index

  • Diagnosis: You’d run EXPLAIN on the query within your database client (e.g., psql, mysql, sqlcmd).
    EXPLAIN SELECT * FROM users WHERE last_login < '2023-10-27 00:00:00' ORDER BY registration_date DESC LIMIT 100;
    
    If the EXPLAIN plan shows a "full table scan" or a scan of a large portion of the table, especially for the WHERE clause, that’s your culprit.
  • Fix: Add an index that covers both last_login and registration_date.
    CREATE INDEX idx_users_login_reg ON users (last_login, registration_date);
    
  • Why it works: The index allows the database to quickly locate rows where last_login matches the condition without scanning every row. The ORDER BY clause is also optimized because the index is ordered by registration_date (after filtering by last_login).

Scenario 2: Application Fetching Too Much Data

  • Diagnosis: Look at the SELECT * in the query. Is the application actually using all those columns? Check the transaction trace in New Relic. Does the application segment after the database call show it processing many columns from the users table?
  • Fix: Modify the query to only select the columns the application needs.
    SELECT user_id, username, email FROM users WHERE last_login < '2023-10-27 00:00:00' ORDER BY registration_date DESC LIMIT 100;
    
  • Why it works: Less data needs to be read from disk, transferred over the network, and processed by the application. This is especially impactful if the users table has large columns like profile_picture_blob or extensive notes.

Scenario 3: Database Server Overload

  • Diagnosis: In New Relic, look at the "Databases" overview page, not just the transaction trace. Check the "Server" metrics for the database host. Are CPU utilization consistently above 80%? Is iowait high? Or, use OS-level tools like top, htop, iostat on the database server itself.
  • Fix: This is often a scaling problem. You might need to:
    • Scale up: Increase CPU, RAM, or disk speed on the existing server.
    • Scale out: Implement read replicas for read-heavy workloads.
    • Optimize other queries: Find and fix other slow queries that are contributing to the overall load.
  • Why it works: By reducing the load on the server, it has more resources to dedicate to executing the current query efficiently.

Scenario 4: Application Holding Excessive Locks

  • Diagnosis: This is trickier. You need to correlate database activity with application behavior. In New Relic, look for the slow query and check the application segments that run concurrently or immediately before it. Are there long-running application operations that might be holding locks? Some databases expose lock wait statistics (pg_locks in PostgreSQL, SHOW ENGINE INNODB STATUS in MySQL).
  • Fix: Refactor the application code to acquire locks for the shortest duration possible, or use less restrictive lock types. For example, instead of a long transaction that modifies user data, commit smaller, independent transactions.
    // Bad: Long transaction holding lock
    startTransaction();
    updateUser(userId, data); // Holds lock on user
    // ... do other stuff for 10 seconds ...
    commitTransaction();
    
    // Good: Shorter transaction
    updateUser(userId, data); // Lock acquired and released quickly
    // ... do other stuff for 10 seconds ...
    
  • Why it works: The slow query is no longer blocked by an application holding a lock. It can acquire the necessary locks and proceed.

Scenario 5: Suboptimal Database Configuration

  • Diagnosis: Review your database’s configuration file (e.g., postgresql.conf, my.cnf). Look at parameters like shared_buffers (PostgreSQL) or innodb_buffer_pool_size (MySQL). Are they set to a reasonable percentage of your server’s RAM? For example, shared_buffers should typically be 25% of total RAM.
  • Fix: Adjust the configuration parameters. For instance, on a 64GB RAM server for PostgreSQL:
    # postgresql.conf
    shared_buffers = 16GB
    
    After changing, you’ll need to restart the database service.
  • Why it works: A larger buffer pool allows the database to cache more data and indexes in memory, drastically reducing the need for slow disk reads.

The most overlooked aspect is how your application interacts with the database beyond just the SQL itself. A perfectly optimized query can still be slow if the application is frequently acquiring row locks and holding them for extended periods during unrelated operations, effectively blocking that query from even starting its work efficiently.

The next thing you’ll likely encounter is New Relic flagging an increase in the number of database calls for a transaction, even if individual calls aren’t slow, leading to overall transaction degradation.

Want structured learning?

Take the full Newrelic course →