New Relic’s NRQL is a powerful tool, but it’s easy to write queries that are technically correct yet incredibly slow. The real magic happens when you understand how New Relic processes these queries under the hood and then tune your NRQL to play with the system, not against it.
Let’s say you want to see the average duration of POST requests to your /users endpoint over the last hour, minute by minute. A naive approach might look like this:
SELECT average(duration) FROM Transaction WHERE appName = 'MyAwesomeApp' AND name = 'WebTransaction/Action/POST /users' AND httpResponseCode IS NOT NULL SINCE 1 hour ago TIMESERIES 1 minute
This query will probably work, but if your Transaction event table is massive (and it likely is), this will take a while to run and consume significant resources. The problem isn’t that the syntax is wrong; it’s that New Relic has to scan a huge number of raw Transaction events and filter them down in real-time.
The core principle of NRQL optimization is to make New Relic do as little work as possible on raw event data: filter on selective attributes, keep groupings and time buckets coarse, and query pre-aggregated data where it exists.
1. Use WHERE Clauses Judiciously and Early:
The most impactful optimization is filtering as early as possible. If you know you only care about certain apps or environments, put those conditions in your WHERE clause. New Relic discards events that fail these conditions before aggregating, so the more data a filter rules out, the less work the rest of the query has to do. Selective attributes like appName, environment, and host are your best friend.
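Diagnosis: NRQL has no EXPLAIN command, so rely on the inspected-event count and duration that the query builder reports with each result. If adding a filter you expected to narrow the data barely changes the query time, that filter isn’t doing much work for you, and that’s a red flag.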
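Fix: Make sure your query includes the most selective filters available. For example, if only a small fraction of your Transaction events come from production, add environment = 'production'. Listing it first makes the intent clear; including it at all is what reduces the work.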
SELECT average(duration) FROM Transaction WHERE environment = 'production' AND appName = 'MyAwesomeApp' AND name = 'WebTransaction/Action/POST /users' AND httpResponseCode IS NOT NULL SINCE 1 hour ago TIMESERIES 1 minute
Why it works: Events that fail the environment filter are discarded before average(duration) is computed, so the aggregation runs over only the production subset of Transaction events.
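To decide which filter is worth adding, it helps to know how much of the data each one removes. The sketch below is a toy model, not how New Relic plans queries: it assumes you have a local sample of events as Python dictionaries and computes each candidate filter's selectivity (the fraction of events that pass). In practice you can get the same counts from NRQL itself, for example with SELECT count(*) FROM Transaction SINCE 1 hour ago FACET environment. The function names and sample data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# An event is modeled as a plain dict of attribute name -> value (an assumption
# for this sketch; real Transaction events carry many more attributes).
Event = Dict[str, object]
Predicate = Callable[[Event], bool]


def selectivity(events: List[Event], predicate: Predicate) -> float:
    """Return the fraction of events that pass the predicate (lower = more selective)."""
    if not events:
        return 0.0
    passed = sum(1 for event in events if predicate(event))
    return passed / len(events)


def rank_filters(events: List[Event], filters: Dict[str, Predicate]) -> List[Tuple[str, float]]:
    """Sort candidate filters from most to least selective."""
    scores = [(name, selectivity(events, pred)) for name, pred in filters.items()]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical sample: 10 events, only 2 of them from production.
    sample = (
        [{"environment": "production", "appName": "MyAwesomeApp"}] * 2
        + [{"environment": "staging", "appName": "MyAwesomeApp"}] * 4
        + [{"environment": "staging", "appName": "OtherApp"}] * 4
    )
    candidates = {
        "environment = 'production'": lambda e: e.get("environment") == "production",
        "appName = 'MyAwesomeApp'": lambda e: e.get("appName") == "MyAwesomeApp",
    }
    for name, score in rank_filters(sample, candidates):
        print(f"{name}: {score:.0%} of events pass")
```

On that sample, the environment filter lets 20% of events through and the appName filter 60%, so the environment filter is the one doing most of the narrowing.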
2. Leverage Attributes for Pre-computation:
Instead of calculating average(duration) on raw Transaction events, consider whether pre-aggregated data already answers the question. Querying Transaction directly means working with individual events, but APM agents also report metric data that is stored in aggregated form and can be queried from the Metric data type.
Diagnosis: If your query on Transaction events is slow, and you’re performing aggregations like average, sum, count, min, max, check if there’s a corresponding metric available.
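Fix: If a matching metric exists, query it from Metric instead. APM exposes transaction duration as the apm.service.transaction.duration metric, so the following query reads aggregated data rather than individual events:
SELECT average(apm.service.transaction.duration) FROM Metric WHERE appName = 'MyAwesomeApp' SINCE 1 hour ago TIMESERIES 1 minute
SystemSample is not a substitute here: it holds infrastructure samples such as host CPU, memory, and disk usage, not request durations. When no suitable metric exists, the real optimization comes from attributes.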
Consider adding custom attributes to your transactions that capture the data you want to query against. For example, if you frequently query by HTTP status code, ensure httpResponseCode is an attribute on your Transaction events.
Why it works: Attributes are key-value pairs attached directly to events, so filtering or faceting on one compares stored values instead of parsing a longer text field, such as a transaction name, on every event.
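Custom attributes are added in your agent's instrumentation, and the exact call depends on the agent (the Python agent, for example, provides newrelic.agent.add_custom_attribute). The sketch below only shows the idea in plain Python: it derives two hypothetical attributes, httpMethod and route, from a transaction name in the format used in this article, so a query could filter on route = '/users' instead of matching on the full name. The attribute names and the name format it parses are assumptions.

```python
import re
from typing import Dict

# Assumed name format, taken from this article's example:
# "WebTransaction/Action/POST /users". Real transaction names vary by agent
# and framework, so adjust the pattern to what your app actually reports.
NAME_PATTERN = re.compile(r"^WebTransaction/[^/]+/(?P<method>[A-Z]+) (?P<route>/\S*)$")


def extract_attributes(transaction_name: str) -> Dict[str, str]:
    """Return hypothetical httpMethod/route attributes parsed from a transaction name.

    Returns an empty dict when the name doesn't match the assumed format, so the
    caller can skip adding attributes rather than record bad values.
    """
    match = NAME_PATTERN.match(transaction_name)
    if match is None:
        return {}
    return {"httpMethod": match.group("method"), "route": match.group("route")}
```

For example, extract_attributes('WebTransaction/Action/POST /users') returns {'httpMethod': 'POST', 'route': '/users'}. Doing this once when the event is recorded means later queries compare stored values instead of parsing the name on every run.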
3. Use FACET on Low-Cardinality Attributes:
Faceting is powerful, but faceting on high-cardinality attributes can be slow. If you FACET on something like a user ID that is different for almost every transaction, New Relic has to group an enormous number of unique values.
Diagnosis: Queries with FACET clauses that are slow, especially on fields that seem to have many unique values.
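Fix: Prefer faceting on low-cardinality attributes. httpResponseCode typically has only a handful of distinct values, which makes it a good candidate, as do appName, environment, and usually host. If you need a high-cardinality facet, narrow the WHERE clause first, and remember that FACET returns only the top 10 groups unless you add a LIMIT clause.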
SELECT average(duration) FROM Transaction WHERE appName = 'MyAwesomeApp' AND name = 'WebTransaction/Action/POST /users' AND httpResponseCode IS NOT NULL SINCE 1 hour ago TIMESERIES 1 minute FACET httpResponseCode
Why it works: httpResponseCode has only a few distinct values, so New Relic maintains only a handful of groups while aggregating.
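Before faceting on an attribute, it's worth checking how many distinct values it has. In NRQL, a query such as SELECT uniqueCount(userId) FROM Transaction SINCE 1 hour ago answers this directly (userId being a hypothetical custom attribute). The Python sketch below does the same on a local sample of events and flags attributes above a threshold; the default threshold of 50 is an arbitrary assumption, not a New Relic limit.

```python
from typing import Dict, Iterable, List, Tuple

Event = Dict[str, object]


def cardinality(events: Iterable[Event], attribute: str) -> int:
    """Count distinct values of one attribute, ignoring events that lack it."""
    return len({event[attribute] for event in events if attribute in event})


def facet_report(
    events: List[Event], attributes: List[str], max_groups: int = 50
) -> List[Tuple[str, int, bool]]:
    """Return (attribute, distinct values, ok_to_facet) for each candidate attribute.

    max_groups is an arbitrary cut-off for this sketch.
    """
    report = []
    for attribute in attributes:
        distinct = cardinality(events, attribute)
        report.append((attribute, distinct, distinct <= max_groups))
    return report
```

With 200 sample events where httpResponseCode cycles through four codes and userId is unique per event, facet_report(sample, ["httpResponseCode", "userId"]) returns [('httpResponseCode', 4, True), ('userId', 200, False)].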
4. Avoid LIKE and NOT LIKE on Large Text Fields:
Wildcard searches (LIKE '%foo%') are notoriously slow because they often require full string scans.
Diagnosis: Queries using LIKE or NOT LIKE operators, especially when the pattern begins with a wildcard (as in '%/users').
Fix: Whenever possible, use exact matches (=) or prefix matches (LIKE 'foo%'). If you need to search within a text field, consider if you can extract a more specific attribute during data ingestion.
-- Slow:
-- WHERE name LIKE '%/users'
-- Better:
WHERE name = 'WebTransaction/Action/POST /users'
Why it works: An exact match is a simple equality comparison, which is typically much cheaper than matching a pattern against every value.
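To apply this rule of thumb across a set of saved queries, you can classify each LIKE pattern by where its % wildcards fall. The sketch below is a hypothetical linter, not a New Relic tool; it only understands the % wildcard and flags patterns that begin with one, which this section identifies as the expensive case.

```python
def classify_like(pattern: str) -> str:
    """Classify an NRQL LIKE pattern by the position of its % wildcards."""
    wildcards = pattern.count("%")
    starts = pattern.startswith("%")
    ends = pattern.endswith("%")
    if wildcards == 0:
        return "exact"      # no wildcard at all: use = instead of LIKE
    if wildcards == 1 and ends and not starts:
        return "prefix"     # e.g. 'WebTransaction/%'
    if starts and ends:
        return "contains"   # e.g. '%users%'
    if starts:
        return "suffix"     # e.g. '%/users'
    return "infix"          # wildcard only in the middle, e.g. 'Web%/users'


def needs_review(pattern: str) -> bool:
    """Flag patterns with a leading wildcard, the slow case described above."""
    return pattern.startswith("%")
```

For instance, classify_like('%/users') returns 'suffix' and needs_review flags it, while 'WebTransaction/%' is classified as 'prefix' and passes.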
5. Use TIMESERIES Wisely:
TIMESERIES is useful, but if you specify a very small interval (TIMESERIES 1 second) on a large dataset, New Relic has to perform a lot of bucketing.
Diagnosis: Queries with TIMESERIES that are slow, particularly with very granular time intervals.
Fix: Increase the TIMESERIES interval to the largest value that still meets your analytical needs. For example, over a 1-hour window TIMESERIES 5 minutes produces 12 buckets, while TIMESERIES 10 seconds produces 360.
SELECT average(duration) FROM Transaction WHERE appName = 'MyAwesomeApp' AND name = 'WebTransaction/Action/POST /users' AND httpResponseCode IS NOT NULL SINCE 1 hour ago TIMESERIES 5 minutes FACET httpResponseCode
Why it works: Fewer time buckets mean less computation and aggregation required by New Relic.
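The bucket arithmetic is simple enough to check before running a query. The sketch below, with hypothetical helper names, counts the buckets a window and interval produce and picks the finest interval from a list of candidates that stays within a bucket budget. The budget is your own assumption about how many points a chart really needs, not a New Relic limit.

```python
from typing import Sequence


def bucket_count(window_seconds: int, interval_seconds: int) -> int:
    """Number of TIMESERIES buckets needed to cover the window (a partial bucket counts)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (window_seconds + interval_seconds - 1) // interval_seconds


def pick_interval(window_seconds: int, candidates: Sequence[int], max_buckets: int) -> int:
    """Return the finest candidate interval whose bucket count fits within max_buckets."""
    for interval in sorted(candidates):
        if bucket_count(window_seconds, interval) <= max_buckets:
            return interval
    raise ValueError("no candidate interval fits within max_buckets")
```

For a 1-hour window (3600 seconds), bucket_count gives 360 buckets at 10 seconds, 60 at 1 minute, and 12 at 5 minutes; with a budget of 20 buckets, pick_interval(3600, [10, 60, 300], 20) chooses 300 seconds, which corresponds to TIMESERIES 5 minutes.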
6. Understand Data Retention and Rollups:
New Relic doesn’t store raw event data indefinitely. Event data such as Transaction is kept for its retention period, which depends on your account, and is then dropped rather than rolled up. Dimensional Metric data is different: it is kept at full resolution for a shorter period and as aggregated rollups for longer, and NRQL generally uses the rollups automatically when you query long time ranges.
Diagnosis: Queries covering extremely long time ranges (e.g., months or years) that are unexpectedly slow, even after applying other optimizations.
Fix: Where possible, limit event queries to shorter, relevant time windows. If you need long-term trends, consider whether pre-built dashboards or already-aggregated Metric data can serve your purpose.
Why it works: Querying pre-aggregated data is inherently faster than processing raw, individual events.
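One detail matters if you ever compute averages from rollups yourself: a rollup needs to keep a sum and a count per interval, not just an average, because averaging per-interval averages weights a quiet minute the same as a busy one. The toy sketch below (hypothetical names and data; it does not reflect how New Relic stores rollups) builds per-minute rollups from raw durations and shows that the weighted result matches the average over the raw events.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Rollup:
    """Aggregate for one time bucket: enough to recover an exact average."""
    total: float = 0.0
    count: int = 0


def build_rollups(events: Iterable[Tuple[int, float]], bucket_seconds: int = 60) -> Dict[int, Rollup]:
    """Group (timestamp_seconds, duration) pairs into fixed-size buckets."""
    rollups: Dict[int, Rollup] = defaultdict(Rollup)
    for timestamp, duration in events:
        bucket = timestamp - timestamp % bucket_seconds  # start of the bucket
        rollups[bucket].total += duration
        rollups[bucket].count += 1
    return dict(rollups)


def average_from_rollups(rollups: Dict[int, Rollup]) -> float:
    """Weighted average: sum of bucket totals divided by sum of bucket counts."""
    total = sum(r.total for r in rollups.values())
    count = sum(r.count for r in rollups.values())
    return total / count if count else 0.0


def naive_average_of_averages(rollups: Dict[int, Rollup]) -> float:
    """The common mistake: every bucket counts equally regardless of traffic."""
    averages = [r.total / r.count for r in rollups.values() if r.count]
    return sum(averages) / len(averages) if averages else 0.0
```

With three 0.25-second requests in the first minute and one 1.0-second request in the second, average_from_rollups returns 0.4375, the same as the raw average, while the naive average of averages returns 0.625. The rollup version reads two records instead of four, and that gap grows with traffic.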
By understanding these principles and applying them systematically, you can turn slow, resource-intensive NRQL queries into fast, efficient ones that deliver insights without wasting query capacity. Once your queries are fast, the next hurdle you’ll likely encounter is managing the complexity of very large result sets.