Anomaly detection in New Relic is less about spotting spikes and more about understanding the subtle, insidious drift away from normal behavior.

Let’s see this in action. Imagine you have a service, my-api, and you’re tracking its request rate. Normally, it hovers around 1000 requests per minute.

{
  "event_type": "SystemSample",
  "entityName": "my-api",
  "requestCount": 1005,
  "timestamp": 1678886400000
}

Here’s another normal reading:

{
  "event_type": "SystemSample",
  "entityName": "my-api",
  "requestCount": 998,
  "timestamp": 1678886460000
}

Now, what if over the next few hours the request rate slowly creeps up, hitting 1200, then 1500, then 1800 requests per minute? A static threshold alert only helps if you guessed the right number in advance: a generous threshold (e.g., requestCount > 2000) never fires at all, while a tight one (e.g., requestCount > 1100) is likely to fire during ordinary busy periods as well. Anomaly detection, however, sees that 1800 requests per minute is far outside the established baseline of roughly 1000 requests per minute, even though no single step looked like a sudden spike.

The core problem anomaly detection solves is gradual change that never trips a fixed threshold. Most monitoring systems are good at detecting sudden, dramatic failures. They’re terrible at detecting slow degradations, subtle performance regressions, or unexpected changes in traffic patterns that, while not outright errors, indicate something is wrong. Think of it as a doctor who only notices if you suddenly go into cardiac arrest, but misses the fact that your blood pressure has been subtly rising for months.

Internally, New Relic’s anomaly detection builds a dynamic baseline for your metrics. It doesn’t use static thresholds. Instead, it looks at historical data, considers seasonality (like daily or weekly patterns), and calculates a "normal" range. When a new data point falls outside this dynamically calculated range, it flags it as an anomaly. This range isn’t just a simple average +/- standard deviation; it uses more sophisticated statistical models to adapt to varying data patterns.
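To make the idea of a seasonal baseline concrete, here is a minimal sketch in Python. It is not New Relic's algorithm: it deliberately uses the simple mean +/- standard deviation band that the real system goes beyond, computed separately per hour of day so that daily seasonality is respected. The function names, the `width` parameter, and the sample numbers are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def seasonal_band(history, width=3.0):
    """Build a per-hour "normal" range from (hour_of_day, value) pairs.

    Simplified stand-in for a dynamic baseline: mean +/- width * stdev,
    computed independently for each hour of the day.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)

    band = {}
    for hour, values in by_hour.items():
        center = mean(values)
        # stdev needs at least two points; fall back to a zero-width band.
        spread = stdev(values) if len(values) > 1 else 0.0
        band[hour] = (center - width * spread, center + width * spread)
    return band


def is_anomaly(band, hour, value):
    """True if value falls outside the normal range for that hour."""
    low, high = band[hour]
    return value < low or value > high


# Hypothetical history: quiet at 03:00, busy at 14:00.
history = [
    (3, 200), (3, 210), (3, 190), (3, 205),
    (14, 1000), (14, 1005), (14, 998), (14, 1010),
]
band = seasonal_band(history)

print(is_anomaly(band, 14, 1000))  # normal for mid-afternoon
print(is_anomaly(band, 3, 1000))   # same value, unusual at 03:00
print(is_anomaly(band, 14, 1800))  # the drifted value from the example
```

The point of the sketch is that the same reading can be normal at one time of day and anomalous at another, which a single static threshold cannot express.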

The primary lever you control is the metric you choose to monitor. You want to select metrics that are indicative of user experience or system health. For my-api, requestCount is good, but average_duration or errorRate are even better candidates for anomaly detection. You also control the sensitivity of the detection. New Relic allows you to configure how aggressive the detection is, essentially adjusting the width of that "normal" band. A tighter band will flag more deviations, potentially leading to more false positives but catching subtle issues sooner. A wider band will be more lenient, reducing noise but risking missing minor drifts.
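The sensitivity trade-off can be sketched the same way. In the Python snippet below, `width` is a hypothetical multiplier standing in for the sensitivity setting, and the baseline of 1000 with a spread of 25 is assumed rather than learned from data. It only illustrates how a tighter band flags more readings than a wider one.

```python
def flagged(values, center, spread, width):
    """Return the values that fall outside center +/- width * spread."""
    low = center - width * spread
    high = center + width * spread
    return [v for v in values if v < low or v > high]


# Hypothetical request counts drifting upward from a baseline near 1000.
readings = [1005, 998, 1040, 1075, 1120, 1210]
center, spread = 1000.0, 25.0  # assumed baseline and typical variation

tight = flagged(readings, center, spread, width=2.0)  # band 950 to 1050
loose = flagged(readings, center, spread, width=4.0)  # band 900 to 1100

print(tight)  # [1075, 1120, 1210]
print(loose)  # [1120, 1210]
```

The tight band catches the drift one reading earlier, at the cost of flagging values that may turn out to be harmless.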

The real power comes from combining anomaly detection with alerting policies. You can create an alert that fires only when an anomaly is detected on a specific metric for a certain duration. This allows you to get notified about unusual behavior without having to guess what a "normal" threshold should be. You can also set up different anomaly detection rules for different times of day or days of the week, accounting for predictable traffic fluctuations.
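The "for a certain duration" part is worth a small illustration. The sketch below, in Python, assumes you already have one anomaly flag per evaluation window (for example, per minute) and only reports an alert once a given number of consecutive windows are anomalous. The function name and the `min_consecutive` parameter are hypothetical; this shows the gating logic in general, not how New Relic implements it.

```python
def first_alert_index(flags, min_consecutive):
    """Return the index at which an alert would open, or None.

    An alert opens once `min_consecutive` anomalous windows occur in a row;
    a single normal window resets the count.
    """
    run = 0
    for i, is_anomalous in enumerate(flags):
        run = run + 1 if is_anomalous else 0
        if run >= min_consecutive:
            return i
    return None


# One isolated blip, then a sustained anomaly.
flags = [False, True, False, True, True, True, True]

print(first_alert_index(flags, 3))  # 5: third window of the sustained run
print(first_alert_index(flags, 5))  # None: the run is only four windows long
```

Requiring a run of anomalous windows is what keeps a single odd data point from paging anyone, while a sustained drift still gets through.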

What most people don’t realize is that anomaly detection is not a single algorithm. New Relic employs multiple models, and the system dynamically chooses the most appropriate one based on the characteristics of the time series data it’s analyzing. For metrics with strong, predictable seasonality, it might use a model that explicitly accounts for that. For more erratic data, it might use a model that’s more robust to outliers. This self-tuning aspect is what makes it powerful without requiring deep statistical expertise from the user.

The next step is to understand how to tune these anomaly detection alerts to minimize false positives while maximizing signal.
