Applied Intelligence is, at its core, an anomaly detection system: it learns your system’s normal behavior and flags deviations from it, rather than waiting for hard-coded thresholds to be breached.

Let’s see it in action. Imagine you’re monitoring a web service. Normally, its request latency hovers around 100ms, and error rates are below 0.1%.

// Simulated New Relic NRQL query to show normal behavior
// (duration is recorded in seconds; error rate = share of transactions where error is true)
SELECT average(duration) AS 'Avg Latency', percentage(count(*), WHERE error IS true) AS 'Error Rate'
FROM Transaction
WHERE appName = 'MyWebService'
SINCE 1 week ago UNTIL now
TIMESERIES 1 hour

This query would typically show a relatively flat latency line and a low, stable error-rate line. Now introduce a subtle, prolonged issue: say a background job starts consuming more CPU, causing occasional latency spikes and a slight, almost imperceptible rise in the error rate that wouldn’t trigger a traditional alert.

// Simulated New Relic NRQL query showing a subtle deviation:
// the last hour plotted against the same hour one week earlier
SELECT average(duration) AS 'Avg Latency', percentage(count(*), WHERE error IS true) AS 'Error Rate'
FROM Transaction
WHERE appName = 'MyWebService'
SINCE 1 hour ago
COMPARE WITH 1 week ago
TIMESERIES

Applied Intelligence, having learned the baseline, would detect this deviation. It wouldn’t fire an alert for a single spike, but it would notice the sustained pattern of higher latency and errors, even if neither crosses a static threshold. This is where it shines: identifying emergent problems before they become critical. The question is not just "is this metric red?" but "is this metric behaving differently than it normally does?"
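To make the "ignore a single spike, catch a sustained shift" idea concrete, here is a minimal sketch. It is not New Relic's algorithm: it assumes a simple mean/standard-deviation baseline, a z-score cutoff, and a hypothetical `persistence` rule (N consecutive deviating samples), with made-up latency values.

```python
from statistics import mean, stdev

def detect_persistent(samples, mu, sigma, z_limit=3.0, persistence=3):
    """Return the indices at which a deviation of more than `z_limit` standard
    deviations has lasted `persistence` consecutive samples (one index per episode)."""
    flagged = []
    run = 0
    for i, x in enumerate(samples):
        # Extend the run of deviating samples, or reset it on a normal sample.
        run = run + 1 if abs(x - mu) / sigma > z_limit else 0
        if run == persistence:
            flagged.append(i)
    return flagged

# Illustrative data, not real measurements: "normal" latency around 100 ms,
# used as the learned baseline.
history = [98, 99, 100, 101, 102] * 12
mu, sigma = mean(history), stdev(history)

# Recent samples: one isolated spike (180 ms), then a sustained rise to ~120 ms.
recent = [101, 99, 180, 100, 101, 115, 118, 117, 120, 119, 121]

STATIC_LIMIT_MS = 500  # an "alert if latency > 500ms" style rule
print("static rule fired:", any(x > STATIC_LIMIT_MS for x in recent))
print("baseline rule flagged at:", detect_persistent(recent, mu, sigma))
```

The lone 180 ms spike resets the run before it reaches three samples, so only the sustained rise starting at index 5 is flagged (at index 7), while the static 500 ms rule never fires at all.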

The core problem Applied Intelligence solves is alert fatigue. Traditional alerting relies on static thresholds (e.g., "alert if latency > 500ms"). This is brittle. Your system’s "normal" changes. A peak holiday season might legitimately increase traffic and latency, causing false positives. Conversely, a slow degradation over time might never cross a static threshold, leading to missed incidents. Applied Intelligence addresses this by dynamically learning your system’s behavior. It establishes a baseline for various metrics (latency, throughput, error rates, CPU usage, etc.) over time. When a metric deviates significantly from its learned baseline, it flags it. This deviation can be a sudden spike, a gradual drift, or even a change in the pattern of fluctuations.
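The "slow degradation that never crosses a static threshold" case can be illustrated with a classic drift detector. The sketch below swaps in a one-sided CUSUM (cumulative sum) test, a standard textbook technique and not necessarily what Applied Intelligence uses internally; the baseline `mu`, slack `k`, and alarm level `h` are assumed values chosen for the example, and the data is a noise-free synthetic ramp.

```python
def cusum_upper(samples, mu, k, h):
    """One-sided (upper) CUSUM: accumulate how far each sample exceeds
    mu + k, clamping at zero, and return the first index where the running
    sum exceeds h, or None if it never does."""
    s = 0.0
    for i, x in enumerate(samples):
        s = max(0.0, s + (x - mu - k))
        if s > h:
            return i
    return None

# Illustrative, noise-free drift: latency creeps up 0.05 ms per sample,
# from 100 ms to just under 150 ms over 1000 samples.
drift = [100 + 0.05 * t for t in range(1000)]

STATIC_LIMIT_MS = 500
static_fired = any(x > STATIC_LIMIT_MS for x in drift)
alarm_at = cusum_upper(drift, mu=100.0, k=1.0, h=10.0)
print("static rule fired:", static_fired)
print("drift alarm at sample", alarm_at, "latency", drift[alarm_at])
```

The static rule stays silent for the whole run, while the drift detector raises an alarm once latency has sat above its learned baseline (plus slack) long enough, here around 102 ms.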

Internally, it uses statistical models, often involving time-series analysis and anomaly detection algorithms. For each monitored metric, it builds a profile of its typical behavior, considering factors like time of day, day of week, and longer-term seasonality. When new data comes in, it compares it against this profile. If the deviation is statistically significant and persistent enough, it generates an event. This event can then be used to trigger a workflow, such as creating an incident in your incident management system, sending a notification, or even initiating automated remediation.
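A time-of-day and day-of-week profile can be sketched as one baseline per (weekday, hour) bucket. This is a simplified illustration under stated assumptions, not New Relic's model: the synthetic history (150 ms during weekday business hours, 90 ms otherwise), the z-score cutoff, and the consecutive-point `persistence` rule are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev

def build_profile(history):
    """history: iterable of (datetime, value). Returns {(weekday, hour): (mean, stdev)}."""
    buckets = defaultdict(list)
    for ts, value in history:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {key: (mean(vals), stdev(vals)) for key, vals in buckets.items()}

def zscore(profile, ts, value):
    """How many standard deviations `value` sits from its (weekday, hour) baseline."""
    mu, sigma = profile[(ts.weekday(), ts.hour)]
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma

def detect_events(profile, stream, z_limit=3.0, persistence=3):
    """Return the timestamps at which a deviation has lasted `persistence` consecutive points."""
    events, run = [], 0
    for ts, value in stream:
        run = run + 1 if zscore(profile, ts, value) > z_limit else 0
        if run == persistence:
            events.append(ts)
    return events

def synthetic_latency(ts, week):
    # Hypothetical traffic pattern: slower during weekday business hours.
    busy = ts.weekday() < 5 and 9 <= ts.hour < 18
    return (150.0 if busy else 90.0) + (1.0 if week % 2 else -1.0)

start = datetime(2024, 1, 1)  # a Monday
history = [(start + timedelta(hours=h), synthetic_latency(start + timedelta(hours=h), h // 168))
           for h in range(4 * 7 * 24)]  # four weeks of hourly samples
profile = build_profile(history)

# The following Monday: a flat 150 ms from midnight to 11:00.
monday = datetime(2024, 1, 29)
stream = [(monday + timedelta(hours=h), 150.0) for h in range(12)]
print(detect_events(profile, stream))
```

The same 150 ms reading is normal at 10:00 on a Monday but anomalous at 03:00, so the sketch emits a single event at 02:00 (the third consecutive off-profile hour) and stays quiet once business hours begin.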

The key levers you control are primarily through configuration and feedback. You can:

  1. Enable/Disable Applied Intelligence for specific services or metrics: You choose which parts of your system are worth applying this intelligence to.
  2. Set sensitivity levels: You can fine-tune how aggressively Applied Intelligence flags deviations. Higher sensitivity catches more deviations, at the cost of more false positives; lower sensitivity produces fewer alerts but may flag real problems later, or miss them.
  3. Provide feedback: Crucially, you can tell Applied Intelligence whether an alert it generated was a "good" incident (a real problem) or a "bad" one (a false positive). This feedback loop helps the models refine their understanding of your system’s normal behavior over time, making them more accurate.
  4. Configure alert routing and escalation: Like traditional alerts, you define who gets notified and when.
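The interplay between a sensitivity setting and feedback can be sketched as a detector whose cutoff moves. Everything here is hypothetical: the `SENSITIVITY_TO_Z` mapping, the `Detector` class, and the rule that false positives nudge the cutoff up while confirmed incidents nudge it down are illustrative assumptions, not New Relic's configuration options or its actual feedback mechanism.

```python
from dataclasses import dataclass

# Hypothetical mapping from a sensitivity setting to a z-score cutoff.
# Higher sensitivity -> lower cutoff -> more deviations flagged.
SENSITIVITY_TO_Z = {"low": 4.0, "medium": 3.0, "high": 2.0}

@dataclass
class Detector:
    sensitivity: str = "medium"
    adjustment: float = 0.0  # hypothetical offset learned from feedback

    @property
    def z_limit(self) -> float:
        return SENSITIVITY_TO_Z[self.sensitivity] + self.adjustment

    def is_anomalous(self, z: float) -> bool:
        return z > self.z_limit

    def feedback(self, was_real_incident: bool, step: float = 0.25) -> None:
        """False positives raise the cutoff; confirmed incidents lower it."""
        self.adjustment += -step if was_real_incident else step

scores = [1.5, 2.5, 3.5, 4.5]  # illustrative deviation scores
for level in ("low", "medium", "high"):
    d = Detector(sensitivity=level)
    print(level, sum(d.is_anomalous(z) for z in scores))

d = Detector()
d.feedback(was_real_incident=False)  # user marks two alerts as noise
d.feedback(was_real_incident=False)
print("cutoff after feedback:", d.z_limit, "flags 3.5?", d.is_anomalous(3.5))
```

Two "false positive" labels raise the medium cutoff from 3.0 to 3.5, so a deviation of 3.5 that would previously have been flagged is now tolerated.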

A common misconception is that Applied Intelligence simply replaces static thresholds. It doesn’t. It complements them. The system often has a multi-layered approach where both static threshold breaches and significant anomalies detected by Applied Intelligence can trigger alerts. Furthermore, the anomalies detected by Applied Intelligence are often correlated with other events, allowing it to provide more context than a simple "metric X is high" alert. For instance, it might detect an anomaly in database query latency and simultaneously correlate it with an anomaly in application error rates, pointing to a more systemic issue.
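Correlating anomalies can be illustrated with the simplest possible signal, time proximity: anomalies on different metrics that fire within a few minutes of each other are grouped into one candidate incident. This sketch, including the metric names and the five-minute `window`, is a hypothetical illustration; a production correlation engine would likely weigh more than timing alone.

```python
from datetime import datetime, timedelta

def correlate(events, window=timedelta(minutes=5)):
    """Group anomaly events that occur within `window` of the previous event.

    events: list of (metric_name, timestamp). Returns a list of groups,
    each a list of events, in time order.
    """
    groups = []
    for metric, ts in sorted(events, key=lambda e: e[1]):
        if groups and ts - groups[-1][-1][1] <= window:
            groups[-1].append((metric, ts))  # close enough: same candidate incident
        else:
            groups.append([(metric, ts)])    # too far apart: start a new group
    return groups

t0 = datetime(2024, 1, 29, 14, 0)
events = [
    ("db.query_latency", t0),
    ("app.error_rate", t0 + timedelta(minutes=2)),
    ("host.cpu", t0 + timedelta(hours=3)),
]
for group in correlate(events):
    metrics = sorted({m for m, _ in group})
    print("correlated" if len(metrics) > 1 else "isolated", metrics)
```

The database latency and application error anomalies land in one group, mirroring the "more systemic issue" example, while the unrelated CPU anomalies three hours later stand alone.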

The next step after leveraging Applied Intelligence for anomaly detection is to explore its capabilities in correlating these anomalies with other system events to pinpoint root causes more efficiently.
