New Relic’s streaming alerts evaluate alert conditions not by polling at intervals, but by processing data as it arrives, allowing for near-instantaneous detection of issues.
Let’s see this in action. Imagine we have a service that’s supposed to respond within 500 milliseconds. We want to know immediately if this latency spikes.
# Example of a slow response (simulated)
def process_request(request_data)
  # ... some processing ...
  sleep(0.6) # Simulate a 600 ms delay
  # ... send response ...
end
In New Relic, we’d set up a NRQL alert condition. A scheduled query such as FROM Transaction SELECT average(duration) FACET appName WHERE appName = 'MyAwesomeApp' SINCE 5 minutes ago would only re-check the data once per polling cycle (every minute, say). Instead, we’d configure the condition for streaming evaluation.
Here’s how a streaming alert condition might look in the UI or API:
- Condition Name: High Latency Alert
- NRQL Query: FROM Transaction SELECT average(duration) WHERE appName = 'MyAwesomeApp' AND transactionType = 'Web'
- Evaluation Frequency: Streaming
- Evaluation Window: 1 minute (This defines the lookback for the NRQL, but the evaluation itself is continuous)
- Critical Threshold: above 500 milliseconds (Transaction duration is reported in seconds, so this is entered as 0.5 unless the query converts to milliseconds)
- For: 1 minute (This means the condition must be true for a full minute of streaming data before triggering, preventing flapping)
The key here is "Streaming." When New Relic receives a Transaction event with a duration of 700 ms, it doesn’t wait for the next polling cycle. The event is folded into the current evaluation window as it arrives, and the windowed average(duration) is checked against the 500 ms threshold. If the average over the last minute of processed data exceeds the threshold, and that state persists for the For: 1 minute duration, the alert fires.
The core problem streaming alerts solve is the inherent latency in traditional polling-based alerting. If your polling interval is 1 minute, and an incident occurs 10 seconds into that minute, you won’t detect it for another 50 seconds. Streaming alerts ingest and evaluate data points as they arrive, often within seconds of the event occurring. This is crucial for high-velocity systems or those with very short tolerance for downtime.
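The polling arithmetic above can be sketched in a few lines of Ruby. This is only an illustration of the worst-case wait for the next scheduled check; `polling_detection_delay`, `interval_s`, and `incident_offset_s` are hypothetical names, not New Relic settings.

```ruby
# Seconds until the next scheduled poll can notice an incident.
# interval_s:        polling interval in seconds (illustrative)
# incident_offset_s: how far into the interval the incident starts
def polling_detection_delay(interval_s, incident_offset_s)
  offset = incident_offset_s % interval_s
  offset.zero? ? 0 : interval_s - offset
end

# Incident starts 10 s into a 60 s polling interval.
puts polling_detection_delay(60, 10) # => 50
```

A streaming evaluator has no equivalent of this wait, since it does not depend on where in a schedule the incident happens to fall.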
Internally, New Relic uses a time-series database and a stream processing engine. When data arrives, it’s immediately fed into these processing pipelines. Alert conditions configured for streaming are attached to these pipelines. As new data points flow through, the engine evaluates the NRQL against a sliding window of the most recent data. This avoids the need to schedule discrete query executions.
The "Evaluation Window" in a streaming alert is not the same as the polling interval in a non-streaming alert. For streaming, it defines the lookback period for the NRQL aggregation (e.g., SINCE 1 minute ago). The evaluation happens continuously on data arriving within that window. So, if your window is 1 minute, New Relic is constantly assessing the NRQL against the last 60 seconds of received and processed data, not waiting for a scheduled check.
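To make the sliding-window idea concrete, here is a minimal Ruby sketch. It is not how New Relic implements evaluation; `SlidingWindowAverage` is a hypothetical class that assumes samples arrive in timestamp order and, for readability, that values are already in milliseconds.

```ruby
# Keeps only the samples from the last `window_s` seconds and reports their average.
class SlidingWindowAverage
  def initialize(window_s)
    @window_s = window_s
    @samples = [] # [timestamp_s, value] pairs, oldest first
  end

  # Record a sample, evict expired ones, and return the current window average.
  def add(timestamp_s, value)
    @samples << [timestamp_s, value]
    cutoff = timestamp_s - @window_s
    # The newest sample is always inside the window, so the array never empties.
    @samples.shift while @samples.first[0] <= cutoff
    @samples.sum { |_, v| v } / @samples.size.to_f
  end
end

window = SlidingWindowAverage.new(60)
window.add(0, 200)         # => 200.0
window.add(30, 700)        # => 450.0
puts window.add(61, 900)   # => 800.0 (the sample at t=0 has aged out)
```

Each arriving sample triggers a fresh evaluation over whatever currently falls inside the window, which is the contrast with a check that runs on a fixed schedule.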
A common point of confusion is the interaction between the "For" duration and the "Evaluation Window." If you have For: 5 minutes and an evaluation window of 1 minute, the system needs to see the condition (average duration above the 500 ms threshold) hold for five consecutive minutes of streaming data. If the average drops back below the threshold for even a moment within those five minutes, the timer resets. This ensures that transient spikes don’t trigger alerts, while sustained issues are caught.
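The reset behavior can be sketched as a small state machine. This is an assumption-laden illustration, not New Relic's implementation: `SustainedBreach`, `threshold`, and `for_s` are hypothetical names, and each call to `observe` stands in for one evaluation of the windowed average.

```ruby
# Tracks how long a value has been continuously above a threshold.
class SustainedBreach
  def initialize(threshold, for_s)
    @threshold = threshold
    @for_s = for_s
    @breach_started_at = nil
  end

  # Feed one evaluated value; returns true once the breach has lasted for_s seconds.
  def observe(timestamp_s, value)
    if value > @threshold
      @breach_started_at ||= timestamp_s
      timestamp_s - @breach_started_at >= @for_s
    else
      @breach_started_at = nil # any dip below the threshold resets the timer
      false
    end
  end
end

tracker = SustainedBreach.new(500, 300) # 500 ms threshold, For: 5 minutes
tracker.observe(0, 600)         # => false (breach begins)
tracker.observe(240, 650)       # => false (4 minutes in)
tracker.observe(250, 400)       # => false (dip resets the timer)
tracker.observe(260, 700)       # => false (new breach begins)
puts tracker.observe(560, 700)  # => true  (5 minutes since t=260)
```

Note that the breach starting at t=0 never fires, even though the value was high for most of the first five minutes, because the dip at t=250 restarted the clock.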
The real power of streaming alerts lies in their ability to detect and alert on ephemeral issues that might be missed by polling. Consider a brief but severe denial-of-service attack. A polling alert might miss the entire event, or only catch the tail end. A streaming alert, processing data second by second, would likely detect the anomalous traffic or error rates almost immediately.
When setting up streaming alerts, pay close attention to the For duration. It’s the guardian against alert fatigue. A For: 0 minutes setting with streaming would mean an alert fires on the very first data point that violates the condition, which is rarely desirable. The For duration ensures the issue is persistent enough to warrant attention.
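A rough way to see the effect on alert volume is to count how many alerts a series of readings would open under different For settings. The Ruby sketch below simplifies time to "number of consecutive data points"; `alerts_opened` and `for_points` are hypothetical names used only for this illustration.

```ruby
# Counts how many times an alert would open for a series of evenly spaced values.
# An alert opens once `for_points` consecutive values exceed the threshold;
# for_points = 0 is treated as "open on the first violating value".
def alerts_opened(values, threshold, for_points)
  required = [for_points, 1].max
  streak = 0
  opened = 0
  values.each do |v|
    if v > threshold
      streak += 1
      opened += 1 if streak == required
    else
      streak = 0
    end
  end
  opened
end

spiky     = [400, 900, 400, 950, 400, 400, 800, 400] # three isolated spikes
sustained = [400, 600, 700, 650, 720, 680, 400]      # one five-point breach

puts alerts_opened(spiky, 500, 0)     # => 3
puts alerts_opened(spiky, 500, 3)     # => 0
puts alerts_opened(sustained, 500, 3) # => 1
```

With no For requirement, every isolated spike opens an alert; requiring three consecutive violations silences the spikes while still catching the sustained breach.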
The next challenge you’ll likely encounter is understanding how to fine-tune the NRQL for complex scenarios, such as correlating multiple metrics or identifying anomalies beyond simple thresholds.