Muting alerts during maintenance is less about silencing noise and more about controlling which notifications reach your team, so that expected alerts don't cause fatigue and unexpected critical ones aren't lost in the din.

Let’s see what this looks like in practice. Imagine you’re rolling out a new version of your e-commerce API. You expect some temporary spikes in error rates or latency as the new code settles in. Instead of having your entire on-call team bombarded with alerts for the next hour, you can proactively mute them.

Here’s a simulated New Relic alert policy for your API’s error rate:

{
  "name": "API Error Rate High",
  "conditions": [
    {
      "type": "metric",
      "name": "API Errors",
      "enabled": true,
      "policy_id": 12345,
      "metric": {
        "metric_name": "HttpDispatcher.errors",
        "entity_name": "Production/API/v2",
        "duration_minutes": 5,
        "aggregation_method": "sum"
      },
      "terms": [
        {
          "operator": "above",
          "threshold": {
            "value": 10,
            "duration_minutes": 5
          },
          "priority": "critical"
        }
      ]
    }
  ],
  "incident_preference": "PER_POLICY"
}

And here’s how you’d create a mute for this specific policy during your maintenance window, say, from 2:00 AM to 3:00 AM UTC on January 15th:

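# Using the New Relic CLI (assuming you have it installed and configured).
# The subcommand and flag names below are illustrative; confirm them with
# `newrelic --help` for your CLI version before relying on them.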
newrelic alerts muting-rule create \
  --name "API Maintenance Mute" \
  --description "Silencing API errors during v2 deployment" \
  --schedule-start "2023-01-15T02:00:00Z" \
  --schedule-end "2023-01-15T03:00:00Z" \
  --policy-id 12345 \
  --condition-id "all" # Mutes all conditions within the policy

This command tells New Relic: "For policy ID 12345, between the specified start and end times, do not send notifications for incidents opened by any of its conditions." The condition-id "all" is key here; you could also specify individual condition IDs if you only wanted to mute certain checks within a policy.
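The same rule can also be written as a NerdGraph request body. The sketch below only builds the payload and sends nothing. It assumes the `alertsMutingRuleCreate` mutation and the `condition` and `schedule` input fields as I understand them from New Relic's NerdGraph API; the account ID `1234567` and the helper name `build_muting_rule_request` are placeholders, so check the field names against the current schema before using it.

```python
import json

# Assumed shape of the NerdGraph mutation; verify against the current schema.
MUTATION = """
mutation($accountId: Int!, $rule: AlertsMutingRuleInput!) {
  alertsMutingRuleCreate(accountId: $accountId, rule: $rule) { id }
}
"""


def build_muting_rule_request(account_id, policy_id, start, end, time_zone="UTC"):
    """Return a JSON-serializable request body for a one-time policy mute."""
    rule = {
        "name": "API Maintenance Mute",
        "description": "Silencing API errors during v2 deployment",
        "enabled": True,
        # Match every incident whose policy ID equals the given one.
        "condition": {
            "operator": "AND",
            "conditions": [
                {
                    "attribute": "policyId",
                    "operator": "EQUALS",
                    "values": [str(policy_id)],
                }
            ],
        },
        # Start and end are local times interpreted in `time_zone` (assumed).
        "schedule": {"startTime": start, "endTime": end, "timeZone": time_zone},
    }
    return {"query": MUTATION, "variables": {"accountId": account_id, "rule": rule}}


# 1234567 is a placeholder account ID.
body = build_muting_rule_request(
    1234567, 12345, "2023-01-15T02:00:00", "2023-01-15T03:00:00"
)
print(json.dumps(body["variables"], indent=2))
```

Keeping the payload construction separate from the HTTP call makes it easy to review or version-control the rule before anything is sent.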

The real power of muting isn’t just stopping notifications. It’s about managing your team’s attention. When an alert is muted, New Relic still evaluates the condition and records the violation data. You can see that the error count did exceed the threshold of 10 during that hour. The difference is that the resulting incident is marked as muted, so no notification is sent to your Slack channel or PagerDuty. This prevents a flood of "false positive" alerts during a planned change, allowing your team to focus on the actual deployment or rollback if something truly goes wrong that isn’t covered by the muted alerts.

The mental model here is that your alert policies define what is considered an issue, and muting rules define when those issues should be ignored. They are separate, but coordinated, controls. You can also mute based on entities, tags, or even specific alert condition names, giving you granular control. For instance, if you’re only updating a specific microservice tagged service:user-auth, you could mute alerts specifically for entities with that tag, while still receiving alerts for other services.
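To make the "what versus when" split concrete, here is a small local model of tag-based matching. It is a sketch of the idea only, not New Relic's matching engine; the `Incident` class, the `is_muted_by_tag` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical stand-in for an incident opened by an alert condition."""
    condition_name: str
    policy_id: int
    tags: dict = field(default_factory=dict)


def is_muted_by_tag(incident, key, value):
    """A tag-based muting rule matches when the entity carries that tag value."""
    return incident.tags.get(key) == value


incidents = [
    Incident("Auth Errors", 12345, {"service": "user-auth"}),
    Incident("Checkout Latency", 12345, {"service": "checkout"}),
]

# Only the user-auth incident is silenced; checkout still notifies.
for incident in incidents:
    muted = is_muted_by_tag(incident, "service", "user-auth")
    print(incident.condition_name, "muted" if muted else "notifies")
```

Both incidents come from the same policy, yet only one is silenced, which is the point of matching on tags rather than on the policy as a whole.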

When creating a mute rule, you can specify a schedule-start and schedule-end. This is crucial for ensuring the mute is temporary and automatically expires. Without an end time, the mute would persist indefinitely, which is usually not the desired outcome. You can also create "one-time" mutes for unscheduled events or "recurring" mutes for predictable maintenance windows.
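The difference between a one-time and a recurring window can be sketched with plain datetime arithmetic. This is an illustration under assumptions, not New Relic's scheduler: the function names are made up, the end time is treated as exclusive, and the recurring case repeats weekly from the first start time.

```python
from datetime import datetime, timedelta, timezone


def in_one_time_window(now, start, end):
    """True only during the single scheduled window (end treated as exclusive)."""
    return start <= now < end


def in_weekly_window(now, start, end):
    """True during the first window and the same slot in every later week."""
    if now < start:
        return False
    # Offset of `now` within the current one-week cycle.
    offset = (now - start) % timedelta(weeks=1)
    return offset < (end - start)


start = datetime(2023, 1, 15, 2, 0, tzinfo=timezone.utc)
end = datetime(2023, 1, 15, 3, 0, tzinfo=timezone.utc)
during = datetime(2023, 1, 15, 2, 30, tzinfo=timezone.utc)
next_week = during + timedelta(weeks=1)

print(in_one_time_window(during, start, end))     # True
print(in_one_time_window(next_week, start, end))  # False
print(in_weekly_window(next_week, start, end))    # True
```

Using timezone-aware datetimes throughout avoids the common mistake of scheduling a window in local time when the team thinks in UTC.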

One detail that often trips people up is that muting rules apply to newly opened incidents. If an incident was already open before the mute rule became active, it will continue to be active and send notifications until it resolves or is manually closed. Muting is a preventative measure for future issues, not a way to retroactively stop ongoing alerts.
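A short simulation shows why the open time matters. It models the behavior described above under the assumption that muting is decided once, when the incident opens; the function name and timestamps are hypothetical.

```python
from datetime import datetime, timezone


def sends_notifications(opened_at, mute_start, mute_end):
    """Muting is decided at open time: only incidents opened inside the window are muted."""
    opened_during_mute = mute_start <= opened_at < mute_end
    return not opened_during_mute


mute_start = datetime(2023, 1, 15, 2, 0, tzinfo=timezone.utc)
mute_end = datetime(2023, 1, 15, 3, 0, tzinfo=timezone.utc)

opened_before = datetime(2023, 1, 15, 1, 55, tzinfo=timezone.utc)
opened_during = datetime(2023, 1, 15, 2, 10, tzinfo=timezone.utc)

# The incident opened five minutes before the window keeps notifying.
print(sends_notifications(opened_before, mute_start, mute_end))  # True
# The incident opened inside the window is muted.
print(sends_notifications(opened_during, mute_start, mute_end))  # False
```

In practice this means it is worth checking for open incidents before the maintenance window starts and closing or acknowledging them manually.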

After your maintenance window closes and your mute rule expires, the next thing you’ll likely want to check is the alert violations that occurred during the muted period.
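One way to do that review is an NRQL query over the muted window. The helper below only builds the query string. It assumes that incident events are stored in `NrAiIncident` with `event`, `muted`, `policyId`, and `conditionName` attributes, which matches my understanding of the event type but should be confirmed in your account's data dictionary; the function name is hypothetical.

```python
from datetime import datetime, timezone


def muted_incident_query(policy_id, start, end):
    """Build an NRQL string counting muted incidents opened in the window."""
    fmt = "%Y-%m-%d %H:%M:%S +0000"  # timestamps are rendered in UTC
    since = start.astimezone(timezone.utc).strftime(fmt)
    until = end.astimezone(timezone.utc).strftime(fmt)
    # Event and attribute names are assumptions; verify them before use.
    return (
        "SELECT count(*) FROM NrAiIncident "
        f"WHERE event = 'open' AND muted IS TRUE AND policyId = {int(policy_id)} "
        f"FACET conditionName SINCE '{since}' UNTIL '{until}'"
    )


query = muted_incident_query(
    12345,
    datetime(2023, 1, 15, 2, 0, tzinfo=timezone.utc),
    datetime(2023, 1, 15, 3, 0, tzinfo=timezone.utc),
)
print(query)
```

Faceting by condition name shows at a glance which checks would have fired during the deployment, which helps you decide whether any of them point to a real regression.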
