The most surprising thing about Service Level Objectives (SLOs) is that they aren’t primarily about setting targets; they’re about defining what "good enough" actually means for your users, and then proving it.

Let’s see this in action. Imagine you’re running a critical e-commerce checkout service. Your users expect this to be fast and available.

{
  "name": "Checkout Service Availability",
  "description": "Ensures users can successfully complete a purchase.",
  "service_level_indicator": {
    "type": "error_budget",
    "error_budget_policy": {
      "budget_policy": [
        {
          "budget_allowance": 50,
          "cost_type": "percentage",
          "threshold_type": "percentage",
          "target": 99.9,
          "time_window": {
            "duration": 2592000,
            "unit": "second"
          }
        }
      ]
    }
  },
  "warning_threshold": {
    "metric": "error_budget",
    "value": 50
  },
  "critical_threshold": {
    "metric": "error_budget",
    "value": 100
  }
}

This JSON defines an SLO for "Checkout Service Availability." The service_level_indicator specifies error_budget, meaning we're tracking how much of our allowed unavailability we have left. The budget_policy is the core:

  • target: 99.9 is our desired availability.
  • time_window: 2592000 seconds (30 days) is the period over which we measure this 99.9%.
  • budget_allowance: 50 means that once half of the error budget (the 0.1% of the window that may fail) has been used, which corresponds to 99.95% availability, we hit a warning.
  • critical_threshold: 100 means that once the entire error budget is used (availability has fallen to 99.9%), any further failure breaks the SLO.
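The arithmetic behind these bullets is easy to check. The Python sketch below uses the target and window from the example policy and converts them into a downtime budget and the availability levels at which the 50% and 100% thresholds fire. It assumes a time-based availability measure (the budget is seconds of allowed downtime) and reads "target" as a percentage and "duration" as seconds, as the bullets describe; the helper names are hypothetical.

```python
# Sketch: turn the example policy's numbers into concrete error-budget figures.
# Field names are copied from the JSON example; reading "target" as a percent
# and "duration" as seconds follows the bullets above.

policy = {
    "target": 99.9,
    "time_window": {"duration": 2592000, "unit": "second"},  # 30 days
}
WARNING_PCT = 50    # percent of the error budget consumed
CRITICAL_PCT = 100


def error_budget_fraction(target_pct: float) -> float:
    """Share of the window allowed to be unavailable (about 0.001 for 99.9%)."""
    return (100 - target_pct) / 100


def availability_at(consumed_pct: float, target_pct: float) -> float:
    """Window availability, in percent, once consumed_pct of the budget is used."""
    return 100 - (100 - target_pct) * consumed_pct / 100


target = policy["target"]
window_s = policy["time_window"]["duration"]
budget_s = error_budget_fraction(target) * window_s

print(f"allowed downtime: {budget_s / 60:.1f} minutes")
print(f"warning fires at {availability_at(WARNING_PCT, target):.2f}% availability")
print(f"critical fires at {availability_at(CRITICAL_PCT, target):.2f}% availability")
```

For this policy the budget is 0.1% of 2,592,000 seconds, about 43.2 minutes of downtime, with the warning at 99.95% and the critical threshold at 99.90%.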

New Relic, when configured with appropriate data sources (such as APM transaction data or custom metrics), continuously calculates your error budget and how fast you are burning it. It compares your successful transactions with your failed ones (as classified by the error settings in your APM agent or your custom instrumentation) over that 30-day window. If, for example, you have 1,000,000 transactions in 30 days, a 99.9% target means you can tolerate up to 1,000 failed transactions (0.1% of 1,000,000). Once you exceed that, your error budget is depleted and the SLO is breached.
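The transaction-count example is a request-based error budget, and the same calculation can be written out directly. This is a sketch of the arithmetic only: it does not query New Relic, and the `request_budget` helper and the failure counts passed to it are hypothetical. The target is passed as a string so `Fraction` can represent it exactly and 0.1% of 1,000,000 comes out as exactly 1,000.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class BudgetStatus:
    allowed_errors: int     # failures the SLO tolerates in the window
    remaining_errors: int   # goes negative once the budget is overspent
    consumed_pct: float     # percent of the budget already used


def request_budget(total: int, failed: int, target_pct: str) -> BudgetStatus:
    """Request-based error budget for one SLO window (hypothetical helper)."""
    bad_share = (100 - Fraction(target_pct)) / 100   # exactly 1/1000 for "99.9"
    allowed = math.floor(total * bad_share)          # round down: be conservative
    if allowed == 0:
        consumed = 0.0 if failed == 0 else math.inf
    else:
        consumed = 100 * failed / allowed
    return BudgetStatus(allowed, allowed - failed, consumed)


print(request_budget(total=1_000_000, failed=250, target_pct="99.9"))
# BudgetStatus(allowed_errors=1000, remaining_errors=750, consumed_pct=25.0)
print(request_budget(total=1_000_000, failed=1_200, target_pct="99.9"))
# BudgetStatus(allowed_errors=1000, remaining_errors=-200, consumed_pct=120.0)
```

Allowing `remaining_errors` to go negative keeps the overspend visible instead of clamping it at zero, which is useful when deciding how long a reliability freeze should last.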

The problem SLOs solve is the ambiguity around "performance." Is 99.9% availability "good enough"? For a user trying to buy a gift on Black Friday, probably not. But for an internal admin dashboard that’s only used during business hours, maybe 99% is fine. SLOs force you to quantify this user-centric view. They shift the conversation from "fix that bug!" to "how much of our error budget did that bug consume, and what does that mean for our users?" This allows engineering teams to make informed trade-offs. If you have plenty of error budget left, you might prioritize a new feature. If you’re burning through it, you might need to halt new development and focus solely on reliability.

The real power emerges when you integrate this with your alerting. New Relic can trigger alerts based on the warning_threshold and critical_threshold. A warning might trigger a Slack message to the on-call engineer, prompting them to investigate potential issues. A critical alert might escalate to a higher severity, demanding immediate attention and potentially triggering an incident response. This isn’t just about knowing you’re at a certain availability; it’s about understanding your trend and acting proactively before users even notice a problem.
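A minimal sketch of evaluating both thresholds together with a burn rate, the ratio of budget consumed to time elapsed in the window (a burn rate of 1.0 means you are on pace to use exactly the whole budget). The 50 and 100 values come from the example JSON; the severity labels, the routing mentioned in the comments, and the straight-line projection are illustrative assumptions, not New Relic's own alert evaluation.

```python
WARNING_PCT = 50.0    # mirrors warning_threshold.value in the example JSON
CRITICAL_PCT = 100.0  # mirrors critical_threshold.value


def severity(consumed_pct: float) -> str:
    """Map error-budget consumption to an alert level."""
    if consumed_pct >= CRITICAL_PCT:
        return "critical"  # e.g. escalate and open an incident
    if consumed_pct >= WARNING_PCT:
        return "warning"   # e.g. notify the on-call engineer in Slack
    return "ok"


def burn_rate(consumed_pct: float, elapsed_days: float, window_days: float) -> float:
    """Budget used relative to time elapsed; 1.0 means on pace to use exactly 100%."""
    return (consumed_pct / 100) / (elapsed_days / window_days)


# Ten days into a 30-day window with 40% of the budget gone:
consumed, elapsed, window = 40.0, 10.0, 30.0
rate = burn_rate(consumed, elapsed, window)
print(severity(consumed))  # ok: no threshold crossed yet
print(f"burn rate {rate:.1f}, projected use {rate * 100:.0f}% by window end")
```

Here neither threshold has fired, yet a burn rate of about 1.2 projects roughly 120% of the budget used by the end of the window if nothing changes. That is the trend you want to act on before the critical alert arrives.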

The most subtle aspect of SLOs is how they dictate your release velocity. If your SLO target is 99.99% availability over 7 days, your entire error budget is about 60 seconds of downtime (0.01% of 604,800 seconds), and if your current performance is 99.995%, half of that is already gone. This means that any new deployment carrying even a small risk of introducing downtime or errors must be meticulously tested and rolled back quickly if it starts to affect the SLO. Conversely, if you're consistently exceeding your SLO with a large error budget remaining, you have the "permission" to move faster, knowing you have the capacity to absorb minor disruptions.
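The release-velocity argument reduces to comparing a rollout's risk with the budget that remains. The sketch below is a hypothetical release gate: `can_ship` and the "worst-case impact" estimate (how much downtime a bad rollout could cause before it is rolled back) are assumptions for illustration, not a New Relic feature. It runs the same rollout against a 99.99% and a 99.9% weekly target.

```python
WEEK_S = 7 * 24 * 60 * 60  # 604,800 seconds


def downtime_budget_s(target_pct: float, window_s: int) -> float:
    """Seconds of downtime the SLO allows in the window."""
    return window_s * (100 - target_pct) / 100


def can_ship(used_s: float, target_pct: float, window_s: int,
             worst_case_impact_s: float) -> bool:
    """Hypothetical release gate: ship only if the rollout's worst-case downtime
    (an estimate the team supplies) fits in the budget that is left."""
    remaining_s = downtime_budget_s(target_pct, window_s) - used_s
    return worst_case_impact_s <= remaining_s


# 99.99% over 7 days: about 60.5 s of budget, roughly half (30 s) already used.
print(round(downtime_budget_s(99.99, WEEK_S), 2))
print(can_ship(used_s=30, target_pct=99.99, window_s=WEEK_S, worst_case_impact_s=45))
# The same rollout against a looser 99.9% weekly target (about 604.8 s of budget).
print(can_ship(used_s=30, target_pct=99.9, window_s=WEEK_S, worst_case_impact_s=45))
```

Under the tight target the rollout is blocked because 45 seconds of potential impact exceeds the roughly 30 seconds left; under the looser target the same change fits comfortably, which is the "permission" to move faster described above.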

Once you’ve mastered defining and tracking availability SLOs, the next logical step is to apply the same principles to latency.
