Grafana’s State Timeline panel is a surprisingly powerful way to visualize the discrete state changes of your systems over time, rather than just continuous metrics.
Imagine you have a service that can be in one of several states: "running," "restarting," "degraded," or "stopped." A regular time series graph would show this as a jagged line jumping between arbitrary numbers that stand in for the states. The State Timeline panel, however, draws distinct horizontal bars for each state, clearly showing when the system was in that particular state and how long it stayed there.
Here’s a quick example using Prometheus as a data source. Let’s say you have a metric service_state that outputs 1 for "running", 2 for "restarting", 3 for "degraded", and 0 for "stopped".
First, set up your Grafana panel. Choose the "State timeline" visualization and select your Prometheus instance as the data source. In the query editor, fetch the state metric as-is:
service_state
This query returns the raw numeric state (0, 1, 2 or 3) for each series. It is tempting to translate the numbers into names directly in PromQL, but PromQL has no case expression, and label_replace only rewrites labels by matching a regular expression against an existing label's value; it cannot read a sample's value. The translation to human-readable names therefore happens in Grafana, through value mappings.
In the panel options, open the "Value mappings" section and create a mapping for each state. (If the panel shows several queries and only one of them should use these mappings, add an override for that field under "Overrides" and attach the value mappings there instead.)
- Value: 1 -> Text: running, Color: green
- Value: 2 -> Text: restarting, Color: orange
- Value: 3 -> Text: degraded, Color: yellow
- Value: 0 -> Text: stopped, Color: red
- Any other value -> Text: unknown, Color: grey (for example, a range mapping covering codes you don't expect)
Under "Standard options," set "Display name" to something descriptive like "Service Status."
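To make the mapping step concrete, here is a small Python model of what value mappings do to each sample: look up the numeric code and fall back to "unknown" for anything unexpected. It only mimics the behaviour; it is not Grafana's code, and the names StateStyle, STATE_MAPPINGS and map_state are illustrative.

```python
from typing import NamedTuple


class StateStyle(NamedTuple):
    text: str
    color: str


# Numeric codes emitted by service_state, as in the example above.
STATE_MAPPINGS = {
    1: StateStyle("running", "green"),
    2: StateStyle("restarting", "orange"),
    3: StateStyle("degraded", "yellow"),
    0: StateStyle("stopped", "red"),
}
# Fallback for any code not listed above (including NaN and non-integers).
UNKNOWN = StateStyle("unknown", "grey")


def map_state(value: float) -> StateStyle:
    """Translate a raw sample value into its display text and colour."""
    value = float(value)
    # Prometheus returns gauge values as floats, so 1.0 must match key 1.
    if value.is_integer():
        return STATE_MAPPINGS.get(int(value), UNKNOWN)
    return UNKNOWN
```

Keeping the fallback explicit is what makes a bad exporter release show up as a grey "unknown" bar instead of silently reusing another state's colour.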
Now, when you look at the panel, you’ll see a timeline. Each horizontal bar represents a period where the service was in a specific state, color-coded according to your configuration. You can zoom in and out, pan across time, and immediately see the history of your service’s operational status. This is invaluable for understanding the reliability of your systems, spotting recurring issues, or seeing the impact of deployments.
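The bars come from a simple idea: consecutive samples that report the same state are merged into one interval, and a new interval starts whenever the state changes. The Python sketch below models that idea under simplifying assumptions (samples already sorted by time, values already mapped to state names); StateBar and to_bars are illustrative names, not Grafana APIs. In simplified form, it mirrors what the panel's "Merge equal consecutive values" option is for.

```python
from typing import List, NamedTuple, Tuple


class StateBar(NamedTuple):
    state: str
    start: float  # Unix seconds
    end: float    # Unix seconds


def to_bars(samples: List[Tuple[float, str]], end_time: float) -> List[StateBar]:
    """Merge consecutive samples with the same state into one bar.

    samples: (timestamp, state) pairs sorted by timestamp.
    end_time: where the last bar stops (e.g. the end of the dashboard range).
    """
    bars: List[StateBar] = []
    for ts, state in samples:
        if bars and bars[-1].state == state:
            continue  # same state as the current bar, so it simply continues
        if bars:
            # Close the previous bar at the moment the new state begins.
            bars[-1] = bars[-1]._replace(end=ts)
        bars.append(StateBar(state, ts, ts))
    if bars:
        # The last bar runs until the end of the visible range.
        bars[-1] = bars[-1]._replace(end=end_time)
    return bars


samples = [(0, "running"), (60, "running"), (120, "degraded"),
           (180, "degraded"), (240, "running")]
# Three bars: running 0-120, degraded 120-240, running 240-300.
print(to_bars(samples, end_time=300))
```

Each bar's duration is simply end - start, which is what lets you read "how long was it degraded?" straight off the panel.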
The real power is that this isn't limited to one purpose-built status metric. Any field with a small, fixed set of meaningful values works, whether those values are numeric codes or strings. For instance, you could track the conditions of a Kubernetes deployment (e.g., "Progressing," "Available," "ReplicaFailure") or the status of a network connection ("Up," "Down," "Intermittent"). The key is to have some way to represent discrete states.
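One common way to get discrete states into Prometheus is to encode each state as a small integer on a gauge and keep the reverse table for the panel's value mappings. The sketch below assumes a hypothetical connection-status metric; the ConnectionState codes and the encode and value_mappings helpers are made up for illustration.

```python
from enum import IntEnum


class ConnectionState(IntEnum):
    # Hypothetical codes for a network connection status gauge.
    DOWN = 0
    UP = 1
    INTERMITTENT = 2


def encode(status: str) -> int:
    """Convert a status string into the integer a gauge would export.

    Raises KeyError for a status that has no code yet.
    """
    return int(ConnectionState[status.upper()])


def value_mappings() -> dict:
    """Build the code -> display text table to mirror in Grafana."""
    return {int(s): s.name.capitalize() for s in ConnectionState}
```

Defining the codes in one place means the exporter and the dashboard's mappings cannot drift apart without someone noticing.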
What many people miss is how to effectively represent "no data" or "transient states" in the State Timeline. If your metric stops reporting for a period, the timeline will simply show a gap. However, you can often use alerts or specific alert states within your monitoring system to represent these conditions. For example, if a service is expected to report its status every minute, and it fails to do so, an alert could trigger a "heartbeat_lost" state that your State Timeline can then visualize. This allows you to see periods of silence as actual, observable states.
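The rule behind such a heartbeat check is simple: when the gap between two reports exceeds some multiple of the expected interval, treat the silence as its own state. The sketch below models that rule in Python as a stand-in for whatever your alerting system actually evaluates; with_heartbeat_gaps is a hypothetical helper, the default tolerance of two intervals is an arbitrary assumption, and it does not detect silence after the last sample.

```python
from typing import List, Tuple


def with_heartbeat_gaps(
    samples: List[Tuple[float, str]],
    expected_interval: float,
    tolerance: float = 2.0,
) -> List[Tuple[float, str]]:
    """Insert a "heartbeat_lost" sample wherever reporting went silent.

    samples: (timestamp, state) pairs sorted by timestamp.
    A gap counts as silence when the time between two consecutive samples
    exceeds expected_interval * tolerance.
    """
    out: List[Tuple[float, str]] = []
    for i, (ts, state) in enumerate(samples):
        if i > 0:
            prev_ts = samples[i - 1][0]
            if ts - prev_ts > expected_interval * tolerance:
                # Silence starts one expected interval after the last report.
                out.append((prev_ts + expected_interval, "heartbeat_lost"))
        out.append((ts, state))
    return out
```

Drawn as a timeline, the inserted sample becomes its own bar covering the silent period, so a four-minute outage in reporting is visible rather than blended into the surrounding "running" bars.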
The next step is to start correlating these state changes with other continuous metrics, perhaps by overlaying a standard graph of CPU utilization or request latency below the State Timeline.