Grafana histograms and heatmaps reveal that latency isn’t just an average; it’s a spectrum of user experiences, often hiding extreme outliers.

Let’s see this in action. Imagine we’re tracking request latency for a critical API endpoint. We’re sending metrics to Prometheus, which Grafana then queries.

Here’s a sample of the histogram series as Prometheus scrapes them from the service:

http_request_duration_seconds_bucket{handler="/api/v1/users", method="GET", le="0.1"} 1500
http_request_duration_seconds_bucket{handler="/api/v1/users", method="GET", le="0.5"} 2200
http_request_duration_seconds_bucket{handler="/api/v1/users", method="GET", le="1.0"} 2350
http_request_duration_seconds_bucket{handler="/api/v1/users", method="GET", le="5.0"} 2390
http_request_duration_seconds_bucket{handler="/api/v1/users", method="GET", le="+Inf"} 2400

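This tells us that out of 2400 GET /api/v1/users requests, 1500 completed in 0.1 seconds or less, 2200 in 0.5 seconds or less, and so on. The counts are cumulative: each bucket includes every request counted in the faster buckets, and le="+Inf" holds the total. The _bucket suffix and the le (less than or equal to) label are standard for Prometheus histogram metrics.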
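Because the counts are cumulative, the number of requests that landed inside each latency range has to be recovered by subtracting neighbouring buckets. A minimal Python sketch of that arithmetic, using the sample values above (the function name per_bucket_counts is made up for illustration):

```python
# Cumulative bucket samples copied from the exposition above: (le, count).
cumulative = [
    (0.1, 1500),
    (0.5, 2200),
    (1.0, 2350),
    (5.0, 2390),
    (float("inf"), 2400),
]


def per_bucket_counts(buckets):
    """Turn cumulative `le` counts into counts per (lower, upper] interval."""
    result = []
    lower, previous = 0.0, 0
    for upper, count in buckets:
        result.append((lower, upper, count - previous))
        lower, previous = upper, count
    return result


intervals = per_bucket_counts(cumulative)
for lower, upper, count in intervals:
    print(f"({lower}, {upper}]: {count}")
```

Read this way, 700 requests took between 0.1 and 0.5 seconds, and 10 took longer than 5 seconds.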

In Grafana, we’d set up a panel. For a histogram, we’d query Prometheus like this:

sum(rate(http_request_duration_seconds_bucket{handler="/api/v1/users", method="GET"}[5m])) by (le)

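This query computes the per-second rate of increase of each http_request_duration_seconds_bucket series over a 5-minute window, then sums those rates per latency bucket (le). Because the buckets are cumulative, the result is a cumulative distribution: the value for each le includes every faster request as well. Rendered in Grafana with one bar per le value, it shows how quickly the share of completed requests climbs as the latency bound grows.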

A heatmap panel offers a different perspective, showing density over time. The query is the same; what changes is how the panel treats the result. Each evaluation step becomes one column of cells, with one cell per latency bucket.

sum(rate(http_request_duration_seconds_bucket{handler="/api/v1/users", method="GET"}[5m])) by (le)

The heatmap would show time on one axis and latency buckets on the other, with color intensity representing the number of requests falling into that specific latency range at that specific time. This is where you spot those sudden spikes of slow requests that an average would completely miss.

The core problem these visualizations solve is the inadequacy of simple average latency. An average misleads in both directions: a few extremely slow requests can pull it upward even when most users are fine, and a large volume of fast requests can pull it down far enough to hide a small but critical share of users experiencing unacceptable delays. Histograms and heatmaps provide a granular view, allowing us to see the full distribution of latencies.
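To make the point concrete, here is a small Python sketch with invented numbers: 980 requests at 50 ms and 20 requests at 4 s. The data and the nearest-rank percentile helper are assumptions chosen for illustration, not measurements from a real service.

```python
import math
import statistics

# Hypothetical sample: 980 fast requests (50 ms) and 20 slow ones (4 s).
latencies = [0.05] * 980 + [4.0] * 20


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


mean = statistics.mean(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)

print(f"mean={mean:.3f}s p50={p50}s p99={p99}s")
```

The mean comes out near 0.129 seconds, which looks healthy, while the 99th percentile sits at 4 seconds. The 2% of requests that are slow are invisible in the average.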

Internally, Prometheus histograms work by observing individual request durations and incrementing counters for predefined "buckets." For example, a request that takes 0.3 seconds counts toward the le="0.5" bucket and toward every bucket with a larger bound, because each bucket holds the cumulative number of observations at or below its bound. The _bucket suffix is the naming convention for these cumulative counts.
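The bookkeeping described above can be modelled in a few lines of Python. This is a toy sketch of the cumulative semantics only; the class name CumulativeHistogram is hypothetical, and real client libraries may store counts differently and only produce cumulative values when the metrics are exposed.

```python
import bisect


class CumulativeHistogram:
    """Toy model of a histogram whose buckets hold cumulative counts."""

    def __init__(self, upper_bounds):
        # Finite bounds in ascending order, plus the implicit +Inf bucket.
        self.bounds = sorted(upper_bounds) + [float("inf")]
        self.counts = [0] * len(self.bounds)

    def observe(self, seconds):
        # Index of the first bucket whose bound is >= the observation
        # (le is inclusive, so 0.5 lands in the le="0.5" bucket).
        first = bisect.bisect_left(self.bounds, seconds)
        for i in range(first, len(self.counts)):
            self.counts[i] += 1


hist = CumulativeHistogram([0.1, 0.5, 1.0, 5.0])
for duration in (0.03, 0.3, 0.5, 7.2):
    hist.observe(duration)

for bound, count in zip(hist.bounds, hist.counts):
    print(f'le="{bound}" {count}')
```

The 0.3-second observation raises every bucket from le="0.5" upward, and the 7.2-second one only reaches the +Inf bucket.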

The key levers you control are:

  1. Bucket Configuration: When exposing metrics, you define the latency buckets. Common choices are linear (e.g., 0.1, 0.2, 0.3, 0.4, 0.5 seconds), exponential (e.g., 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12 seconds), or hand-picked bounds around known thresholds (e.g., 0.01, 0.02, 0.05, 0.1, 0.5, 1.0, 5.0 seconds). The choice depends on the expected latency range and the granularity you need at different scales.
  2. Grafana Querying: You select the specific metric and apply aggregation functions (sum, rate) and time ranges. You can filter by labels (like handler, method, status_code) to isolate specific performance characteristics.
  3. Grafana Visualization Settings: For histograms, you can choose cumulative or relative distributions. For heatmaps, you control the color scheme, time grouping, and latency bucket display.
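For the first lever, the two regular bucket layouts are easy to generate. The Python sketch below uses two hypothetical helpers, linear_buckets and exponential_buckets, written here for illustration; it is not the API of any particular client library.

```python
def linear_buckets(start, width, count):
    """`count` upper bounds spaced `width` apart, beginning at `start`."""
    # round() trims floating-point noise such as 0.30000000000000004.
    return [round(start + i * width, 10) for i in range(count)]


def exponential_buckets(start, factor, count):
    """`count` upper bounds, each `factor` times the previous one."""
    return [round(start * factor ** i, 10) for i in range(count)]


linear = linear_buckets(0.1, 0.1, 5)
exponential = exponential_buckets(0.01, 2, 10)

print("linear:     ", linear)
print("exponential:", exponential)
```

Linear bounds give even resolution across a narrow range, while exponential bounds cover several orders of magnitude with the same number of buckets at the cost of coarser detail at the slow end.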

A critical, often overlooked detail is what rate() means for histogram buckets. rate() calculates the per-second average rate of increase of a counter over the given window. Applied to http_request_duration_seconds_bucket, it does not give the rate of requests inside a bucket; it gives the rate at which the cumulative count up to that bucket’s le bound is increasing. To get the rate of requests that fall between two bounds (e.g., slower than 0.1s but no slower than 0.5s), subtract the aggregated rates of the two buckets: sum(rate(metric{le="0.5"}[5m])) - sum(rate(metric{le="0.1"}[5m])). The sum() wrappers matter: without them the two sides carry different le labels, so the subtraction finds no matching series. Multiplying the result by the window length in seconds approximates a request count. For direct visualization of the distribution, plotting sum(rate(...)) by (le) remains the standard approach, and Grafana can convert the cumulative buckets into per-bucket values when the query’s format is set to heatmap.
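The difference between a cumulative rate and a per-interval rate can be sketched in Python with two snapshots of the bucket counters taken 5 minutes apart. The first snapshot reuses the sample values from earlier; the second is invented for illustration. This is a simplification: the real rate() also handles counter resets and extrapolates to the window edges, which the sketch ignores.

```python
WINDOW_SECONDS = 300  # matches the [5m] range in the query

# Cumulative counts keyed by le. The `end` values are hypothetical.
start = {0.1: 1500, 0.5: 2200, 1.0: 2350, 5.0: 2390, float("inf"): 2400}
end = {0.1: 1800, 0.5: 2650, 1.0: 2830, 5.0: 2885, float("inf"): 2900}


def cumulative_rates(first, last, window):
    """Per-second increase of each cumulative bucket (simplified rate())."""
    return {le: (last[le] - first[le]) / window for le in sorted(first)}


def interval_rates(cum_rates):
    """Subtract neighbouring cumulative rates to get the rate per interval."""
    rates = {}
    lower, previous = 0.0, 0.0
    for upper in sorted(cum_rates):
        rates[(lower, upper)] = cum_rates[upper] - previous
        lower, previous = upper, cum_rates[upper]
    return rates


cum = cumulative_rates(start, end, WINDOW_SECONDS)
per_interval = interval_rates(cum)

for (lower, upper), value in per_interval.items():
    print(f"({lower}, {upper}]: {value:.4f} req/s")
```

In this made-up window the le="0.5" series grows at 1.5 requests per second, but only 0.5 requests per second actually fall between 0.1 and 0.5 seconds; the rest belongs to the faster bucket.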

Once you’ve mastered visualizing latency distributions, the next step is correlating these latency patterns with other system behaviors, such as error rates or resource utilization.
