Grafana Tempo can tell you that a request took a long time, but it won’t tell you why unless you’ve instrumented your code to provide that context.

Let’s see Tempo in action. Imagine a user request comes into your web service. This request might fan out to several other services:

[request-id: abc123xyz] User requests /products/123
    -> WebService: receives request, starts trace
        -> AuthSvc: checks user permissions (span-id: auth456)
        -> ProductSvc: fetches product details (span-id: prod789)
            -> InventorySvc: checks stock (span-id: inv012)
            -> PricingSvc: gets price (span-id: price345)
        -> WebService: aggregates results, returns response

Tempo, when fed this trace data, visualizes it as a waterfall. Each box is a "span," representing an operation within your system. The width of the box is the duration of that operation. You can immediately see which service took the longest.
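To make the waterfall idea concrete, here is a minimal sketch that models the spans from the example request as plain data and renders them as a text waterfall. The `Span` class, the helper functions, and all timings are hypothetical and chosen only for illustration; this is not how Tempo or Grafana represent spans internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    name: str
    parent: Optional[str]  # name of the parent span, None for the root
    start_ms: int          # offset from the start of the trace
    duration_ms: int


# Hypothetical timings for the /products/123 request
SPANS = [
    Span("WebService: GET /products/123", None, 0, 200),
    Span("AuthSvc: check permissions", "WebService: GET /products/123", 5, 20),
    Span("ProductSvc: fetch product", "WebService: GET /products/123", 30, 160),
    Span("InventorySvc: check stock", "ProductSvc: fetch product", 35, 40),
    Span("PricingSvc: get price", "ProductSvc: fetch product", 80, 105),
]


def depth(span, by_name):
    """Number of ancestors between this span and the root."""
    d = 0
    while span.parent is not None:
        span = by_name[span.parent]
        d += 1
    return d


def render_waterfall(spans, width=40):
    """One text row per span; bar offset and length are proportional to time."""
    by_name = {s.name: s for s in spans}
    total = max(s.start_ms + s.duration_ms for s in spans)
    rows = []
    for s in spans:
        label = "  " * depth(s, by_name) + s.name
        offset = s.start_ms * width // total
        length = max(1, s.duration_ms * width // total)
        rows.append(f"{label:<32}|{' ' * offset}{'#' * length} {s.duration_ms} ms")
    return rows


def slowest_leaf(spans):
    """Longest span that has no children: usually the first place to look."""
    parents = {s.parent for s in spans}
    return max((s for s in spans if s.name not in parents), key=lambda s: s.duration_ms)


if __name__ == "__main__":
    print("\n".join(render_waterfall(SPANS)))
    print("Slowest leaf span:", slowest_leaf(SPANS).name)
```

With these made-up numbers, ProductSvc has the widest bar below the root, but most of its time is spent waiting on PricingSvc, which is why looking at leaf spans is often more informative than looking at the widest box alone.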

Here’s how you set up Tempo and get this data flowing.

First, you need a place for Tempo to store its trace data. Tempo is designed to be simple and often uses object storage like S3, GCS, or Azure Blob Storage. For local development, you can even use a local directory.

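# tempo.yaml
multitenancy_enabled: false    # called auth_enabled in early Tempo releases
server:
  http_listen_port: 3100       # HTTP API that Grafana will query
distributor:
  receivers:
    otlp:
      protocols:
        grpc:                  # accept OTLP over gRPC from your services
storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/traces  # For local testing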

Next, you need to tell your applications to send trace data to Tempo. This is done via OpenTelemetry: you add the OpenTelemetry SDK to each service and configure an exporter that sends data to your Tempo instance. With its OTLP (OpenTelemetry Protocol) receiver enabled, Tempo listens by default on port 4317 for OTLP over gRPC and on port 4318 for OTLP over HTTP.

Your application’s OpenTelemetry configuration might look something like this (example in Python, using the opentelemetry-sdk and opentelemetry-exporter-otlp-proto-grpc packages):

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Configure the tracer provider; service.name is how Tempo and Grafana identify this service
provider = TracerProvider(resource=Resource.create({"service.name": "WebService"}))

# Configure the OTLP exporter to send to Tempo over gRPC (plaintext, for local testing only)
span_processor = BatchSpanProcessor(
    OTLPSpanExporter(endpoint="localhost:4317", insecure=True)  # Tempo's OTLP gRPC endpoint
)
provider.add_span_processor(span_processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Now, when you create spans, they will be batched and sent to Tempo
with tracer.start_as_current_span("my_operation") as span:
    # ... do work ...
    pass

# Flush any buffered spans before the process exits
provider.shutdown()

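You also need to propagate trace context across service calls. When WebService calls AuthSvc, it must inject the current trace ID and span ID into the request headers (by default, the W3C Trace Context traceparent header), and AuthSvc must read them back out when it starts its own span. This is crucial for Tempo to link spans from different services into a single trace. You rarely write this by hand: OpenTelemetry’s instrumentation libraries for common HTTP clients and server frameworks inject and extract the context for you once they are installed and enabled.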
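To show what actually travels between services, here is a minimal standard-library sketch of the W3C traceparent header, which has the form version-traceid-spanid-flags. The `inject` and `extract` helper names are hypothetical; in a real service the OpenTelemetry propagators do this work, and this sketch skips details such as the tracestate header.

```python
import re
import secrets

# version (2 hex) - trace-id (32 hex) - parent span-id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_trace_id():
    return secrets.token_hex(16)  # 16 random bytes, 32 hex characters


def new_span_id():
    return secrets.token_hex(8)   # 8 random bytes, 16 hex characters


def inject(headers, trace_id, span_id, sampled=True):
    """Caller side: write the current trace context into outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers):
    """Callee side: read the caller's context, or None if absent or malformed."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    _version, trace_id, parent_span_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_span_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


if __name__ == "__main__":
    # WebService starts a trace and calls AuthSvc
    outgoing = inject({}, new_trace_id(), new_span_id())
    # AuthSvc continues the same trace with a new span of its own
    ctx = extract(outgoing)
    print("same trace:", ctx["trace_id"], "new span:", new_span_id(),
          "parent:", ctx["parent_span_id"])
```

The important property is that the trace ID stays the same across every hop while each service generates a fresh span ID and records the caller's span ID as its parent.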

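Once your services are instrumented and sending data, you configure Grafana to connect to Tempo. In Grafana, go to "Connections" -> "Data sources" ("Configuration" -> "Data Sources" in older versions) and add a new data source. Select "Tempo" and point it to your Tempo instance’s HTTP API, which listens on the port set by server.http_listen_port in tempo.yaml. This walkthrough uses 3100; Tempo’s own example configurations commonly use 3200, since 3100 is also Loki’s usual port.

HTTP URL: http://localhost:3100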

Now, when you navigate to the "Explore" view in Grafana, you can select your Tempo data source and search for traces. For the search to be useful, every span needs two things: a service name (the service.name resource attribute) and a descriptive span name (what older tracing systems call the operation name). These let you filter traces effectively.

For example, with TraceQL (Tempo 2.0 and later) you can search for traces from WebService with a given span name: { resource.service.name = "WebService" && name = "HTTP GET /products/:id" }.
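The same kind of search can be run outside Grafana against Tempo's HTTP search API, which accepts a TraceQL query in the q parameter of /api/search. In TraceQL the filter above is written with resource.service.name for the service and name for the span name. The sketch below only builds the query and URL; the helper names are hypothetical, and the base URL assumes the http://localhost:3100 address used for the data source in this walkthrough.

```python
from urllib.parse import urlencode


def _quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def traceql_filter(service, span_name):
    """TraceQL selector for spans of one service with one span name."""
    return f"{{ resource.service.name = {_quote(service)} && name = {_quote(span_name)} }}"


def search_url(base_url, query, limit=20):
    """URL for Tempo's trace search endpoint with a URL-encoded TraceQL query."""
    params = urlencode({"q": query, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{params}"


if __name__ == "__main__":
    query = traceql_filter("WebService", "HTTP GET /products/:id")
    print(query)
    # Assumed address; use whatever host and port your Tempo HTTP API listens on
    print(search_url("http://localhost:3100", query))
```

Fetching the printed URL with any HTTP client should return matching traces as JSON, provided your Tempo version supports TraceQL search.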

The drilldown capability comes from Grafana’s ability to correlate traces with metrics and logs. If a particular span in Tempo is taking a long time, you can click on it. Grafana will then let you jump to related metrics (e.g., CPU usage for that service at that time) or logs (e.g., application logs from that service instance during the span’s time range). This is configured by setting up "Trace to Metrics" and "Trace to Logs" in the Tempo data source settings in Grafana.

For "Trace to Logs," you’ll specify a query that uses trace context (like the trace ID) to fetch logs from your logging backend (e.g., Loki). This only works if your services write the trace ID into their log lines or log labels. For "Trace to Metrics," you’ll specify a query that uses span attributes such as service.name and the span name to fetch metrics from Prometheus.
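For the logs side of this correlation, the trace ID has to appear in the logs in the first place. Here is a minimal sketch using Python's standard logging module that stamps every log line with a trace ID and builds a Loki line-filter query for it. The `TraceContextFilter` class, the `get_trace_id` callback, and the service_name label are all assumptions for illustration; in a real service the trace ID would come from the active OpenTelemetry span, and the label depends on how your logs are shipped to Loki.

```python
import io
import logging


class TraceContextFilter(logging.Filter):
    """Attach the current trace ID to every log record."""

    def __init__(self, get_trace_id):
        super().__init__()
        self._get_trace_id = get_trace_id  # hypothetical callback returning the active trace ID

    def filter(self, record):
        record.trace_id = self._get_trace_id() or "-"
        return True  # never drop records, only annotate them


def build_logger(stream, get_trace_id, name="productsvc.example"):
    """Logger that writes logfmt-style lines including trace_id."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("level=%(levelname)s trace_id=%(trace_id)s msg=%(message)s")
    )
    handler.addFilter(TraceContextFilter(get_trace_id))
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def logs_query_for_trace(service, trace_id):
    """LogQL: select one service's log stream, keep lines containing the trace ID."""
    return f'{{service_name="{service}"}} |= "{trace_id}"'


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer, lambda: "abc123xyz")
    log.info("price lookup slow")
    print(buffer.getvalue(), end="")
    print(logs_query_for_trace("ProductSvc", "abc123xyz"))
```

The query string produced by `logs_query_for_trace` has the same shape as what a "Trace to Logs" configuration would run when you click through from a span.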

What surprises most people about Tempo is that it doesn’t store trace data in a database at all; it writes it directly to object storage. This makes it scalable and cost-effective for high volumes of traces, since there is no large database or index cluster (such as Cassandra or Elasticsearch) to manage. The original trade-off was that traces could only be looked up by trace ID. Tempo 2.0 and later store traces in a columnar Parquet format and can search on span and resource attributes with TraceQL, still without a separate indexing layer.

Once you’ve mastered correlating traces with metrics and logs, the next step is to explore sampling strategies for distributed tracing, in particular head-based vs. tail-based sampling.
