Grafana can query and visualize Jaeger traces, letting you correlate your distributed traces with your metrics and logs in one place. Grafana does not store the traces itself; it reads them from your Jaeger backend on demand.
Let’s see it in action. Imagine you have a microservice architecture where requests flow through several services: frontend -> auth-service -> user-service -> db-service. If a user reports a slow login, you can dive into the traces to pinpoint exactly which service is causing the delay.
Here’s a typical trace from Jaeger, showing the request path and timings:
Trace ID: a1b2c3d4e5f67890

  Span ID: 1234567890abcdef
    Operation: frontend.handleRequest
    Duration: 150ms
    Tags: http.method=GET, http.status_code=200

  Span ID: fedcba0987654321
    Operation: auth-service.authenticateUser
    Duration: 100ms
    Tags: user.id=alice

  Span ID: 1a2b3c4d5e6f7890
    Operation: user-service.getUserProfile
    Duration: 70ms
    Tags: user.id=alice

  Span ID: 9876543210fedcba
    Operation: db-service.queryUser
    Duration: 50ms
    Tags: db.statement="SELECT * FROM users WHERE id='alice'"

To get this into Grafana, you’ll need a Jaeger backend running (e.g., a Jaeger all-in-one or collector deployed via Kubernetes) and Grafana configured to talk to it.
First, ensure your services are instrumented to send traces to your Jaeger agent or collector. Most tracing libraries (OpenTelemetry, Jaeger clients) allow you to configure the endpoint, often something like http://jaeger-collector.tracing.svc.cluster.local:14268/api/traces.
In Grafana, you’ll add a new Data Source. Choose "Jaeger" from the list. The crucial setting is the URL, which should point to your Jaeger query service (the component that serves the Jaeger UI and search API, on port 16686 by default), not the collector’s ingestion endpoint. For a typical Kubernetes deployment, this might look like: http://jaeger-query.tracing.svc.cluster.local:16686. The storage backend (usually elasticsearch or cassandra) is configured on the Jaeger side; Grafana only talks to the query API and does not need to know which one you use.
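If you would rather script this step than click through the UI, Grafana's HTTP API accepts a JSON body on POST /api/datasources. The following is a minimal sketch of building that body, assuming the data source URL points at Jaeger's query service; the helper name jaeger_datasource_payload and the jaeger-query host name are illustrative, not taken from any real deployment.

```python
import json


def jaeger_datasource_payload(name: str, query_url: str) -> dict:
    """Build a JSON body for creating a Jaeger data source via Grafana's HTTP API."""
    return {
        "name": name,
        "type": "jaeger",   # plugin id of Grafana's built-in Jaeger data source
        "access": "proxy",  # Grafana's backend makes the requests, not the browser
        "url": query_url,   # Jaeger query service, not the collector
    }


payload = jaeger_datasource_payload(
    "Jaeger",
    "http://jaeger-query.tracing.svc.cluster.local:16686",  # hypothetical host name
)
print(json.dumps(payload, indent=2))
```

Sending the payload requires an authenticated request to your Grafana instance, which is left out here because it depends on how your Grafana is secured.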
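Once saved, you can go to the "Explore" view in Grafana, select your Jaeger data source, and start querying for traces. You can look up a single trace by its trace ID, or search by service name, operation name, tags, and minimum or maximum duration. For example, to find all traces for the auth-service that took longer than 50ms, select auth-service as the service and set the minimum duration to 50ms. The Jaeger data source uses a search form rather than a free-text query language, and tag filters are written as key=value pairs such as http.status_code=200.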
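The same search can be expressed as a request against the HTTP endpoint that the Jaeger UI itself calls on the query service. This sketch only builds the URL; it assumes the /api/traces endpoint with the service, operation, minDuration, and limit parameters, which Jaeger treats as an internal API rather than a stable contract. The function name and host are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def jaeger_search_url(
    base_url: str,
    service: str,
    min_duration: Optional[str] = None,
    operation: Optional[str] = None,
    limit: int = 20,
) -> str:
    """Build a trace-search URL for the Jaeger query service."""
    params = {"service": service, "limit": limit}
    if operation:
        params["operation"] = operation
    if min_duration:
        params["minDuration"] = min_duration  # e.g. "50ms"
    return f"{base_url.rstrip('/')}/api/traces?{urlencode(params)}"


url = jaeger_search_url(
    "http://jaeger-query.tracing.svc.cluster.local:16686",  # hypothetical host name
    "auth-service",
    min_duration="50ms",
)
print(url)
```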
Grafana will then display a list of matching traces. Clicking on a trace ID will reveal a waterfall diagram, much like the Jaeger UI, showing the timeline of operations across your services. This visualization is key: the wider the bar, the longer that specific operation took. Keep in mind that a parent span’s bar includes the time spent in its children, so compare each bar against the spans nested beneath it. If the auth-service.authenticateUser bar is significantly wider than the spans it calls, the time is being spent in auth-service itself and you know that’s where the problem lies.
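To make the "which bar is really the slow one" reasoning concrete, here is a small sketch that computes each span's self time (its duration minus the time spent in its direct children) for the example trace. It assumes the four spans are strictly nested in the order frontend, auth-service, user-service, db-service and that children do not overlap; the example trace lists no parent IDs, so that nesting is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    operation: str
    duration_ms: int
    parent: Optional[str] = None  # operation name of the parent span, if any


def self_times(spans: List[Span]) -> Dict[str, int]:
    """Return each span's duration minus the summed duration of its direct children."""
    child_total: Dict[str, int] = {}
    for span in spans:
        if span.parent is not None:
            child_total[span.parent] = child_total.get(span.parent, 0) + span.duration_ms
    return {s.operation: s.duration_ms - child_total.get(s.operation, 0) for s in spans}


spans = [
    Span("frontend.handleRequest", 150),
    Span("auth-service.authenticateUser", 100, parent="frontend.handleRequest"),
    Span("user-service.getUserProfile", 70, parent="auth-service.authenticateUser"),
    Span("db-service.queryUser", 50, parent="user-service.getUserProfile"),
]

result = self_times(spans)
for operation, ms in result.items():
    print(f"{operation}: {ms}ms self time")
```

Under these assumptions the auth-service span accounts for only 30ms of its own, while the frontend and the database query each account for 50ms, which is a different picture from reading total durations alone.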
The power comes from combining this with other data. You can add a Grafana panel displaying metrics for the auth-service (e.g., CPU usage, request rate) and another panel showing logs from the same service, all filtered to the same time range as your trace. If you see a spike in CPU usage on auth-service correlating with a slow trace, you’ve got a strong lead.
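Lining up metrics and logs with a trace mostly comes down to choosing the right time window. As a rough sketch, the helper below pads a trace's start and end by a few minutes and returns the bounds as epoch milliseconds, the form Grafana accepts in its from and to URL parameters. The trace start time and the five-minute padding are made-up values for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Convert an aware datetime to integer epoch milliseconds without float rounding."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def window_around_trace(
    start: datetime,
    duration_ms: int,
    padding: timedelta = timedelta(minutes=5),
) -> Tuple[int, int]:
    """Return (from_ms, to_ms) covering the trace plus padding on both sides."""
    end = start + timedelta(milliseconds=duration_ms)
    return to_epoch_ms(start - padding), to_epoch_ms(end + padding)


trace_start = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)  # hypothetical
from_ms, to_ms = window_around_trace(trace_start, duration_ms=150)
print(f"from={from_ms}&to={to_ms}")
```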
Grafana also allows you to "add to dashboard" from the Explore view. This means you can build dashboards that not only show your metrics and logs but also include a Jaeger trace panel, allowing users to drill down into specific traces directly from their dashboard. You can configure this panel to automatically query traces based on the dashboard’s time range and any applied variables (like a service variable).
What many people miss is how deep the correlation can go. It’s not just about seeing a trace and then manually finding metrics for that service. On the Jaeger data source you can set up "trace to logs" and "trace to metrics" links, which add shortcuts in the trace view that jump from a span to the matching logs or metrics. In the other direction, your logs and metrics data sources can be configured to link a log line or a metric data point that carries a trace ID straight to the relevant trace. This seamless transition is where the real operational insight emerges.
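Linking from a log line to a trace depends on the log line actually containing the trace ID in a form a regular expression can pick out; Loki's derived fields, for example, work this way. The sketch below mimics that matching step in plain Python. The trace_id=... log format and the sample line are assumptions for illustration, so adjust the pattern to whatever your services emit.

```python
import re
from typing import Optional

# Assumed log format: a logfmt-style field holding a 16 to 32 character hex trace ID.
TRACE_ID_PATTERN = re.compile(r"trace_id=([0-9a-f]{16,32})")


def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the trace ID embedded in a log line, or None if there is none."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


line = 'level=error service=auth-service trace_id=a1b2c3d4e5f67890 msg="login timed out"'
print(extract_trace_id(line))
```

If your services do not log trace IDs yet, that is usually the first thing to fix, since no amount of data source configuration can link a log line that carries no ID.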
The next step is exploring how to build custom trace queries with specific conditions and aggregations.