The Grafana Node Graph panel can visualize service dependencies, but its true power lies in how it surfaces unseen or assumed connections, often revealing critical infrastructure blind spots.
Let’s see it in action. Imagine a simple web service, frontend-web, that talks to a user-service and a product-service. Both of those services talk to a database service.
Here’s a peek at what the raw data feeding a Node Graph might look like. These are Prometheus series selectors for the request counters each service exposes:
http_requests_total{job="frontend-web", handler="/users"}
http_requests_total{job="frontend-web", handler="/products"}
http_requests_total{job="user-service", handler="/db"}
http_requests_total{job="product-service", handler="/db"}
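When you configure the Node Graph panel, you tell it how to interpret this data. You might set it up to look at http_requests_total, use the job label as the "from" node, and take the "to" node from a label that names the destination, such as handler here or a dedicated destination label. The instance label is a poor fit for the "to" node, because it identifies the scraped target rather than the service being called. You can also use other labels to group or color nodes, like service or namespace.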
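One way to picture this interpretation step is as a mapping from each series' label set to a (source, target) pair. The sketch below is only an illustration in Python, not Grafana's own logic: the HANDLER_TO_SERVICE table and the edge_from_labels helper are hypothetical names, assumed here to show how the four sample series could become edges.

```python
from typing import Optional, Tuple

# Hypothetical mapping from request handler to the service behind it.
# This table is an assumption for illustration, not a Grafana or Prometheus feature.
HANDLER_TO_SERVICE = {
    "/users": "user-service",
    "/products": "product-service",
    "/db": "database",
}

def edge_from_labels(labels: dict) -> Optional[Tuple[str, str]]:
    """Return (source, target) for one series, or None if either end is unknown."""
    source = labels.get("job")
    target = HANDLER_TO_SERVICE.get(labels.get("handler"))
    if source is None or target is None:
        return None
    return (source, target)

# Label sets of the four sample series shown earlier.
series = [
    {"job": "frontend-web", "handler": "/users"},
    {"job": "frontend-web", "handler": "/products"},
    {"job": "user-service", "handler": "/db"},
    {"job": "product-service", "handler": "/db"},
]

edges = [e for e in map(edge_from_labels, series) if e is not None]
print(edges)
```

Series whose handler has no known destination are skipped rather than guessed, which keeps unknown traffic from showing up as a misleading edge.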
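Here’s a simplified, schematic Node Graph configuration for this scenario. It assumes the metrics carry from_job and to_job labels naming the caller and the callee, and the option names are illustrative rather than exact Grafana settings: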
Panel Options:
- Data Source: Prometheus
- Query:
  sum by (from_job, to_job) (rate(http_requests_total{job=~"frontend-web|user-service|product-service"}[5m]))
  (This query sums up request rates, grouping by the source job and destination job.)
- Node ID: from_job
- Target ID: to_job
- Node Group By: from_job
- Edges:
  - From: from_job
  - To: to_job
  - Value: Value (the rate of requests)
  - Display Label: from_job -> to_job
When this panel renders, you’d see nodes like frontend-web, user-service, product-service, and database. You’d see one arrow from frontend-web to user-service and another from frontend-web to product-service, plus arrows from user-service and product-service to database. The thickness of the arrows would represent the volume of traffic.
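Under the hood, the Node Graph panel works from two tables: a nodes table with an id field, and an edges table with id, source, and target fields (title and mainstat are optional display fields). The following is a minimal sketch, assuming query results arrive as (from_job, to_job, rate) rows; the build_graph helper and the sample rates are hypothetical and made up for illustration.

```python
def build_graph(rows):
    """Turn (from_job, to_job, rate) rows into node and edge records."""
    node_ids = []  # keeps first-seen order so the output is deterministic
    edges = []
    for from_job, to_job, rate in rows:
        for name in (from_job, to_job):
            if name not in node_ids:
                node_ids.append(name)
        edges.append({
            "id": f"{from_job}->{to_job}",
            "source": from_job,
            "target": to_job,
            "mainstat": rate,  # request rate shown on the edge
        })
    nodes = [{"id": name, "title": name} for name in node_ids]
    return nodes, edges

# Made-up request rates, for illustration only.
rows = [
    ("frontend-web", "user-service", 12.0),
    ("frontend-web", "product-service", 8.5),
    ("user-service", "database", 12.0),
    ("product-service", "database", 8.5),
]

nodes, edges = build_graph(rows)
for edge in edges:
    print(edge["source"], "->", edge["target"], edge["mainstat"])
```

Note that database appears as a node even though it never shows up as a source: nodes are collected from both ends of every edge.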
This visualization is powerful because it moves beyond static diagrams. It’s a live, breathing representation of your system’s actual communication patterns. You can drill down, zoom in, and see which services are talking to which, and how much.
The real magic happens when you start layering more data. What if user-service also calls an external auth-service? Your Prometheus metrics would need to capture that call, for example with a destination label value such as auth-service recorded on the calling side. The Node Graph would then dynamically add an arrow from user-service to auth-service, showing this dependency without you having to manually update a diagram.
Consider a scenario where frontend-web is experiencing high latency. By looking at the Node Graph, you can immediately see the traffic flowing from frontend-web to user-service and product-service. If the arrow to user-service is thick and red (indicating high error rates or latency based on your configuration), you know where to investigate first. It’s not just about what is connected, but how healthy those connections are.
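The health signal described here boils down to classifying each edge by some ratio and mapping that to a color. A rough sketch of that idea follows, assuming error rate is the health measure; the edge_color helper and the 1% and 5% thresholds are hypothetical choices, and how the color is actually applied depends on your panel configuration.

```python
def edge_color(requests: float, errors: float,
               warn: float = 0.01, crit: float = 0.05) -> str:
    """Classify an edge by its error ratio. Thresholds are illustrative assumptions."""
    if requests <= 0:
        return "gray"  # no traffic, so health is unknown
    ratio = errors / requests
    if ratio >= crit:
        return "red"
    if ratio >= warn:
        return "yellow"
    return "green"

# Made-up per-edge request and error rates.
print(edge_color(requests=100.0, errors=10.0))  # 10% errors
print(edge_color(requests=100.0, errors=2.0))   # 2% errors
print(edge_color(requests=100.0, errors=0.0))   # no errors
```

The same shape works for latency: swap the error ratio for a percentile latency and pick thresholds that match your service-level objectives.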
A common pitfall is relying solely on job labels. Many teams use job to represent the application and instance to represent a specific pod or VM. However, if you want to visualize dependencies between services and not just instances, you need to ensure your metrics are labeled appropriately to capture the destination service name in a consistent way. Often, this means using a label like target_service or deriving it from the URL if your tracing or metrics collection is sophisticated enough. Without this, you might end up with a graph showing individual pods talking to other individual pods, which is less useful for understanding service-level architecture.
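To illustrate the difference, here is a small sketch of collapsing pod-level edges into service-level edges. It assumes you have some way to map each pod or instance to its service; the pod_to_service dictionary, the pod names, and the service_edges helper are all hypothetical.

```python
from collections import defaultdict

def service_edges(pod_edges, pod_to_service):
    """Sum pod-to-pod request rates into service-to-service rates."""
    totals = defaultdict(float)
    for src_pod, dst_pod, rate in pod_edges:
        # Fall back to the pod name when no service mapping is known.
        src = pod_to_service.get(src_pod, src_pod)
        dst = pod_to_service.get(dst_pod, dst_pod)
        totals[(src, dst)] += rate
    return dict(totals)

# Hypothetical pod names and made-up rates.
pod_to_service = {
    "frontend-web-0": "frontend-web",
    "frontend-web-1": "frontend-web",
    "user-service-0": "user-service",
}
pod_edges = [
    ("frontend-web-0", "user-service-0", 3.0),
    ("frontend-web-1", "user-service-0", 2.0),
]

result = service_edges(pod_edges, pod_to_service)
print(result)
```

Two pod-level arrows become a single frontend-web to user-service edge carrying the combined rate, which is the view you usually want for architecture questions.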
The next step is to integrate tracing data, allowing you to see not just request volume but the actual latency breakdown per hop.