Grafana Explore mode is where you go when you need to answer a question that isn’t already captured by a dashboard.

Imagine you’re trying to debug a sudden spike in user-reported errors. Your dashboard shows overall error rates, but it doesn’t tell you which specific API endpoint is failing or what unique user ID is associated with the problem. That’s where Explore comes in. You can dive into the raw logs and metrics, filter them down to the exact time window and conditions you’re interested in, and pinpoint the root cause.

Here’s a typical workflow. Let’s say we want to see all logs from our auth-service that contain the word "failed" between 10:00 AM and 10:15 AM today.

First, navigate to the Explore section in Grafana. At the top of the page, you’ll see a data source picker. For logs, you’re likely using something like Loki or Elasticsearch. For metrics, you might be using Prometheus or InfluxDB (Tempo, by contrast, is a tracing backend). Let’s assume we’re using Loki for logs.

Once Loki is selected, you’ll see a query editor. For Loki, the query language is LogQL. To get logs from auth-service, you’d start with a label selector:

{app="auth-service"}

This fetches all log lines where the label app is set to auth-service.

Now, we want to filter for the word "failed" and restrict the time range. Explore has a time range picker at the top right, similar to dashboards. You can set it to "Last 15 minutes" or manually input the start and end times. For this example, let’s set it to 10:00 AM to 10:15 AM.
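Outside the UI, the same window can be expressed as parameters for Loki’s HTTP endpoint /loki/api/v1/query_range, which accepts start and end as Unix timestamps in nanoseconds. The following is a minimal sketch that only builds the parameters and sends nothing; the date, the UTC time zone, and the helper name loki_range_params are assumptions made for illustration.

```python
from datetime import datetime, timezone


def loki_range_params(query, start, end, limit=100):
    """Build query parameters for Loki's /loki/api/v1/query_range endpoint.

    Hypothetical helper: it only assembles the parameters, it does not
    contact a Loki server.
    """
    def to_ns(dt):
        # Whole seconds since the epoch, converted to nanoseconds.
        return int(dt.timestamp()) * 1_000_000_000

    return {
        "query": query,
        "start": str(to_ns(start)),
        "end": str(to_ns(end)),
        "limit": str(limit),
    }


# Assumed date and time zone; substitute your own incident window.
start = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 15, 10, 15, tzinfo=timezone.utc)

params = loki_range_params('{app="auth-service"}', start, end)
window_seconds = (int(params["end"]) - int(params["start"])) // 1_000_000_000
print(params)
print(window_seconds)
```

The time range picker in Explore does the equivalent conversion for you, which is why the LogQL query itself never mentions a time window.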

To filter for "failed" within those logs, you add a text filter to your LogQL query:

{app="auth-service"} |~ "failed"

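The |~ operator means "matches regex", and "failed" is the pattern. If you only need a plain, case-sensitive substring match, use |= "failed" instead; it keeps every line that contains the text and is usually the cheaper filter when you don’t need regex features.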
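To see how the two line filters differ, here is a small local approximation in Python. It does not talk to Loki: the sample lines are invented, the in operator stands in for |= (substring containment), and re.search stands in for |~. Loki uses Go’s RE2 syntax, so treat the regex behavior as an approximation for simple patterns only.

```python
import re

# Hypothetical log lines, invented for illustration.
lines = [
    "login failed for user 42",
    "login succeeded for user 7",
    "token refresh FAILED: expired",
    "3 failures detected",
]

# Roughly what |= "failed" does: keep lines containing the exact text.
contains = [line for line in lines if "failed" in line]

# Roughly what |~ "fail(ed|ures)" does: keep lines matching the regex.
regex = [line for line in lines if re.search(r"fail(ed|ures)", line)]

# Roughly what |~ "(?i)failed" does: case-insensitive regex match.
regex_ci = [line for line in lines if re.search(r"(?i)failed", line)]

print(contains)
print(regex)
print(regex_ci)
```

Both filters are case-sensitive by default, so the uppercase FAILED line only shows up once the pattern opts into case-insensitive matching.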

After you hit "Run query" (or press Shift+Enter), Grafana will fetch the matching log lines from Loki and display them in the main panel. You’ll see the timestamp, the log message itself, and any associated labels.

From here, you can refine your query. Maybe you want to see which user ID is associated with these failures. If your logs contain a user_id label, you could add that to the query:

{app="auth-service", user_id=~".+"} |~ "failed"

This restricts the results to log streams whose user_id label has a non-empty value (the regex .+ matches one or more characters). The results may now include several different user IDs.

To aggregate this and see how many failures there are per user, you’d switch to a metric query. In LogQL, that means wrapping the log query in a range aggregation such as count_over_time and then grouping the result. For instance, to count failed log lines per user over 5-minute windows:

sum by (user_id) (count_over_time({app="auth-service"} |~ "failed" [5m]))

This query returns one time series per user_id, where each value is the number of matching log lines for that user over the preceding 5 minutes. Explore can then display this as a graph or a table, helping you identify whether a single user is causing a disproportionate number of errors.
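To make the per-user count concrete, here is a local simulation of what that aggregation computes for a single time window. The log entries and user IDs are invented, and a plain substring test stands in for the "failed" line filter; it is a sketch of the idea, not of how Loki evaluates the query.

```python
from collections import Counter

# Hypothetical (labels, line) pairs, all assumed to fall in one window.
entries = [
    ({"app": "auth-service", "user_id": "u1"}, "login failed: bad password"),
    ({"app": "auth-service", "user_id": "u2"}, "login ok"),
    ({"app": "auth-service", "user_id": "u1"}, "login failed: bad password"),
    ({"app": "auth-service", "user_id": "u3"}, "token refresh failed"),
    ({"app": "billing", "user_id": "u1"}, "charge failed"),
    ({"app": "auth-service", "user_id": "u1"}, "login failed: account locked"),
]

# Select by label, filter by line content, then count per user_id.
failures_per_user = Counter(
    labels["user_id"]
    for labels, line in entries
    if labels.get("app") == "auth-service"
    and "user_id" in labels
    and "failed" in line
)

print(failures_per_user.most_common())
```

In this made-up sample, one user accounts for most of the failures, which is the kind of skew the grouped query is meant to surface.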

The real power comes from combining log and metric queries. You can run multiple queries at once in Explore, and the split view places two query panes side by side. For example, one pane could show the raw error logs for auth-service while the other shows the Prometheus metric for the overall request rate of auth-service during the same time window. This allows you to correlate the log events with system performance.

Here’s a Prometheus query to see the request rate for auth-service:

rate(http_requests_total{job="auth-service"}[5m])

This query computes the per-second rate at which the http_requests_total counter increased for auth-service, averaged over the trailing 5-minute window ([5m]).
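As a rough worked example of the arithmetic, the sketch below computes a per-second rate from a handful of counter samples. It is a deliberate simplification: Prometheus also extrapolates to the edges of the window, which this version skips, and the sample values and the function name approx_rate are invented for illustration.

```python
def approx_rate(samples):
    """Approximate per-second increase across counter samples.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    A drop in value is treated as a counter reset (restart from zero).
    Simplified: no extrapolation to the window boundaries.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical scrapes every 60 seconds across a 5-minute window.
steady = [(0, 1000), (60, 1120), (120, 1240), (180, 1360), (240, 1480), (300, 1600)]
print(approx_rate(steady))  # 600 requests over 300 seconds

# Hypothetical window in which the service restarted and the counter reset.
with_reset = [(0, 100), (60, 160), (120, 20), (180, 80)]
print(approx_rate(with_reset))
```

The result is a rate in requests per second, not a raw count, which is why the graph stays comparable even as the underlying counter keeps growing.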

By splitting the view, or by adding multiple queries to the same pane, you can build a comprehensive picture of what’s happening. You can also expand individual log lines to see the full context, including any additional structured data within the log message itself, which might be JSON or key-value pairs.

One thing that trips people up is the distinction between filtering in the query itself and filtering with the UI controls. The query editor defines what data to fetch from the data source. Some UI controls, such as the label dropdowns in the query builder, simply write a selector into the query for you; others only refine what is displayed after the data has been fetched. For example, in Loki, you can type {app="auth-service"} and then use the UI to add a level="error" filter. This is often equivalent to {app="auth-service", level="error"}, and the UI can be faster for common labels. However, for complex text matching or regex, it’s best to put it directly in the LogQL or PromQL query.

The next step after you’ve identified a pattern of errors in Explore is to create a new dashboard panel or update an existing one to track this specific issue proactively.
