LogQL queries in Grafana Loki are performing slowly, and you’re looking for ways to speed them up.

The core issue is that Loki needs to scan through potentially massive amounts of log data to find what you’re looking for, and inefficient queries force it to do more work than necessary. This usually boils down to how you’re filtering and what data Loki has to sift through.

Common Causes and Fixes for Slow LogQL Queries

  1. Lack of or Inefficient Label Filtering:

    • Diagnosis: Start by looking at the stream selector, the {...} part that every LogQL query must begin with. Does it pin down the streams you care about ({app="my-app", env="prod"}), or is it broad ({env="prod"}) with the real narrowing left to line filters? The broader the selector, the more chunks Loki has to fetch and scan. Run logcli series '{}' --since 1h --analyze-labels to see how many streams exist and which labels have the most distinct values.
    • Fix: Make the stream selector as selective as possible and let line filters do only the remaining work. For example, instead of {env="prod"} |= "my-app" |= "error", use {app="my-app", env="prod"} |= "error". This tells Loki to find the streams belonging to my-app first, then scan only those streams for the text error.
    • Why it works: Label matchers are indexed by Loki. By filtering on labels first, you’re directing Loki to a much smaller set of relevant log streams from the outset, drastically reducing the amount of data it needs to inspect for the actual log content.
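The effect of a narrower selector can be illustrated with a toy model. This is only a sketch: the in-memory `streams` dictionary and the helpers `select_streams` and `scan` are hypothetical stand-ins, not Loki's storage or API. It shows that the same content filter touches fewer lines when the label matchers are more selective.

```python
# Toy model, not Loki's real storage: each stream is keyed by its label set.
streams = {
    ("app=my-app", "env=prod"): ["GET /health ok", "error: db timeout"],
    ("app=my-app", "env=dev"): ["error: missing config"],
    ("app=billing", "env=prod"): ["charge ok", "error: card declined"],
}


def select_streams(all_streams, matchers):
    """Keep only streams whose label set contains every matcher."""
    wanted = set(matchers)
    return {
        labels: lines
        for labels, lines in all_streams.items()
        if wanted <= set(labels)
    }


def scan(selected, needle):
    """Scan line content; return (matching lines, number of lines scanned)."""
    hits, scanned = [], 0
    for lines in selected.values():
        for line in lines:
            scanned += 1
            if needle in line:
                hits.append(line)
    return hits, scanned


# Broad selector: both prod streams are scanned.
broad_hits, broad_scanned = scan(select_streams(streams, ["env=prod"]), "error")
# Narrow selector: only the my-app prod stream is scanned.
narrow_hits, narrow_scanned = scan(
    select_streams(streams, ["app=my-app", "env=prod"]), "error"
)
print(broad_scanned, narrow_scanned)  # 4 2
```

The broad selector also returns an unrelated billing error, which would then need yet another filter to remove.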
  2. Scanning Large Time Ranges:

    • Diagnosis: Check the time range selected in your Grafana dashboard or passed to logcli (e.g., --since 7d). The longer the period, the more chunks Loki has to process. To see how many lines a broad selector covers over a long period, run a metric query such as logcli instant-query 'sum(count_over_time({app="my-app"}[7d]))'.
    • Fix: Narrow down the time range to the smallest practical window. If you need to analyze a long period, consider breaking it into smaller, manageable chunks, or use more aggressive filtering within that range.
    • Why it works: Loki’s index holds only labels and references to chunks, stored separately from the log content, so it can quickly tell which chunks overlap the requested range. The chunks themselves still have to be fetched from storage, decompressed, and scanned, and that I/O and CPU cost grows with the length of the range.
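Breaking a long period into smaller windows can be sketched as follows. The helper names `split_range` and `to_unix_nanos` are hypothetical, and the example assumes the windows would be sent one at a time to Loki's `/loki/api/v1/query_range` endpoint, which accepts `start` and `end` as nanosecond Unix timestamps. The sketch only builds the request parameters and does not contact a server.

```python
from datetime import datetime, timedelta, timezone


def split_range(start, end, step):
    """Yield (chunk_start, chunk_end) windows covering [start, end) in order."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end


def to_unix_nanos(ts):
    """Convert a timezone-aware datetime to integer nanoseconds since the epoch."""
    delta = ts - datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


# Example: a 7-day range split into one-day windows.
end = datetime(2024, 1, 8, tzinfo=timezone.utc)
start = end - timedelta(days=7)
windows = list(split_range(start, end, timedelta(days=1)))

# One parameter set per window; the query string is only an illustration.
params = [
    {
        "query": '{app="my-app"} |= "error"',
        "start": to_unix_nanos(s),
        "end": to_unix_nanos(e),
    }
    for s, e in windows
]
```

Each window can then be queried and inspected separately, so a single slow or failing window does not hold up the rest.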
  3. Using line_format or json on Unfiltered Data:

    • Diagnosis: If | json or | line_format comes before any line filter, Loki parses or reformats every log line in the selected streams and time range. Compare query duration and statistics (for example with logcli query --stats, or Grafana’s query inspector) with and without these early stages.
    • Fix: Apply | json or | line_format after you’ve narrowed down the results with label matchers and line filters. For example, {app="my-app", env="prod"} |= "error" | json | level="error". The field name level is illustrative; a filter on an extracted field can only come after the parser that extracts it.
    • Why it works: These operations require Loki to process the content of each log line. By deferring them until after filtering, you ensure they are only applied to the significantly reduced set of logs that match your initial criteria.
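The saving from deferring the parser can be made concrete with a small sketch. It is not how Loki executes a pipeline internally; the sample lines and the field names `level` and `msg` are assumptions chosen for illustration. Both functions return the same messages, but the second one parses JSON only for lines that pass a cheap substring check.

```python
import json

lines = [
    '{"level": "info", "msg": "started"}',
    '{"level": "error", "msg": "db timeout"}',
    '{"level": "info", "msg": "request ok"}',
    '{"level": "error", "msg": "disk full"}',
]


def parse_then_filter(log_lines):
    """Parse every line first, then keep the errors."""
    parsed, messages = 0, []
    for line in log_lines:
        record = json.loads(line)
        parsed += 1
        if record["level"] == "error":
            messages.append(record["msg"])
    return messages, parsed


def filter_then_parse(log_lines):
    """Drop lines without the literal "error" before paying for JSON parsing."""
    parsed, messages = 0, []
    for line in log_lines:
        if "error" not in line:  # cheap substring check, like a line filter
            continue
        record = json.loads(line)
        parsed += 1
        if record["level"] == "error":
            messages.append(record["msg"])
    return messages, parsed


slow_messages, slow_parsed = parse_then_filter(lines)
fast_messages, fast_parsed = filter_then_parse(lines)
print(slow_parsed, fast_parsed)  # 4 2
```

The field check after parsing is kept in both versions because a substring match alone could also hit lines where "error" appears in some other field.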
  4. Ineffective Content Filtering (Regex):

    • Diagnosis: If your query uses |~ "some_complex_regex" or !~ "another_regex" on a very broad set of logs, Loki has to perform expensive regular expression matching against potentially millions of log lines.
    • Fix:
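      • Use exact string matching (|= "error") or a limited set of keywords first, if possible.
      • If regex is necessary, make it as specific as possible, anchor it if appropriate, and put a cheap |= filter in front of it so the regex only runs on lines that already contain a known literal, e.g. {app="my-app"} |= "timeout" |~ "timeout after [0-9]+ms".
      • Tighten the stream selector and time range as well. Line filters run against log content rather than the index, so the way to make them cheaper is to give them fewer lines to look at.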
    • Why it works: Regular expression matching is computationally intensive. By narrowing down the log lines before applying a regex, or by making the regex itself more efficient, you reduce the number of times the regex engine needs to run.
  5. Overuse of sum by or count by on High-Cardinality Labels:

    • Diagnosis: Aggregations like sum by (user_id) or count by (request_id) can be slow if the label you’re aggregating by has a very high number of unique values (high cardinality). Loki has to collect and process distinct values.
    • Fix: If possible, aggregate by a lower-cardinality label or a combination of labels. If you need to aggregate by a high-cardinality label, try to pre-filter the data to a smaller time range or a more specific subset of logs. For example, sum by (user_id) (count_over_time({app="my-app"} |= "login_failed" | json [5m])), where user_id is assumed to be a field extracted by | json.
    • Why it works: Aggregations require Loki to maintain state for each unique label value. High cardinality means a vast number of states, increasing memory usage and processing time.
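The state kept per group can be illustrated with a plain counter. This is a sketch over made-up events, and the field names `user_id` and `reason` are hypothetical. The number of counter entries stands in for the number of result series the aggregation has to hold.

```python
from collections import Counter

# 1000 synthetic login failures, each from a different user.
events = [
    {"user_id": f"user-{i}", "reason": "bad_password" if i % 2 else "locked"}
    for i in range(1000)
]


def count_by(records, label):
    """Count records per distinct value of the given label."""
    return Counter(record[label] for record in records)


by_user = count_by(events, "user_id")  # one entry per user
by_reason = count_by(events, "reason")  # one entry per failure reason

print(len(by_user), len(by_reason))  # 1000 2
```

Grouping by the low-cardinality field answers "why are logins failing" with two entries, while grouping by user needs a thousand entries for the same input.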
  6. Large Number of Log Streams:

    • Diagnosis: If you have thousands or millions of distinct log streams (e.g., every pod, every container, every service instance generating its own stream), even simple queries can become slow because Loki has to initialize and manage many stream readers. Use logcli series '{}' --since 1h --analyze-labels to get a sense of the stream count and of which labels drive it.
    • Fix: Consolidate streams where possible. In Loki a stream is a unique combination of label values, so consolidating means dropping labels that vary per instance. For instance, if you have many identical pods logging the same type of information, consider labelling by service type rather than by pod. Review your logging agent configuration to ensure you’re not creating excessive, redundant streams.
    • Why it works: The overhead of managing thousands of individual stream handles and their associated metadata can significantly impact query performance, even if the total volume of log data isn’t excessive.
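Because a stream is one unique combination of label values, the possible stream count grows multiplicatively with each label. The sketch below computes that upper bound for a hypothetical label set. The label names and values are invented, and real deployments usually have fewer streams than the bound because not every combination occurs.

```python
from math import prod


def max_streams(label_values):
    """Upper bound on stream count: product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


labels = {
    "app": ["my-app", "billing", "auth"],
    "env": ["prod", "dev"],
    "pod": [f"pod-{i}" for i in range(200)],
}

# Same label set with the per-instance label removed.
without_pod = {name: values for name, values in labels.items() if name != "pod"}

print(max_streams(labels), max_streams(without_pod))  # 1200 6
```

Dropping the single per-instance label cuts the bound from 1200 to 6, which is why per-pod or per-request labels are the first place to look.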
  7. Unoptimized Loki Configuration (Less Common for User Queries):

    • Diagnosis: While less common for direct query optimization, if your Loki instance itself is struggling, it can manifest as slow queries. Check Loki’s internal metrics for high CPU, memory, or disk I/O, especially during query execution. Look for query-frontend and query-scheduler metrics.
    • Fix: Ensure your Loki components (ingesters, queriers, index gateways) and the object storage behind them are adequately resourced. For distributed Loki, ensure the query-frontend is enabled and configured correctly so that large queries are split and spread across queriers. Optimize object storage performance.
    • Why it works: A strained Loki infrastructure will naturally lead to slower query responses. Optimizing the underlying system ensures it can handle the load efficiently.

The next common hurdle you’ll encounter is understanding how Loki’s query frontend and query scheduler work together to parallelize queries across multiple queriers.
