A Loki volume dashboard does not ship with a standard Grafana installation, which leads many people to assume that volume monitoring is not a core feature. In practice it is a powerful and often overlooked observability tool.
Let’s see it in action. Imagine you’ve got Loki set up and are shipping logs from a few applications. You’ve got Grafana running, and you want to visualize how much data Loki is actually ingesting over time. This isn’t just about knowing "how much" but understanding where it’s coming from and what kind of data it is.
Here’s a typical scenario: you’re debugging a performance issue and suspect a surge in log volume from a specific microservice might be the culprit. Without the volume dashboard, you’d be digging through individual logs, trying to infer the rate, which is like trying to understand a traffic jam by interviewing one driver.
The Loki volume dashboard, when properly configured, gives you this high-level, yet granular, view. It typically relies on Loki’s internal metrics, which Loki exposes in Prometheus format. Prometheus scrapes them, and Grafana visualizes them.
The core of this visualization relies on a few key Loki internal metrics. Exact metric names differ between Loki versions and deployment modes, so check this list against your own instance's `/metrics` output:

- `loki_ingester_batch_send_errors_total`: Errors during batch sending to storage.
- `loki_ingester_received_bytes_total`: Total bytes received by the ingester.
- `loki_ingester_received_chunks_total`: Total chunks received by the ingester.
- `loki_ingester_uncompressed_bytes_total`: Total uncompressed bytes received.
- `loki_log_errors_total`: Errors encountered while processing logs.
- `loki_object_storage_writes_bytes_total`: Bytes written to object storage.
To build this dashboard, you’d typically:
- Ensure Loki is exposing metrics: Loki serves its metrics in Prometheus format at `/metrics` on its HTTP listen port, so no separate metrics section is needed. In your Loki configuration (`loki-local.yaml` or similar), check the `server` block. The default HTTP port is 3100; port 9095 is the default gRPC port and does not serve metrics.

  ```yaml
  server:
    http_listen_port: 3100
  ```

- Scrape Loki metrics with Prometheus: Your Prometheus configuration needs a job to scrape these metrics.

  ```yaml
  scrape_configs:
    - job_name: "loki"
      static_configs:
        - targets: ["loki.example.com:3100"]  # Replace with your Loki HTTP endpoint
  ```

- Add a Prometheus data source to Grafana: In Grafana, add a new data source, select "Prometheus", and point it to your Prometheus server’s URL. The volume panels read Loki's metrics through Prometheus, not through a Loki data source.

- Import or build the dashboard: You can often find pre-built Loki dashboards on Grafana.com’s dashboard repository (search for "Loki" or "Loki monitoring"). If not, you can build one by adding panels and using PromQL queries against the Loki metrics.
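Before building panels, it can help to confirm which `loki_` metrics your instance actually exposes. The following is a minimal sketch, assuming you have already saved the text served at Loki's `/metrics` endpoint into a string. `list_loki_metrics` is a hypothetical helper, its parsing is deliberately naive, and the sample text is made up for illustration rather than copied from a real Loki instance.

```python
def list_loki_metrics(exposition_text: str, prefix: str = "loki_") -> list[str]:
    """Return the sorted, de-duplicated metric names that start with `prefix`.

    Naive parse of the Prometheus text exposition format: skip blank lines
    and '#' comment lines, then take the name up to the first '{' or space.
    """
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the label block or at the value separator.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


# Illustrative sample only, not real Loki output.
sample = """\
# HELP loki_ingester_received_bytes_total Example help text.
# TYPE loki_ingester_received_bytes_total counter
loki_ingester_received_bytes_total{tenant="team-a"} 1024
loki_ingester_received_bytes_total{tenant="team-b"} 2048
loki_ingester_received_chunks_total 7
go_goroutines 42
"""

print(list_loki_metrics(sample))
```

If a metric you planned to chart is missing from the result, look for a similarly named one before writing the PromQL for that panel.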
A common PromQL query to visualize the overall rate of received bytes would look like this:

```promql
rate(loki_ingester_received_bytes_total{job="loki"}[5m])
```

To break this down by tenant, wrap it in a `sum by (tenant)` aggregation:

```promql
sum by (tenant) (rate(loki_ingester_received_bytes_total{job="loki"}[5m]))
```
This query takes the counter `loki_ingester_received_bytes_total`, calculates its per-second rate of increase over the last 5 minutes (`[5m]`), and then sums those rates for each unique value of the `tenant` label. This gives you a clear picture of which tenants are sending the most data.
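To make the arithmetic concrete, here is a simplified sketch of what that query computes. It is not Prometheus's actual implementation: the real `rate()` also handles counter resets and extrapolates to the window boundaries. The function names, the `instance` values, and the sample numbers are all hypothetical.

```python
from collections import defaultdict


def per_second_rate(first: tuple, last: tuple) -> float:
    """Per-second increase between two (timestamp_seconds, counter_value) samples."""
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


def rate_by_tenant(series: list) -> dict:
    """Sum the per-second rate of every series that shares a tenant label."""
    totals = defaultdict(float)
    for s in series:
        totals[s["tenant"]] += per_second_rate(s["first"], s["last"])
    return dict(totals)


# Hypothetical counter samples at the start and end of a 5-minute (300 s) window.
series = [
    {"tenant": "team-a", "instance": "ingester-0", "first": (0, 1_000), "last": (300, 301_000)},
    {"tenant": "team-a", "instance": "ingester-1", "first": (0, 5_000), "last": (300, 155_000)},
    {"tenant": "team-b", "instance": "ingester-0", "first": (0, 0), "last": (300, 30_000)},
]

print(rate_by_tenant(series))  # {'team-a': 1500.0, 'team-b': 100.0}
```

The two `team-a` series come from different ingesters, which is why the aggregation step matters: without `sum by (tenant)`, the panel would draw one line per ingester instead of one per tenant.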
You can create similar panels for loki_ingester_received_chunks_total, loki_object_storage_writes_bytes_total, and even loki_log_errors_total to get a comprehensive view of Loki’s operational health and data flow.
The power of this dashboard isn’t just in seeing raw numbers; it’s in correlating these metrics with your application logs. If you see a spike in loki_ingester_received_bytes_total for a specific tenant, you can then pivot to querying logs for that tenant in Loki, using the time range identified by the dashboard spike, to pinpoint the exact application or log source causing the surge.
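The pivot from a dashboard spike to the matching logs can be sketched as a request against Loki's `query_range` HTTP endpoint. This example only builds the URL and headers and sends nothing. It assumes a multi-tenant Loki that identifies the tenant through the `X-Scope-OrgID` header. The host name, tenant ID, `app` label, and `build_log_query` helper are placeholders.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_log_query(base_url, tenant, logql, start, end, limit=100):
    """Return (url, headers) for a Loki query_range request; nothing is sent."""
    params = {
        "query": logql,
        # Timestamps are passed as Unix nanoseconds; whole seconds are
        # precise enough for a window read off a dashboard.
        "start": int(start.timestamp()) * 1_000_000_000,
        "end": int(end.timestamp()) * 1_000_000_000,
        "limit": limit,
    }
    url = f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"
    headers = {"X-Scope-OrgID": tenant}
    return url, headers


# Hypothetical spike window taken from the dashboard.
spike_start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
spike_end = datetime(2024, 1, 1, 12, 15, tzinfo=timezone.utc)

url, headers = build_log_query(
    "http://loki.example.com:3100",  # placeholder Loki address
    "team-a",                        # placeholder tenant ID
    '{app="checkout"}',              # placeholder LogQL stream selector
    spike_start,
    spike_end,
)
print(url)
print(headers)
```

In Grafana itself, the same pivot is usually done by opening Explore with the Loki data source and the time range copied from the panel.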
One subtle but critical aspect of Loki’s volume handling is its chunking mechanism. Loki doesn’t store individual log lines directly. Instead, it groups them into compressed chunks. Analyzing `loki_ingester_received_chunks_total` alongside `loki_ingester_received_bytes_total` reveals the average amount of data per chunk, which has direct implications for storage costs and ingestion performance. A sudden increase in the number of chunks without a proportional increase in bytes means chunks are getting smaller, which often points to many low-volume streams (for example, from high-cardinality labels) or very small log messages. A large increase in bytes with few new chunks suggests fewer, fuller chunks.
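The chunk-size relationship is a simple ratio of the two rates. Here is a minimal sketch, assuming you have already read a byte rate and a chunk rate from the corresponding panels; `average_chunk_size` is a hypothetical helper and the numbers are invented.

```python
from typing import Optional


def average_chunk_size(bytes_per_second: float, chunks_per_second: float) -> Optional[float]:
    """Average bytes per chunk over a window, or None if no chunks arrived."""
    if chunks_per_second <= 0:
        return None
    return bytes_per_second / chunks_per_second


# Invented readings: same byte rate, but the chunk rate jumps tenfold.
before = average_chunk_size(1_500_000, 3)
after = average_chunk_size(1_500_000, 30)

print(before)  # 500000.0
print(after)   # 50000.0
```

A drop like the one between `before` and `after` is the signal described above: the same volume of data is being split into many more, much smaller chunks.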
Once you’ve got the volume dashboard showing you what’s happening with your log data ingestion rates, the next logical step is to understand why. This often leads to exploring Loki’s retention policies and how they interact with your data volume.