Loki’s line filters are the unsung heroes of fast log searching, letting you grep through massive log volumes with surprising speed.
Let’s watch this in action. Imagine you have a Loki instance and you’ve scraped logs from a few different services. We’ll simulate a scenario where we want to find all log lines containing "error" from our nginx service, but only those that also have a specific HTTP status code, say 500.
Here’s a typical nginx log line:
192.168.1.100 - - [10/Oct/2023:10:30:00 +0000] "GET /api/v1/users HTTP/1.1" 500 1234 "-" "Mozilla/5.0"
We can construct a LogQL query to filter this:
{job="nginx"} |= "error" | pattern `<_> - - <_> "<_>" <status> <_>` | status == 500
Let’s break this down.
{job="nginx"} is our stream selector. This tells Loki to look for log streams where the job label is exactly nginx. This is the first stage of filtering, narrowing down the potential sources of our logs.
|= "error" is our line filter. It keeps lines containing the substring "error", and the match is case-sensitive. Loki applies this after it has identified the relevant streams.
| pattern `<_> - - <_> "<_>" <status> <_>` is a parser stage, and it is what gives us access to structured data inside the log line. Each <_> skips a field we don’t care about, while <status> captures the field after the quoted request into a label named status. Line filters only match text, so a numeric comparison needs the value extracted first.
| status == 500 is a label filter. It converts the extracted status label to a number and keeps only lines where that number equals 500. If a plain text match is enough, a second line filter such as |= `" 500 ` is cheaper, but it compares characters rather than numbers.
If you were to run this in Grafana’s Explore view, you’d see only the lines from nginx that contain "error" and whose status field is 500. Note that the sample line above has status 500 but does not contain the text "error", so the line filter would drop it.
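To make the intent of the query concrete, here is a minimal Python sketch of the same three steps: pick streams by label, keep lines containing "error", then extract the status code and compare it as a number. It is a toy model and not Loki's implementation; the sample lines, the STATUS_RE regular expression, and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical sample data: (labels, line) pairs standing in for Loki streams.
LOGS = [
    ({"job": "nginx"}, '192.168.1.100 - - [10/Oct/2023:10:30:00 +0000] "GET /api/v1/users HTTP/1.1" 500 1234 "-" "Mozilla/5.0"'),
    ({"job": "nginx"}, '192.168.1.101 - - [10/Oct/2023:10:30:01 +0000] "GET /error HTTP/1.1" 500 512 "-" "curl/8.0"'),
    ({"job": "nginx"}, '192.168.1.102 - - [10/Oct/2023:10:30:02 +0000] "GET /error HTTP/1.1" 404 153 "-" "curl/8.0"'),
    ({"job": "api"}, 'level=error msg="upstream returned 500"'),
]

# Assumption: the status code is the three-digit field right after the quoted request.
STATUS_RE = re.compile(r'" (\d{3}) ')


def status_of(line):
    """Extract the HTTP status as an int, or None if the line has no such field."""
    m = STATUS_RE.search(line)
    return int(m.group(1)) if m else None


def query(logs):
    # Step 1: stream selection by label, without looking at line content.
    selected = (line for labels, line in logs if labels.get("job") == "nginx")
    # Step 2: substring line filter (case-sensitive).
    with_error = (line for line in selected if "error" in line)
    # Step 3: extract a field, then compare it numerically.
    return [line for line in with_error if status_of(line) == 500]


matches = query(LOGS)
for line in matches:
    print(line)
```

Only the second sample line survives: the first has status 500 but no "error" text, the third contains "error" but has status 404, and the fourth belongs to a different job.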
The problem Loki solves is the sheer cost and inefficiency of traditional log aggregation systems. Before Loki, you’d often send all logs to a central store, index everything, and then run searches. This meant huge storage costs, massive indexing overhead, and slow search times, especially as data grew. Loki’s approach, inspired by Prometheus, is to index only metadata (labels) and to perform filtering on the raw log content at query time. This makes it incredibly cost-effective and fast for common search patterns.
Internally, Loki stores logs in chunks, organized by stream (a stream is one unique set of labels). When you run a query, Loki first identifies the relevant streams using your label selectors. Then, it fetches the chunks for those streams that fall within the query’s time range. For line filters like |= "error", Loki reads the raw log lines from the chunks and checks for the substring. Later pipeline stages, such as parsers and label filters, run only on the lines that survived the line filters, which is why it pays to put line filters first. This "query-time processing" keeps the index small, and the cost of a query scales with the amount of data your selector and time range pull in rather than with the total volume stored.
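The flow described above can be sketched with a toy store in Python. This is an illustration of the idea only: ToyStore, its methods, and the lines_scanned counter are hypothetical names, and real Loki chunks are compressed, time-bounded, and kept in object storage.

```python
from collections import defaultdict


class ToyStore:
    """Toy model: streams are keyed by labels only; chunks hold raw lines."""

    def __init__(self):
        self.chunks = defaultdict(list)  # stream key -> raw lines
        self.lines_scanned = 0           # how many lines a query had to read

    def push(self, labels, line):
        # A stream is identified by its full, sorted label set.
        key = tuple(sorted(labels.items()))
        self.chunks[key].append(line)

    def query(self, selector, needle):
        hits = []
        for key, lines in self.chunks.items():
            labels = dict(key)
            # Streams are rejected using labels alone; their content is never read.
            if any(labels.get(k) != v for k, v in selector.items()):
                continue
            for line in lines:
                self.lines_scanned += 1
                if needle in line:
                    hits.append(line)
        return hits


store = ToyStore()
store.push({"job": "nginx"}, "GET /api 500 error upstream timeout")
store.push({"job": "nginx"}, "GET /healthz 200 ok")
for i in range(3):
    store.push({"job": "api"}, f"level=error msg=failure id={i}")

hits = store.query({"job": "nginx"}, "error")
print(hits, store.lines_scanned)
```

Five lines are stored, but the query reads only the two that belong to the nginx stream. The three api lines also contain "error", yet they are never scanned.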
The key levers you control are the label selectors and the stages of the log pipeline. Label selectors ({job="nginx", instance="server-1"}) are your primary tool for narrowing down the source of logs. Line filters (|=, !=, |~, !~) inspect the raw text of each line once you’ve found the right streams: |= and != test for a substring, while |~ and !~ test against a regular expression. Parsers (| json, | logfmt, | unpack, | regexp, | pattern) are a separate kind of stage. They extract fields from the line into labels, and label filters such as | status == 500 then allow more granular filtering on those fields.
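As a rough illustration of how the four line-filter operators behave when chained, here is a small Python sketch. The line_filter and apply_filters helpers are hypothetical, and Python's re module stands in for the RE2 syntax Loki uses, so regular expressions that rely on features specific to one engine may behave differently.

```python
import re


def line_filter(op, arg, line):
    """Return True if `line` passes a single line-filter stage."""
    if op == "|=":   # line contains the substring
        return arg in line
    if op == "!=":   # line does not contain the substring
        return arg not in line
    if op == "|~":   # line matches the regular expression
        return re.search(arg, line) is not None
    if op == "!~":   # line does not match the regular expression
        return re.search(arg, line) is None
    raise ValueError(f"unknown line filter operator: {op}")


def apply_filters(lines, stages):
    """Keep only the lines that pass every stage, in order."""
    return [ln for ln in lines if all(line_filter(op, arg, ln) for op, arg in stages)]


lines = [
    "GET /api/v1/users 500 error upstream timeout",
    "GET /healthz 200 ok",
    "GET /api/v1/users 500 error client debug",
]
stages = [("|=", "error"), ("!=", "debug"), ("|~", r" 5\d\d ")]
kept = apply_filters(lines, stages)
print(kept)
```

Chained filters act as a logical AND: a line must pass every stage to be kept, so here only the first line remains.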
What most people don’t realize is that a line filter only ever sees text: |= "500" matches any line containing those three characters, whether they are a status code, a byte count, or part of a timestamp. To compare a number, you have to extract it first. For example, if you have a log line like Processing request took 150ms, the stages | regexp `took (?P<took>\d+ms)` | took > 100ms pull the duration into a label named took and keep only the lines where it exceeds 100ms. Label filters understand plain numbers, durations such as 100ms, and byte sizes such as 5MB, which makes this approach flexible for time-based or magnitude-based filtering on values embedded in raw text.
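Filtering on a number embedded in free text comes down to two steps: extract the value, then compare it. The Python sketch below shows that idea on the "took 150ms" example; TOOK_RE and slower_than are hypothetical names, and the code only mimics the behaviour rather than reproducing how Loki evaluates a query.

```python
import re

# Assumption: durations appear as "took <integer>ms" somewhere in the line.
TOOK_RE = re.compile(r"took (\d+)ms")


def slower_than(lines, threshold_ms):
    """Keep lines whose extracted duration exceeds threshold_ms."""
    kept = []
    for line in lines:
        m = TOOK_RE.search(line)
        # Lines without a duration are dropped; other numbers in the line are ignored.
        if m and int(m.group(1)) > threshold_ms:
            kept.append(line)
    return kept


sample = [
    "Processing request took 150ms",
    "Processing request took 80ms",
    "Processing 200 items",
]
slow = slower_than(sample, 100)
print(slow)
```

The third line contains the number 200, but it is not a duration, so it is not kept. Anchoring the extraction to a known piece of surrounding text is what prevents unrelated numbers from matching.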
The next logical step after mastering line filters is to explore Loki’s ability to parse structured log formats like JSON and Logfmt directly within queries.