Loki’s parse and transform stages turn raw, unstructured log lines into queryable data. They run in the agent (Promtail, in this example) before the lines are shipped to Loki.

Let’s see this in action with a common scenario: logs from a web server that include the request method, path, status code, and response size.

{
  "log_line": "192.168.1.10 - - [10/Oct/2023:13:55:36 -0700] \"GET /api/v1/users?id=123 HTTP/1.1\" 200 1024 \"-\" \"curl/7.68.0\""
}

We want to extract method, path, status, and size and make them searchable.

Here’s a Promtail configuration snippet to demonstrate:

scrape_configs:
  - job_name: myapp
    static_configs:
      - targets:
          - localhost
        labels:
          job: myapp
          __path__: /tmp/myapp.log
    pipeline_stages:
      - regex:
          # Single-quoted YAML keeps backslashes literal, so regex escapes use one backslash.
          expression: '^(?P<ip>\S+) \S+ \S+ \[(.+?)\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" (?P<status>\d+) (?P<size>\d+) ".*$'
      - labels:
          method:
          status:
          size: # illustrative only; response size is high-cardinality and is usually left out of labels
      - match:
          selector: '{job="myapp",status="404"}'
          stages:
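            - metrics:
                requests_total:
                  type: Counter
                  description: "log lines with status 404"
                  config:
                    # Count every line that reaches this stage, adding 1 each time.
                    match_all: true
                    action: inc
                  # The metric inherits the stream's labels (job, method, status, ...),
                  # so no separate labels block is needed.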
      - output:
          # output takes only a source: a key in the extracted map whose value replaces the line.
          source: log_line

The pipeline_stages block is key. It’s a sequence of operations applied to each log line.

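First, the regex stage. This is your primary tool for breaking down unstructured text. It runs a regular expression (Go RE2 syntax) against the log line and stores each named capture group in the pipeline’s extracted map, a temporary set of key/value pairs that later stages can read. Groups are named with the (?P<name>...) syntax; unnamed groups, like the timestamp in our expression, are matched but not kept. In our example, we’re extracting ip, method, path, protocol, status, and size. At this point nothing is a label yet, and the log line itself is unchanged.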
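Before deploying an expression like this, it helps to try it against a sample line. The sketch below does that in Python rather than Go, on the assumption that Python's re module and Go's RE2 agree on this particular pattern (both accept the (?P<name>...) syntax). The pattern is written with single backslashes, and the helper name extract is made up for illustration.

```python
import re

# The regex stage's expression, with single backslashes.
# Assumption: Python's `re` and Go's RE2 behave the same for this pattern.
PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(.+?)\] "(?P<method>\S+) (?P<path>\S+) '
    r'(?P<protocol>\S+)" (?P<status>\d+) (?P<size>\d+) ".*$'
)

# The raw access-log line from the example above.
SAMPLE = (
    '192.168.1.10 - - [10/Oct/2023:13:55:36 -0700] '
    '"GET /api/v1/users?id=123 HTTP/1.1" 200 1024 "-" "curl/7.68.0"'
)


def extract(line):
    """Return the named captures, or an empty dict when the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else {}


extracted = extract(SAMPLE)
for key, value in extracted.items():
    print(f"{key} = {value}")
```

The unnamed timestamp group does not appear in the result, and a line that fails to match yields nothing at all, which mirrors how a non-matching line passes through the stage with nothing extracted.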

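Next, the labels stage. This takes keys from the extracted map (filled by regex or another parsing stage) and promotes them to Loki labels. An empty value, as in method:, means the label takes the value of the extracted key with the same name. Labels are indexed and define the log stream, so they are what you use for efficient filtering in queries such as {job="myapp", status="404"}. After this stage, method, status, and size can all be used in stream selectors. Be selective, though: every distinct combination of label values creates a separate stream, so high-cardinality values such as size or a client IP are usually better left in the line and filtered at query time. If you still wanted to select by IP address, you’d add ip: to this stage.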
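To see why the choice of labels matters, here is a small Python model of label promotion. It is not Promtail's code, only a sketch of the idea that each distinct label set is its own stream; the four entries and the helper names promote and stream_count are invented for illustration.

```python
# Hypothetical extracted maps for four log lines (values invented for illustration).
ENTRIES = [
    {"method": "GET", "status": "200", "size": "1024"},
    {"method": "GET", "status": "200", "size": "2048"},
    {"method": "GET", "status": "404", "size": "512"},
    {"method": "POST", "status": "200", "size": "1024"},
]


def promote(extracted, label_names, base_labels):
    """Copy the chosen extracted keys into a label set, as the labels stage does."""
    labels = dict(base_labels)
    for name in label_names:
        if name in extracted:
            labels[name] = extracted[name]
    return labels


def stream_count(label_names):
    """Count distinct label sets; each one would be a separate stream."""
    streams = {
        tuple(sorted(promote(entry, label_names, {"job": "myapp"}).items()))
        for entry in ENTRIES
    }
    return len(streams)


print(stream_count(["method", "status"]))          # 3 streams
print(stream_count(["method", "status", "size"]))  # 4 streams, one per line
```

With size as a label, every line in this tiny sample lands in its own stream, and on real traffic the number of streams would grow with every distinct response size.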

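The match stage allows for conditional processing. Its selector is a LogQL stream selector evaluated against the entry’s current labels: here, the nested stages run only if the entry matches {job="myapp",status="404"}. This works because the labels stage has already set status; placed before it, the match would never fire. Conditional stages like this are useful for routing specific logs through different processing paths or for dropping them altogether.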
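The selector check can be pictured with a few lines of Python. This is a deliberately simplified model that handles only equality matchers (name="value"); real LogQL selectors also support !=, =~ and !~, which are ignored here, and the function names are hypothetical.

```python
import re


def parse_selector(selector):
    """Parse a selector made only of equality matchers, e.g. {a="x",b="y"}."""
    return dict(re.findall(r'(\w+)="([^"]*)"', selector))


def matches(selector, labels):
    """True when every matcher in the selector equals the entry's label value."""
    wanted = parse_selector(selector)
    return all(labels.get(name) == value for name, value in wanted.items())


SELECTOR = '{job="myapp",status="404"}'

# Labels as they look after the labels stage has run.
print(matches(SELECTOR, {"job": "myapp", "method": "GET", "status": "404"}))  # True
print(matches(SELECTOR, {"job": "myapp", "method": "GET", "status": "200"}))  # False
# Labels as they would look if match ran before the labels stage.
print(matches(SELECTOR, {"job": "myapp"}))  # False
```

The last call shows why stage order matters: without a status label on the entry, the selector has nothing to compare against.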

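Inside the match stage, we have a metrics stage. This is how you generate metrics from your logs. We’re creating a counter called requests_total that increments by 1 for every log line with a 404 status. The metric does not go to Loki: Promtail exposes it on its own /metrics endpoint for Prometheus to scrape, and by default prefixes the name with promtail_custom_. The counter carries the stream’s labels, including method and status, so a query like sum by (method) (promtail_custom_requests_total{status="404"}) shows how many 404s each HTTP method produced. Because the counter only sees lines that passed the match, it cannot tell you anything about other status codes.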
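As a rough illustration of what the counter ends up holding, the Python sketch below keeps one series per distinct label set and then aggregates by method, loosely imitating a sum by (method) query. The entries and function names are invented, and the status check stands in for the match selector.

```python
from collections import Counter

# Hypothetical label sets for four entries, as they would look after the labels stage.
ENTRIES = [
    {"job": "myapp", "method": "GET", "status": "404"},
    {"job": "myapp", "method": "GET", "status": "200"},
    {"job": "myapp", "method": "POST", "status": "404"},
    {"job": "myapp", "method": "GET", "status": "404"},
]


def count_404s(entries):
    """Increment one counter series per distinct label set, for 404 entries only."""
    series = Counter()
    for labels in entries:
        if labels.get("status") == "404":  # stands in for the match selector
            series[tuple(sorted(labels.items()))] += 1
    return series


def sum_by(series, label):
    """Rough equivalent of PromQL's `sum by (label) (...)` over the series."""
    totals = Counter()
    for key, value in series.items():
        totals[dict(key)[label]] += value
    return dict(totals)


series = count_404s(ENTRIES)
by_method = sum_by(series, "method")
print(by_method)
```

The 200 entry never reaches the counter, so aggregating by status can only ever return a single 404 bucket.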

Finally, the output stage. This determines what gets stored in Loki as the log line. By default, Promtail ships the original line unchanged. output takes a single option, source, which names a key in the extracted map; the value of that key replaces the line. Note that the regex stage does not create a log_line key on its own: it adds only its named capture groups (ip, method, and so on) to the extracted map. So for source: log_line to have any effect, an earlier stage must have put a log_line value there; otherwise the stage leaves the line as it is. To store only the request path, for example, you would set source: path.
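A minimal Python model of the output stage, assuming the semantics in Promtail's documentation: the line is replaced only when source names a key that exists in the extracted map. The function name output_stage and the trimmed extracted map are illustrative.

```python
def output_stage(line, extracted, source):
    """Return the line that would be shipped under the assumed output semantics."""
    if source in extracted:
        return str(extracted[source])
    return line  # source key missing: leave the line untouched


LINE = (
    '192.168.1.10 - - [10/Oct/2023:13:55:36 -0700] '
    '"GET /api/v1/users?id=123 HTTP/1.1" 200 1024 "-" "curl/7.68.0"'
)
# What the regex stage would have extracted (subset shown).
EXTRACTED = {"method": "GET", "path": "/api/v1/users?id=123", "status": "200"}

as_path = output_stage(LINE, EXTRACTED, "path")
unchanged = output_stage(LINE, EXTRACTED, "log_line")

print(as_path)            # /api/v1/users?id=123
print(unchanged == LINE)  # True
```

With source set to a key that no stage has extracted, the original line is what reaches Loki.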

The most surprising thing about Loki’s pipeline is that the regex stage, while powerful, is often the weak point: it can become a bottleneck on busy streams, and it is hard to get right for complex, inconsistent log formats, where a line that doesn’t match simply passes through with nothing extracted. Many teams end up writing custom parsers or using tools like Fluentd or Vector to pre-process logs before they even reach Loki’s pipeline.

The next concept you’ll likely dive into is how to handle more complex data structures within logs, often requiring the json or template stages to extract nested fields or construct new log messages.
