Promtail’s regex pipeline stage lets you extract arbitrary fields from your log lines and promote them to Loki labels, going beyond the static log stream labels Loki uses for indexing.

Here’s a Promtail configuration snippet demonstrating this:

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/syslog
    pipeline_stages:
      - regex:
          expression: '^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+(?P<level>\w+)\s+\[(?P<thread>.*?)\]\s+(?P<logger>.*?)\s+-\s+(?P<message>.*)$'
      - labels:
          level:
          thread:
          logger:

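In this example, Promtail is configured to read /var/log/syslog. The regex stage uses named capture groups ((?P<name>...)) to pull parts of each log line into Promtail’s extracted data; on its own it creates no labels. The labels stage then promotes the fields it lists (level, thread, logger) to actual Loki labels. The time and message groups are captured but not promoted.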

Let’s break down the regex expression:

  • ^: Matches the start of the line.
  • (?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z): Captures a timestamp in ISO 8601 format and names it time.
  • \s+: Matches one or more whitespace characters.
  • (?P<level>\w+): Captures one or more word characters (letters, numbers, underscore) and names it level. This is intended for log levels like INFO, WARN, ERROR.
  • \s+\[(?P<thread>.*?)\]: Captures characters within square brackets and names it thread. The .*? is a non-greedy match.
  • \s+(?P<logger>.*?): Captures the characters following the thread, non-greedily up to the first hyphen delimiter, and names it logger.
  • \s+-\s+: Matches a hyphen with one or more whitespace characters on each side, acting as a delimiter.
  • (?P<message>.*): Captures the rest of the line as message.
  • $: Matches the end of the line.

After the regex stage, the labels stage explicitly lists which captured groups should become Loki labels. So, for a log line like:

2023-10-27T10:00:01.123Z INFO [main] com.example.App - Application started successfully.

Loki will ingest it with the following labels:

  • job: varlogs (from static_configs)
  • filename: /var/log/syslog (added automatically by Promtail for file targets)
  • level: INFO (extracted by regex and labels stages)
  • thread: main (extracted by regex and labels stages)
  • logger: com.example.App (extracted by regex and labels stages)

You can then query Loki using these extracted labels, for example:

{job="varlogs", level="ERROR"}

This allows you to filter and aggregate logs based on structured information that wasn’t originally part of the log stream’s metadata.

The regex stage in Promtail only extracts; it never drops lines. If a line doesn’t match the regex, it’s still sent to Loki, but the named capture groups won’t be populated, and thus no new labels will be added for that line. The regex stage itself has no option for discarding non-matching lines. If you want to ingest only lines that match, add a separate stage for that, for example a match stage with action: drop whose selector uses a negated line filter such as {job="varlogs"} !~ "<your pattern>".

The regex stage uses Go’s regexp package, which implements RE2 syntax. This means you can use named capture groups ((?P<name>...)), which are crucial for making your extracted labels meaningful and easy to reference. RE2 does not support lookarounds or backreferences, so patterns written for PCRE-style engines may need rewriting. RE2 matches in time linear in the length of the input, but a long or intricate expression still costs CPU on every line, so it’s worth profiling if you’re dealing with very high log volumes.
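
One way to find out whether an expression is valid for this engine before putting it into a config is to compile it with Go’s regexp package directly. The sketch below is illustrative only; compiles is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles is a hypothetical helper that reports whether Go's regexp
// package accepts expr.
func compiles(expr string) bool {
	_, err := regexp.Compile(expr)
	return err == nil
}

func main() {
	exprs := []string{
		`(?P<level>\w+)`,       // named capture group
		`(?P<level>\w+)(?=\s)`, // lookahead, a PCRE feature
		`(\w+)\s+\1`,           // backreference, a PCRE feature
	}
	for _, e := range exprs {
		fmt.Printf("%-24s compiles: %v\n", e, compiles(e))
	}
}
```

The named group compiles, while the lookahead and the backreference are rejected at compile time.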

When constructing your regex, remember that by default the entire log line is the input (the stage’s source option can point it at a previously extracted value instead). If your log lines have variable formatting at the beginning or end, your regex needs to account for that, perhaps using optional groups or more flexible matching.

Extracted fields might contain characters that Loki’s query language uses as syntax (like =, {, }). Because label values in a selector are written as quoted strings, these characters need no special handling; only double quotes and backslashes inside a double-quoted value have to be escaped.
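
If selectors are built programmatically, the quoting can be delegated to a library. The sketch below uses Go’s strconv.Quote, on the assumption that LogQL’s double-quoted strings accept the same backslash escapes as Go string literals; selector is a hypothetical helper, and the thread value is invented to show the escaping.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// selector is a hypothetical helper that renders labels as a stream
// selector, quoting each value with strconv.Quote. Keys are sorted so
// the output is deterministic.
func selector(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+strconv.Quote(labels[k]))
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

func main() {
	fmt.Println(selector(map[string]string{"job": "varlogs", "level": "ERROR"}))
	// An invented value containing '=' and double quotes.
	fmt.Println(selector(map[string]string{"thread": `pool="a"`}))
}
```

The = inside the value passes through unchanged; only the embedded double quotes are escaped.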

The next step after extracting labels is often to use them for richer alerting or dashboarding, allowing you to create Grafana alerts or panels that react to specific log levels or components without having to parse the message content itself in Grafana.
