Promtail’s regex pipeline stage lets you extract arbitrary labels from your log lines, going beyond the basic log stream labels Loki uses for indexing.
Here’s a promtail configuration snippet demonstrating this:
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/syslog
    pipeline_stages:
      - regex:
          expression: '^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+(?P<level>\w+)\s+\[(?P<thread>.*?)\]\s+(?P<logger>.*?)\s+-\s+(?P<message>.*)$'
      - labels:
          level:
          thread:
          logger:
In this example, promtail is configured to read /var/log/syslog. The regex stage uses named capture groups, written (?P<name>...), to pull parts of each log line into promtail’s extracted data, keyed by group name. The labels stage then takes the keys it lists from that data and turns them into actual Loki labels.
Let’s break down the regex expression:
- ^: Matches the start of the line.
- (?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z): Captures a timestamp in ISO 8601 format and names it time.
- \s+: Matches one or more whitespace characters.
- (?P<level>\w+): Captures one or more word characters (letters, numbers, underscore) and names it level. This is intended for log levels like INFO, WARN, ERROR.
- \s+\[(?P<thread>.*?)\]: Captures the characters within square brackets and names it thread. The .*? is a non-greedy match.
- \s+(?P<logger>.*?): Captures the characters following the thread and names it logger.
- \s+-\s+: Matches one or more whitespace characters, a hyphen, and one or more whitespace characters, acting as a delimiter.
- (?P<message>.*): Captures the rest of the line as message.
- $: Matches the end of the line.
After the regex stage, the labels stage explicitly lists which captured groups should become Loki labels. So, for a log line like:
2023-10-27T10:00:01.123Z INFO [main] com.example.App - Application started successfully.
Loki will ingest it with the following labels:
- job: varlogs (from static_configs)
- level: INFO (extracted by the regex and labels stages)
- thread: main (extracted by the regex and labels stages)
- logger: com.example.App (extracted by the regex and labels stages)
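Because the regex stage is built on Go’s regexp package, the expression can be checked outside promtail before it is deployed. The following is a minimal sketch, assuming nothing beyond the Go standard library; the extract helper is a hypothetical name for illustration and is not part of promtail.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRE is the same expression used in the regex stage of the config.
var lineRE = regexp.MustCompile(`^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+(?P<level>\w+)\s+\[(?P<thread>.*?)\]\s+(?P<logger>.*?)\s+-\s+(?P<message>.*)$`)

// extract returns the named capture groups for a line, or nil when the
// line does not match. Hypothetical helper, not a promtail API.
func extract(line string) map[string]string {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range lineRE.SubexpNames() {
		if name != "" { // index 0 is the whole match and has no name
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	line := "2023-10-27T10:00:01.123Z INFO [main] com.example.App - Application started successfully."
	fields := extract(line)
	for _, key := range []string{"time", "level", "thread", "logger", "message"} {
		fmt.Printf("%s=%q\n", key, fields[key])
	}
}
```

Running this against the sample line should show level, thread, and logger holding the same values as the labels listed above, with time and message captured but not promoted to labels.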
You can then query Loki using these extracted labels, for example:
{job="varlogs", level="ERROR"}
This allows you to filter and aggregate logs based on structured information that wasn’t originally part of the log stream’s metadata.
The regex stage in promtail only extracts; it does not drop lines. If a line doesn’t match the regex, it’s still sent to Loki, but the named capture groups aren’t populated, so the labels stage adds no new labels for that line. The regex stage itself has no setting for discarding non-matching lines. If you want to ingest only lines that match your pattern, add a separate stage that drops the rest, for example a match stage with action: drop whose selector uses a negated regex line filter (!~).
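The default handling of a non-matching line can be modelled in a few lines of Go. This is only a sketch of the behaviour described here, not promtail’s implementation: labelsFor and promoted are hypothetical names, and the job label is hard-coded to mirror the static_configs entry.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRE is the expression from the regex stage.
var lineRE = regexp.MustCompile(`^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+(?P<level>\w+)\s+\[(?P<thread>.*?)\]\s+(?P<logger>.*?)\s+-\s+(?P<message>.*)$`)

// promoted mirrors the keys listed under the labels stage.
var promoted = []string{"level", "thread", "logger"}

// labelsFor models the default behaviour: every line keeps its static
// labels, and extracted labels are added only when the regex matches.
func labelsFor(line string) map[string]string {
	labels := map[string]string{"job": "varlogs"}
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return labels // no match: the line is kept, without extra labels
	}
	for _, name := range promoted {
		labels[name] = m[lineRE.SubexpIndex(name)]
	}
	return labels
}

func main() {
	fmt.Println(labelsFor("2023-10-27T10:00:02.456Z ERROR [worker-1] com.example.Db - Connection lost."))
	// map[job:varlogs level:ERROR logger:com.example.Db thread:worker-1]
	fmt.Println(labelsFor("    at com.example.Db.connect(Db.java:42)"))
	// map[job:varlogs]
}
```

The second call stands in for something like a stack-trace continuation line: it still ends up in the job="varlogs" stream, just without level, thread, or logger.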
The regex stage uses Go’s regexp package, which implements RE2 syntax. Named capture groups ((?P<name>...)) are supported and are what make extracted values easy to reference, because each group name becomes the key for its value. RE2 deliberately omits some constructs found in other engines, such as lookaheads and backreferences, so patterns copied from PCRE-based tools may need rewriting. Complex regexes can also impact promtail’s performance, so it’s worth profiling if you’re dealing with very high log volumes or intricate patterns.
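A quick way to find out whether a pattern is valid RE2 is to compile it with Go’s regexp package. RE2 deliberately leaves out constructs such as lookaheads and backreferences, and compilation fails for them. The sketch below uses only the standard library; compiles is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the expression.
func compiles(expr string) bool {
	_, err := regexp.Compile(expr)
	return err == nil
}

func main() {
	fmt.Println(compiles(`(?P<level>\w+)`)) // named capture group: accepted
	fmt.Println(compiles(`^(?!DEBUG)\w+`))  // negative lookahead: rejected
	fmt.Println(compiles(`(\w+) \1`))       // backreference: rejected
}
```

To see why a pattern was rejected, print the error returned by regexp.Compile instead of discarding it.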
When constructing your regex, remember that the entire log line is the input. If your log lines have variable formatting at the beginning or end, your regex needs to account for that, perhaps using optional groups or more flexible matching.
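As an illustration of an optional group, suppose some lines omit the bracketed thread name. That is an assumption made for this example, not something the sample log format guarantees. The sketch below wraps the thread portion in an optional non-capturing group; flexRE and parse are hypothetical names, and the looser \S+ patterns for time and logger are a simplification rather than the expression from the config.

```go
package main

import (
	"fmt"
	"regexp"
)

// flexRE makes the "[thread]" part optional by wrapping it in (?:...)?.
var flexRE = regexp.MustCompile(`^(?P<time>\S+)\s+(?P<level>\w+)\s+(?:\[(?P<thread>[^\]]*)\]\s+)?(?P<logger>\S+)\s+-\s+(?P<message>.*)$`)

// parse returns the thread and logger fields and whether the line matched.
// A group that did not participate in the match yields an empty string.
func parse(line string) (thread, logger string, ok bool) {
	m := flexRE.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[flexRE.SubexpIndex("thread")], m[flexRE.SubexpIndex("logger")], true
}

func main() {
	for _, line := range []string{
		"2023-10-27T10:00:01.123Z INFO [main] com.example.App - Started",
		"2023-10-27T10:00:03.000Z WARN com.example.App - Disk almost full",
	} {
		thread, logger, ok := parse(line)
		fmt.Printf("ok=%v thread=%q logger=%q\n", ok, thread, logger)
	}
}
```

Both lines match here; the second simply yields an empty thread value.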
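If you are extracting fields whose values might contain characters that Loki’s query language uses (like =, {, }), be mindful of how those labels are written in queries. Label values in a query are quoted strings, so these characters generally need no special handling; double quotes and backslashes inside a double-quoted value are the ones that must be escaped.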
The next step after extracting labels is often to use them for richer alerting or dashboarding, allowing you to create Grafana alerts or panels that react to specific log levels or components without having to parse the message content itself in Grafana.