Fluentd’s multiline parser can stitch together log entries that span multiple lines, a common problem with many application logs.
Let’s watch it in action. Imagine we have a Java application that logs exceptions, where the stack trace can be dozens of lines long. A typical log file might look like this:
2023-10-27 10:00:00 INFO Application started.
2023-10-27 10:01:15 ERROR Uncaught exception in thread "main"
java.lang.NullPointerException: Attempt to invoke virtual method 'void java.lang.String.length()' on a null object reference
at com.example.MyClass.process(MyClass.java:42)
at com.example.MyApp.run(MyApp.java:18)
at java.base/java.lang.Thread.run(Thread.java:833)
2023-10-27 10:02:00 INFO Processing request ID 123.
Without a multiline parser, Fluentd would see each of these as separate events: one for "Application started," another for "Uncaught exception…," a third for "java.lang.NullPointerException…", and so on. The stack trace lines would be treated independently, making it impossible to analyze the full exception context.
The multiline parser in Fluentd solves this with a pattern that identifies the first line of a new log entry. Every following line is accumulated into the same entry until the pattern matches again.
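The accumulate-until-the-next-start idea can be sketched in a few lines of Python. This is only an illustration of the logic, not the Fluentd implementation; the FIRSTLINE pattern, the group_records helper, and the shortened sample lines are assumptions made for the example.

```python
import re

# Assumed start-of-record pattern: a line that begins with "YYYY-MM-DD HH:MM:SS ".
FIRSTLINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")


def group_records(lines):
    """Join continuation lines onto the most recent line that matched FIRSTLINE."""
    records = []
    current = []
    for line in lines:
        if FIRSTLINE.match(line):
            if current:  # a new start line closes the record in progress
                records.append("\n".join(current))
            current = [line]
        elif current:
            current.append(line)
        # A continuation line seen before any start line is skipped in this sketch.
    if current:  # end of input: emit whatever is still buffered
        records.append("\n".join(current))
    return records


sample = [
    "2023-10-27 10:00:00 INFO Application started.",
    '2023-10-27 10:01:15 ERROR Uncaught exception in thread "main"',
    "java.lang.NullPointerException: example message",
    "    at com.example.MyClass.process(MyClass.java:42)",
    "    at com.example.MyApp.run(MyApp.java:18)",
    "2023-10-27 10:02:00 INFO Processing request ID 123.",
]
records = group_records(sample)
print(len(records))
```

With this sample the six input lines collapse into three records, and the second record carries the exception text and both stack frames.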
Here’s a Fluentd configuration snippet that handles this specific Java exception logging scenario:
<source>
  @type tail
  path /var/log/my_app.log
  pos_file /var/log/my_app.log.pos
  tag myapp.log
  multiline_flush_interval 5s
  <parse>
    @type multiline
    format_firstline /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/
    format1 /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?<level>\w+) (?<message>.*)/
  </parse>
</source>
<match myapp.log>
  @type stdout
</match>
Let’s break down what’s happening. The tail input plugin is watching /var/log/my_app.log. The crucial part is the <parse> block with @type multiline.
- format_firstline /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/: This regular expression identifies the first line of a new log entry. A line that matches it closes the record currently being buffered and starts a new one. A line that does not match is treated as a continuation and appended to the buffered record.
- format1 /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?<level>\w+) (?<message>.*)/: This regular expression is applied to the whole accumulated record, not to each line. It captures the timestamp, log level, and the rest of the text into named fields. The parser matches it with . allowed to span newlines, so the message capture picks up the continuation lines as well.
- multiline_flush_interval 5s: This is a parameter of the tail input, so it belongs in <source> rather than in <parse>. The last record in the file stays buffered until another start line arrives. With this setting, Fluentd emits the buffered record after 5 seconds without new lines, even if the application crashes or goes quiet before writing another entry.
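The flush timeout described above exists because the last record has no following start line to close it. A minimal Python sketch of that behaviour follows, with explicit timestamps so it runs deterministically. MultilineBuffer, feed, and flush_if_idle are hypothetical names for this illustration and do not correspond to Fluentd internals.

```python
import re

# Assumed start-of-record pattern: a line that begins with "YYYY-MM-DD HH:MM:SS ".
FIRSTLINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")


class MultilineBuffer:
    """Hypothetical helper: holds the record in progress and flushes it when idle."""

    def __init__(self, flush_interval):
        self.flush_interval = flush_interval  # seconds of inactivity before a forced flush
        self.lines = []
        self.last_seen = None

    def feed(self, line, now):
        """Add a line; return the finished record if this line starts a new one."""
        done = None
        if FIRSTLINE.match(line):
            if self.lines:
                done = "\n".join(self.lines)
            self.lines = [line]
        elif self.lines:
            self.lines.append(line)
        self.last_seen = now
        return done

    def flush_if_idle(self, now):
        """Return the buffered record if no line arrived for flush_interval seconds."""
        if self.lines and now - self.last_seen >= self.flush_interval:
            done = "\n".join(self.lines)
            self.lines = []
            return done
        return None


buf = MultilineBuffer(flush_interval=5)
first = buf.feed("2023-10-27 10:01:15 ERROR Uncaught exception", now=0)
second = buf.feed("java.lang.NullPointerException", now=1)
early = buf.flush_if_idle(now=3)  # only 2 seconds idle: nothing is emitted
late = buf.flush_if_idle(now=6)   # 5 seconds idle: the buffered record is emitted
```

Neither call to feed returns a record, because no second start line ever arrives. Only the idle check at 6 seconds releases the exception.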
With this configuration, Fluentd will process the example log file like this:
- It reads "2023-10-27 10:00:00 INFO Application started." This line matches the start-of-record pattern, so Fluentd begins buffering a new record.
- It reads "2023-10-27 10:01:15 ERROR Uncaught exception in thread "main"". This line also matches. The buffered "Application started." record is parsed into time, level, and message and emitted, and a new record begins.
- It reads "java.lang.NullPointerException: Attempt to invoke virtual method 'void java.lang.String.length()' on a null object reference". This line does not match the start-of-record pattern, so Fluentd appends it to the record being buffered.
- It continues reading the stack trace lines ("at com.example.MyClass.process…", etc.). None of these match the start-of-record pattern, so they are all appended to the same record and end up in its message field.
- Finally, it reads "2023-10-27 10:02:00 INFO Processing request ID 123.". This line does match the start-of-record pattern, which signals the end of the multiline exception record. Fluentd emits the complete exception as a single event and starts buffering a new record for "Processing request ID 123."
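The output to stdout would look something like this (simplified for clarity: the real stdout output also prefixes each record with the event time and tag, and by default the captured time is used as the event timestamp and dropped from the record unless keep_time_key true is set):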
{"time": "2023-10-27 10:00:00", "level": "INFO", "message": "Application started."}
{"time": "2023-10-27 10:01:15", "level": "ERROR", "message": "Uncaught exception in thread \"main\"\njava.lang.NullPointerException: Attempt to invoke virtual method 'void java.lang.String.length()' on a null object reference\n at com.example.MyClass.process(MyClass.java:42)\n at com.example.MyApp.run(MyApp.java:18)\n at java.base/java.lang.Thread.run(Thread.java:833)"}
{"time": "2023-10-27 10:02:00", "level": "INFO", "message": "Processing request ID 123."}
Notice how the entire stack trace is now contained within the message field of a single log event.
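Once the lines are joined, one regex with named groups can split the record into fields, provided . is allowed to match newlines. The Python sketch below shows the difference. It uses Python named-group syntax, (?P<name>…), in place of the Ruby-style (?<name>…) that Fluentd configs use, and the record text is a shortened stand-in for the example above.

```python
import re

# Same shape as the record regex in the article, rewritten with Python named groups.
PATTERN = r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>\w+) (?P<message>.*)"

# An already-joined multiline record (shortened, illustrative content).
record = (
    '2023-10-27 10:01:15 ERROR Uncaught exception in thread "main"\n'
    "java.lang.NullPointerException: example message\n"
    "    at com.example.MyApp.run(MyApp.java:18)"
)

# With DOTALL, "." also matches "\n", so message spans the whole stack trace.
fields = re.match(PATTERN, record, re.DOTALL).groupdict()

# Without DOTALL, "." stops at the first newline and message holds one line only.
first_line_only = re.match(PATTERN, record).group("message")

print(fields["level"])
print(fields["message"])
```

The DOTALL flag is the Python counterpart of matching in Ruby multiline mode, which is what lets a plain .* capture the continuation lines.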
A common pitfall is a start-of-record regex that is too broad or too narrow. If it matches lines that should be continuations, you’ll get fragmented log entries. Conversely, if it is too strict and misses legitimate start lines, those entries are glued onto the previous record, and in the worst case everything ends up as one giant log entry. Anchoring the pattern on something only a real first line has, such as the leading timestamp, keeps it robust.
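Both failure modes are easy to reproduce offline before touching a live config. The Python sketch below runs three candidate start patterns over the same sample. The split_records helper, the patterns, and the sample lines are assumptions chosen for illustration.

```python
import re


def split_records(lines, firstline_pattern):
    """Group lines into records; a line matching the pattern starts a new record."""
    start = re.compile(firstline_pattern)
    records = []
    for line in lines:
        if start.match(line):
            records.append([line])
        elif records:
            records[-1].append(line)
    return records


sample = [
    "2023-10-27 10:00:00 INFO Application started.",
    '2023-10-27 10:01:15 ERROR Uncaught exception in thread "main"',
    "java.lang.NullPointerException: example message",
    "    at com.example.MyClass.process(MyClass.java:42)",
    "    at com.example.MyApp.run(MyApp.java:18)",
    "    at java.base/java.lang.Thread.run(Thread.java:833)",
    "2023-10-27 10:02:00 INFO Processing request ID 123.",
]

# Anchored on the timestamp: one record per real log entry.
good = split_records(sample, r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")

# Too broad: any unindented line counts as a start, so the exception
# message is split away from its ERROR line.
broad = split_records(sample, r"^\S")

# Too narrow: only INFO lines count as a start, so the ERROR entry and its
# stack trace are glued onto the preceding INFO record.
narrow = split_records(sample, r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} INFO ")

print(len(good), len(broad), len(narrow))
```

Comparing the record counts against the number of entries you expect is a quick sanity check for a new pattern.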
The multiline parser is flexible. The record regex can be split across format1 through format20, which Fluentd concatenates into a single pattern, so a long expression can be written one piece per line. The named captures become top-level fields in the resulting event, and the continuation lines land in whichever capture spans them (here, message).
Understanding how the start-of-record pattern and the field-extraction regex work together is the core of mastering this parser, allowing you to correctly reconstruct complex log structures.
The next challenge is often dealing with timestamp parsing within your multiline logs.