Fluent Bit’s tail input plugin can get bogged down if it’s not configured to handle log file rotation and truncation gracefully.
Common Causes and Fixes
- Fluent Bit losing track of file position after rotation:
  - Diagnosis: Check the Fluent Bit log for messages from the `tail` plugin. Errors like "file not found", or old logs being reprocessed, suggest it has lost its position.
  - Cause: When a log file is rotated (e.g., `app.log` becomes `app.log.1`), the `tail` plugin might not recognize the new file as a continuation of the old one. It tracks each file by inode number and read offset.
  - Fix: Set `Rotate_Wait` in your Fluent Bit configuration. It controls how long, in seconds, Fluent Bit keeps monitoring a file after that file has been rotated.

        [INPUT]
            Name         tail
            Path         /var/log/app.log
            Tag          app.log
            Rotate_Wait  360

  - Why it works: `Rotate_Wait 360` keeps the rotated file under watch for up to 360 seconds (the default is 5), so lines written just before or during rotation are still read before Fluent Bit drops the old file and continues with the new `app.log`.
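
  For the diagnosis step, it can help to raise the log level and expose the built-in monitoring endpoint, so that file-handling messages and per-input counters are visible. A minimal sketch, assuming the classic configuration format; the listen address and port are illustrative choices, not requirements:

  ```ini
  [SERVICE]
      # Verbose logging, to see more detail about how files are handled
      Log_Level    debug
      # Built-in HTTP server that exposes metrics (e.g. /api/v1/metrics)
      HTTP_Server  On
      # Assumed values: local-only access on port 2020
      HTTP_Listen  127.0.0.1
      HTTP_Port    2020
  ```

  Once rotation is confirmed to behave as expected, `Log_Level` can go back to `info` to keep Fluent Bit’s own log volume down.
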
- Fluent Bit reprocessing logs after truncation:
  - Diagnosis: You see duplicate log entries in your output, or the log volume processed by Fluent Bit suddenly jumps significantly without a corresponding increase in actual log generation.
  - Cause: If a log file is truncated (its size is reduced, often back to zero), the stored read offset ends up beyond the end of the file. Fluent Bit might interpret this as the start of a new file and reset its read offset, leading to reprocessing.
  - Fix: The documented `tail` options do not include a timeout for truncation, so the reset cannot be delayed from the Fluent Bit side. Where possible, rotate by renaming instead of truncating in place, and set `DB` so that file positions are persisted (the database path below is an example).

        [INPUT]
            Name  tail
            Path  /var/log/app.log
            Tag   app.log
            DB    /var/lib/fluent-bit/tail.db

  - Why it works: With `DB` set, Fluent Bit records each monitored file and its offset in a SQLite database and resumes from that position, including after a restart. A renamed file keeps its inode, so its offset stays valid. Only a real truncation, where the file size falls below the stored offset, sends Fluent Bit back to the start of the file.
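
  To confirm that duplicates really come from a file being re-read, one option is to attach the source path and read offset to every record. A sketch using the `Path_Key` and `Offset_Key` options of `tail`; the key names `source_file` and `source_offset` are arbitrary choices made for illustration:

  ```ini
  [INPUT]
      Name        tail
      Path        /var/log/app.log
      Tag         app.log
      # Add the monitored file's path to each record under this key
      Path_Key    source_file
      # Add the file offset of each record under this key
      Offset_Key  source_offset
  ```

  If the offsets seen in the output drop back toward zero for the same path, the file was re-read from the beginning.
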
- Fluent Bit indexing the wrong file after a rapid rotation/truncation cycle:
  - Diagnosis: Inconsistent log delivery, missing logs, or logs appearing out of order, especially under high log volume.
  - Cause: When log rotation and truncation happen very rapidly, Fluent Bit’s internal state might get confused about which file is the "current" one and where to resume reading from.
  - Fix: Lower `Refresh_Interval` so Fluent Bit refreshes its list of watched files more often, and set `Parser` so records are parsed consistently whichever file they come from.

        [INPUT]
            Name              tail
            Path              /var/log/app.log
            Tag               app.log
            Parser            docker
            Refresh_Interval  5

  - Why it works: `Refresh_Interval 5` makes Fluent Bit re-scan the `Path` pattern every 5 seconds instead of the default 60, helping it stay synchronized with rapid file changes. `Parser docker` does not affect file tracking; it only ensures that each line is parsed the same way.
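
  `Parser docker` only resolves if a parser with that name has been loaded. A sketch of the matching `[SERVICE]` section, assuming the stock `parsers.conf` that ships with Fluent Bit (which defines a `docker` parser) sits next to the main configuration file; adjust the path for your installation:

  ```ini
  [SERVICE]
      # Load parser definitions; the relative path is an assumption
      Parsers_File  parsers.conf
  ```
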
- Fluent Bit not following log file renames (e.g., `app.log` -> `app.log.old`):
  - Diagnosis: Fluent Bit stops processing logs for a specific file, and no new logs appear in the output for that source.
  - Cause: If the log rotation mechanism renames the current log file and a new file appears at the original path only later, Fluent Bit keeps following the renamed inode and might not notice the new file until its next scan of `Path`.
  - Fix: Make sure your log rotation creates a new file at the original path after the rename, and set `Rotate_Wait` and `Refresh_Interval` so Fluent Bit drains the renamed file and then rediscovers the path.

        [INPUT]
            Name              tail
            Path              /var/log/app.log
            Tag               app.log
            Rotate_Wait       360
            Refresh_Interval  10

  - Why it works: `Rotate_Wait 360` gives Fluent Bit time to finish reading the renamed file, and `Refresh_Interval 10` re-scans `Path` every 10 seconds, so the new `app.log` is picked up shortly after it is created.
- Fluent Bit consuming too much CPU or memory during rotation checks:
  - Diagnosis: High CPU/memory usage on the Fluent Bit agent, correlating with frequent log file rotations.
  - Cause: Frequent polling and stat calls on many files, especially in directories with thousands of log files, can become a performance bottleneck.
  - Fix: Raise `Refresh_Interval` if it has been set to a low value, and size the read buffer with `Buffer_Chunk_Size`.

        [INPUT]
            Name               tail
            Path               /var/log/app.log
            Tag                app.log
            Refresh_Interval   30
            Buffer_Chunk_Size  64k

  - Why it works: A `Refresh_Interval` of 30 seconds scans the file system far less often than an aggressive setting such as 5 (the default is 60). `Buffer_Chunk_Size 64k` starts each file with a larger read buffer than the 32k default, which can reduce the number of read operations, at the cost of more memory per monitored file.
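
  Memory use and the number of tracked files can also be bounded directly. A sketch with illustrative values; the limits shown are assumptions to adapt, not recommendations:

  ```ini
  [INPUT]
      Name             tail
      Path             /var/log/app.log
      Tag              app.log
      # Pause this input once roughly 16MB of records are buffered in memory
      Mem_Buf_Limit    16MB
      # Ignore files whose modification time is older than one day
      Ignore_Older     1d
      # Skip over-long lines instead of ceasing to monitor the file
      Skip_Long_Lines  On
  ```

  When `Mem_Buf_Limit` is reached the input is paused until buffered data has been flushed, so a very low limit can delay delivery.
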
- Fluent Bit failing to pick up new log files if the `Path` pattern is too broad:
  - Diagnosis: Fluent Bit processes logs from unintended files or fails to process logs from newly created files.
  - Cause: Using a very general `Path` like `/var/log/*` without proper rotation handling can lead to Fluent Bit trying to track too many files, including temporary ones or old rotated logs it shouldn’t be actively tailing.
  - Fix: Be specific with your `Path`, use wildcards in it only where necessary, and ensure `Rotate_Wait` and `Refresh_Interval` are set.

        [INPUT]
            Name              tail
            Path              /var/log/myapp/app.log
            Tag               myapp.log
            Rotate_Wait       360
            Refresh_Interval  10

  - Why it works: A specific path (`/var/log/myapp/app.log`) directs Fluent Bit to monitor only that exact file. When it is rotated, `Rotate_Wait` covers the remaining lines in what is now `app.log.1`, and the periodic refresh picks up the subsequent `app.log`.
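
  When a wildcard is unavoidable, the `Exclude_Path` option can keep rotated and compressed copies out of the match. A sketch, assuming rotated files in `/var/log/myapp/` end in `.1` or `.gz` (adjust the patterns to your rotation scheme):

  ```ini
  [INPUT]
      Name          tail
      # Match everything in the application's log directory...
      Path          /var/log/myapp/*
      # ...except numbered or compressed rotated copies
      Exclude_Path  *.1,*.gz
      Tag           myapp.log
      Rotate_Wait   360
  ```
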
After fixing these, you might encounter issues with output buffering if your downstream system is slow.