Fluentd’s tail input plugin is your go-to for ingesting logs from files, but when those files rotate, things can get messy. The pos_file is the key to picking up where you left off, but if it’s not managed correctly, you’ll either miss logs or re-process them.
Here’s how to ensure your tail input correctly handles log file rotation:
Common Causes of Log Rotation Issues
- `pos_file` Not Updated or Corrupted: If the `pos_file` that tracks the last read position within a log file becomes corrupted or is not updated by Fluentd, logs can be missed or re-read. This can happen if Fluentd crashes or is shut down abruptly.
  - Diagnosis: Check the `pos_file` itself. It is a plain text file with one line per tracked file: the path, the read offset, and the inode, separated by tabs, with the offset and inode written as hexadecimal numbers. If the file is empty even though Fluentd has been tailing for a while, or it contains malformed lines, it’s likely corrupted.
  - Fix: Stop Fluentd, then reset the `pos_file` by deleting it with `rm /var/log/td-agent/myapp.log.pos`. On restart Fluentd has no saved position: with `read_from_head true` it reads the target log file from the beginning (expect duplicates), and with the default `read_from_head false` it starts at the current end of the file.
  - Why it works: Deleting the `pos_file` discards the stale or damaged position, so Fluentd re-evaluates the current state of the log file and writes a fresh entry.
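The corruption check can be scripted. The following is a minimal sketch, not part of Fluentd: it assumes each `pos_file` entry is a path, a hexadecimal offset, and a hexadecimal inode separated by tabs, and the helper name `check_pos_file` is made up for illustration.

```python
import re

# Assumed entry layout: <path> TAB <offset in hex> TAB <inode in hex>
ENTRY = re.compile(r"^([^\t]+)\t([0-9a-fA-F]+)\t([0-9a-fA-F]+)$")


def check_pos_file(text):
    """Split pos_file text into parsed entries and lines that do not parse."""
    entries, bad_lines = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        match = ENTRY.match(line)
        if match is None:
            bad_lines.append(line)
            continue
        path, offset_hex, inode_hex = match.groups()
        entries.append((path, int(offset_hex, 16), int(inode_hex, 16)))
    return entries, bad_lines


# Hypothetical sample content: one well-formed entry and one damaged line.
sample = "/var/log/myapp.log\t0000000000001f40\t00000000000a1b2c\nnot a valid line\n"
entries, bad_lines = check_pos_file(sample)
print(entries)
print(bad_lines)
```

In practice you would pass in the contents of the real `pos_file`, read while Fluentd is stopped. Any entry in `bad_lines` suggests the file should be reset.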
- Incorrect `pos_file` Path: The `pos_file` path might be misconfigured in your Fluentd configuration, pointing to a location where Fluentd doesn’t have write permissions or to a non-existent directory.
  - Diagnosis: Verify the `pos_file` directive in your Fluentd configuration. Check that the directory it points to exists and is writable by the user running Fluentd.
  - Fix: Ensure the `pos_file` path is correct and the Fluentd process has write permissions to the directory.
    ```
    <source>
      @type tail
      path /var/log/myapp.log
      # Ensure this path is writable by the Fluentd user
      pos_file /var/log/td-agent/myapp.log.pos
      tag myapp.log
      # in_tail needs a parser; "none" passes each line through unparsed
      <parse>
        @type none
      </parse>
    </source>
    ```
  - Why it works: A valid and writable `pos_file` path allows Fluentd to store and retrieve its read position reliably.
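A quick way to run the directory check is a small script executed as the same user that runs Fluentd. This is a sketch under that assumption; `pos_file_dir_problem` is a hypothetical helper, and the demo uses a scratch directory rather than a real Fluentd path.

```python
import os
import tempfile


def pos_file_dir_problem(pos_file_path):
    """Describe why the pos_file directory is unusable, or return None if it looks fine."""
    directory = os.path.dirname(os.path.abspath(pos_file_path))
    if not os.path.isdir(directory):
        return f"directory does not exist: {directory}"
    # Creating a file in a directory needs write and search (execute) permission.
    if not os.access(directory, os.W_OK | os.X_OK):
        return f"directory is not writable by this user: {directory}"
    return None


# Demo against a scratch directory so the example does not depend on /var/log.
with tempfile.TemporaryDirectory() as scratch:
    ok = pos_file_dir_problem(os.path.join(scratch, "myapp.log.pos"))
    missing = pos_file_dir_problem(os.path.join(scratch, "nope", "myapp.log.pos"))

print(ok)
print(missing)
```

Note that `os.access` answers for the user running the script, so running it as root can hide a permission problem that the Fluentd user would hit.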
- Log Rotation Utility Not Using Standard `rename`: Some log rotation utilities don’t rotate by renaming the old file and creating a new one, which is the pattern `tail`-style readers detect most reliably. If the log is truncated or overwritten in place instead of being renamed, Fluentd might not realize a new file has started.
  - Diagnosis: Examine your log rotation configuration (e.g., `/etc/logrotate.d/myapp`). Look for directives like `copytruncate`. If `copytruncate` is used, it’s a likely culprit.
  - Fix: Let logrotate rotate by renaming, which is its default behavior when `copytruncate` is absent. Add `create` if logrotate should create the new file itself; `delaycompress` keeps the most recently rotated file uncompressed for one more cycle.
    ```
    # Example logrotate config snippet to AVOID copytruncate
    /var/log/myapp.log {
        rotate 5
        daily
        missingok
        notifempty
        compress
        delaycompress
        # DO NOT use copytruncate here if possible
    }
    ```
  - Why it works: A rename keeps the inode of the old file, so Fluentd can finish reading it under its new name, while the new file at the original path gets a new inode that Fluentd recognizes as a rotation. `copytruncate` truncates the original file after copying it, meaning Fluentd might miss logs written between the copy and the truncate.
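The search for `copytruncate` can be automated across many logrotate files. The sketch below is a plain text scan, not a logrotate parser; `find_copytruncate` and the sample configs are made up for illustration.

```python
def find_copytruncate(config_text):
    """Return 1-based line numbers where copytruncate appears as an active directive."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        directive = line.split("#", 1)[0].strip()  # drop trailing comments
        if directive == "copytruncate":
            hits.append(number)
    return hits


# Hypothetical config that only mentions copytruncate inside a comment.
good = """/var/log/myapp.log {
    rotate 5
    daily
    # DO NOT use copytruncate here if possible
}
"""
# The same config with the directive actually enabled on line 4.
bad = good.replace("    daily\n", "    daily\n    copytruncate\n")

print(find_copytruncate(good))
print(find_copytruncate(bad))
```

Matching the whole directive rather than a substring means `nocopytruncate` and commented-out mentions are not reported.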
- inode Changes During Rotation: When a log file rotates, the new file typically gets a new inode. A reader that relied only on a saved offset, and not on the inode, could apply the old offset to the new file or keep following the old, now-rotated file.
  - Diagnosis: Fluentd’s `tail` input plugin records both the inode and the offset for every file it tracks, so this is rarely the root cause on its own. To rule it out, compare the inode stored in the `pos_file` with the inode of the file currently at `path` (for example with `ls -i`, converting between decimal and the hexadecimal value that `in_tail` writes).
  - Fix: No configuration change is usually needed, because inode tracking is built in. If `path` contains wildcards, newer Fluentd releases also offer `follow_inodes true`, which is meant to avoid re-reading rotated files that still match the pattern.
    ```
    <source>
      @type tail
      path /var/log/myapp.log
      pos_file /var/log/td-agent/myapp.log.pos
      tag myapp.log
      # The inode is always recorded in the pos_file; no extra option is needed
      <parse>
        @type none
      </parse>
    </source>
    ```
  - Why it works: By tracking the inode, Fluentd can detect when the file it was reading has been replaced by a new file with a different inode, even if the filename is the same. It then knows to look for new data in the new file.
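To see why the inode is a reliable rotation signal, the following sketch rotates a scratch file both ways and compares the inode at the live path before and after. It assumes a POSIX filesystem; `rotation_inodes` is a hypothetical helper and no Fluentd code is involved.

```python
import os
import shutil
import tempfile


def rotation_inodes(use_copytruncate):
    """Rotate a scratch log and return (inode_before, inode_after) for the live path."""
    with tempfile.TemporaryDirectory() as scratch:
        live = os.path.join(scratch, "myapp.log")
        rotated = live + ".1"
        with open(live, "w") as f:
            f.write("old line\n")
        before = os.stat(live).st_ino
        if use_copytruncate:
            # copytruncate style: copy the data away, then empty the same file.
            shutil.copy(live, rotated)
            with open(live, "r+") as f:
                f.truncate(0)
        else:
            # rename style: move the old file aside, then create a fresh one.
            os.rename(live, rotated)
            with open(live, "w") as f:
                f.write("new line\n")
        after = os.stat(live).st_ino
        return before, after


rename_before, rename_after = rotation_inodes(use_copytruncate=False)
copy_before, copy_after = rotation_inodes(use_copytruncate=True)
print("rename changed the inode:", rename_before != rename_after)
print("copytruncate changed the inode:", copy_before != copy_after)
```

With rename-based rotation the live path points at a new inode, which is what a tailing reader can key on. With copy-and-truncate the inode stays the same and only the file size drops.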
- Frequent Log Rotation: If logs are rotated very frequently (e.g., every minute), Fluentd might not finish reading the end of the old file before it stops watching it, especially under heavy load or if the disk is slow.
  - Diagnosis: Compare the log rotation schedule with how quickly the offset in the `pos_file` advances. Check Fluentd’s logs for rotation messages and for any errors related to writing to the `pos_file`.
  - Fix: Increase `rotate_wait` in your Fluentd configuration for the `tail` input. This gives Fluentd more time to drain the rotated file before closing it.
    ```
    <source>
      @type tail
      path /var/log/myapp.log
      pos_file /var/log/td-agent/myapp.log.pos
      tag myapp.log
      # Increase from the default of 5s if needed
      rotate_wait 15s
      <parse>
        @type none
      </parse>
    </source>
    ```
  - Why it works: After it detects a rotation, `in_tail` keeps reading the old file for `rotate_wait` seconds before closing it. A larger value reduces the chance that lines written just before or during the rotation are dropped.
- Multiple Fluentd Instances or Processes: If multiple Fluentd processes are trying to tail the same log file using the same `pos_file`, it can lead to race conditions and corrupted `pos_file` entries.
  - Diagnosis: Confirm that only one Fluentd process is configured to tail a specific log file with a specific `pos_file`. Check your Fluentd deployment and configuration.
  - Fix: Dedicate each log file (and its `pos_file`) to a single Fluentd input configuration or process.
  - Why it works: This prevents concurrent writes and reads to the `pos_file` from different processes, eliminating the possibility of conflicting updates and ensuring data integrity.
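Shared `pos_file` paths are easy to overlook when configuration is spread over several files. The sketch below flags any path that appears in more than one `pos_file` directive. It is a simple text scan rather than a Fluentd config parser, and `shared_pos_files` plus the sample config are hypothetical.

```python
import re
from collections import Counter


def shared_pos_files(config_text):
    """Return pos_file paths that are used by more than one directive."""
    paths = re.findall(r"^\s*pos_file\s+(\S+)", config_text, flags=re.MULTILINE)
    return sorted(path for path, count in Counter(paths).items() if count > 1)


# Hypothetical config in which two sources reuse the same pos_file.
config = """
<source>
  @type tail
  path /var/log/myapp.log
  pos_file /var/log/td-agent/myapp.log.pos
  tag myapp.log
</source>
<source>
  @type tail
  path /var/log/other.log
  pos_file /var/log/td-agent/myapp.log.pos
  tag other.log
</source>
"""

duplicates = shared_pos_files(config)
print(duplicates)
```

To check across processes, concatenate the configuration text of every Fluentd instance on the host before passing it in.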
The Next Hurdle
Once you’ve nailed log rotation, you’ll likely encounter issues with timestamp parsing if your application logs don’t follow a consistent, easily recognizable format.