It would be great if it were possible to avoid the cron job (which sucks up a lot of I/O time) by adding a daemon that uses Linux's fanotify and/or inotify APIs to watch for filesystem-level events and update the database accordingly. The cron job would still be needed for old versions of Linux and for kernels other than Linux.
In addition, mlocate could gain options to search based on the time periods during which the file being searched for existed.
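As a rough illustration of the watching side, here is a minimal Python sketch that uses inotify through `ctypes` (Linux only; `watch_directory` and `read_name_events` are names made up for this example). It also shows a limitation that matters for this ticket: an inotify watch covers a single directory, not a subtree, so indexing a whole system this way would need one watch per directory.

```python
import ctypes
import ctypes.util
import os
import struct

# Event bits from <sys/inotify.h>.
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.inotify_init1.argtypes = [ctypes.c_int]
_libc.inotify_init1.restype = ctypes.c_int
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]
_libc.inotify_add_watch.restype = ctypes.c_int

# struct inotify_event header: int wd; uint32_t mask, cookie, len; then the name.
_EVENT = struct.Struct("=iIII")


def watch_directory(path):
    """Watch ONE directory (inotify is not recursive) for name changes."""
    fd = _libc.inotify_init1(os.O_NONBLOCK | os.O_CLOEXEC)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    mask = IN_CREATE | IN_DELETE | IN_MOVED_FROM | IN_MOVED_TO
    if _libc.inotify_add_watch(fd, os.fsencode(path), mask) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, os.strerror(err))
    return fd


def read_name_events(fd):
    """Return queued events as ("+" | "-", name) pairs; [] if none are pending."""
    try:
        buf = os.read(fd, 64 * 1024)  # a single read(); a daemon would poll and loop
    except BlockingIOError:
        return []
    events, offset = [], 0
    while offset < len(buf):
        _wd, mask, _cookie, length = _EVENT.unpack_from(buf, offset)
        offset += _EVENT.size
        name = os.fsdecode(buf[offset:offset + length].rstrip(b"\0"))
        offset += length
        if mask & (IN_CREATE | IN_MOVED_TO):
            events.append(("+", name))      # name appeared in the directory
        elif mask & (IN_DELETE | IN_MOVED_FROM):
            events.append(("-", name))      # name disappeared from the directory
    return events
```

A real daemon would poll the descriptor instead of reading once, add watches for newly created subdirectories, and treat an `IN_Q_OVERFLOW` event (queue overflow, which this sketch silently ignores) as a signal to fall back to a full rescan.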
There is an implementation of a system-wide filesystem monitor using fanotify:
fanotify has been around since 2009:
Thanks for your report.
A fanotify watcher would mean additional I/O for every file system modification, and a more complex database format that can handle online updates. If there are several related modifications (e.g. writing a file under a temporary name + renaming it + removing an old file), a fanotify watcher will have to perform I/O for every one of them, while the cron job only runs once a day. It's not obvious that using a watcher would lead to net savings; it would definitely distribute the load throughout the day, which is also not obviously an improvement (depending on use cases).
Relying on fanotify exclusively is also somewhat problematic; for example, it can't catch changes that happen during system startup or shutdown when the watcher isn't running.
Given the limited time I have for work on mlocate, I don't currently expect fanotify support to happen.
(It seems the ideal solution would be to have kernel support for persistently storing a list of file system changes, similar to a [http://msdn.microsoft.com/en-us/library/windows/desktop/aa363798%28v=vs.85%29.aspx NTFS Change Journal] or [https://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/FSEvents_ProgGuide/TechnologyOverview/TechnologyOverview.html OS X File System Events], and use this to guide the daily (or more frequent, perhaps even much more frequent?) database update.)
With [http://open-zfs.org/wiki/Main_Page ZFS], this could be done by creating a snapshot every time the mlocate cronjob runs (snapshots in ZFS are very, very inexpensive), and comparing the current snapshot with the previous one using "zfs diff".
The diff runs very fast and its output is easy to process: each file is prefixed by "+" (added since the previous snapshot), "-" (deleted), "R" (renamed, listed with both the old and the new name) or "M" (modified, which for mlocate's purposes could simply be ignored). After processing, the previous snapshot could be destroyed (only the latest snapshot needs to be kept for the next run).
In fact, perhaps this processing could even be integrated into the [https://github.com/zfsonlinux/zfs-auto-snapshot zfs-auto-snapshot] script (which can be installed to take ZFS snapshots automatically every 15 minutes, keeping the latest four of those plus a configurable number of hourly, daily, weekly and monthly snapshots)... hummrmrm... I'll have a talk with the zfs-auto-snapshot author about that.
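A rough sketch of the snapshot-diff processing in Python, assuming the tab-separated layout of `zfs diff -H`, where a rename line carries the old and new paths as separate fields (worth checking against your version's man page). `zfs diff` escapes unusual characters in paths; decoding those escapes is left out here. The function names and snapshot names are hypothetical.

```python
import subprocess


def zfs_diff(older_snapshot, newer):
    """Run `zfs diff -H` and return its tab-separated text output."""
    result = subprocess.run(["zfs", "diff", "-H", older_snapshot, newer],
                            check=True, capture_output=True, text=True)
    return result.stdout


def apply_zfs_diff(names, diff_output):
    """Apply `zfs diff -H` lines to a set of indexed paths, in place."""
    for line in diff_output.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        change = fields[0]
        if change == "+":
            names.add(fields[1])
        elif change == "-":
            names.discard(fields[1])
        elif change == "R":
            old, new = fields[1], fields[2]
            # A renamed directory implicitly renames everything below it.
            moved = {p for p in names if p == old or p.startswith(old + "/")}
            names -= moved
            names |= {new + p[len(old):] for p in moved}
            names.add(new)
        # "M" (modified): the name is unchanged, so a name index can skip it.
    return names
```

Usage would be something like `apply_zfs_diff(db, zfs_diff("tank/home@mlocate-prev", "tank/home@mlocate-now"))`. The linear scan for renamed subtrees is only for clarity; mlocate's database is grouped by directory, so a real implementation could rewrite a contiguous range instead.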
The database doesn't need to be updated every time a file is created or removed; changes could be batched and applied every 5 minutes.
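A sketch of the batching idea in Python, which also speaks to the earlier concern about several related modifications: for an index of names, only the last event seen for each path within a batch decides whether that path should be in the database, so a temporary-file-plus-rename save collapses to nothing. The `("+" | "-", path)` event format and `net_changes` are hypothetical, and the sketch assumes no events were lost and they arrive in order.

```python
def net_changes(db, events):
    """Collapse a batch of ("+" | "-", path) events into net (adds, removes).

    "+" means a name was created or moved in, "-" that it was deleted or
    moved out. Only the LAST event per path matters for a set of names,
    so intermediate steps cost no database writes."""
    last = {}
    for op, path in events:
        last[path] = op
    adds = {p for p, op in last.items() if op == "+" and p not in db}
    removes = {p for p, op in last.items() if op == "-" and p in db}
    return adds, removes
```

A daemon would append incoming events to a list and, every 300 seconds or so, call `net_changes` and write only the resulting adds and removes.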
The main issue with the current cron-job-based setup is that it must walk the whole filesystem to update the database. This generates a lot of I/O and pollutes the kernel's filesystem cache, bringing in a lot of metadata that normally wouldn't be in use and possibly evicting metadata that is. Even on my laptop, applying changes every 5 minutes would be a huge win over a cron job.
It is true that fanotify/inotify would not be perfect, but you could make it a 90% solution that users could choose to activate.
Your proposal of kernel changes sounds perfect. I'd like to see fanotify/inotify support too since that is available in a wider range of kernels.
I think you should reopen this and, instead of saying wontfix, say "I won't work on this, but patches welcome" :)
The trouble is that no amount of patching ''mlocate'' can make this work. (Of course, if I’m wrong, it’s perfectly possible to both attach a patch to a closed ticket and to create a new ticket with patches.)
Note that a “90% solution” meaning that it tends to miss 10% of results in most searches would be useful to precisely nobody.
The point of locate is to build an index that is 1) reliable, and 2) cheap enough, and 3) still useful; given these constraints and an inability to use whole-system watches, the locate index of ''file names and not contents'' makes sense.
However, if the kernel ever got the NTFS-/OS X-like change monitoring, it would be more natural and reasonable to build a ''full-text'' index on top (like both Windows and OS X already do): once the contents of the file are in memory (because the application modifying the file has needed them there), the full-text index scan and update is more or less free. And with a reliable, full-system full-text index, mlocate is completely redundant.
The Linux kernel now has a more fully featured fanotify interface (it also reports directory-entry events such as creations, deletions and renames). So it should be possible to implement this reliably now.
Metadata Update from @pabs:
- Issue close_status updated to: None (was: Invalid)
Metadata Update from @pabs:
- Issue status updated to: Open (was: Closed)