#9 Default to noatime
Closed: Invalid 5 months ago by ngompa. Opened 3 years ago by chrismurphy.

Example:

  1. Make a read-write snapshot of root subvolume.
  2. Check space usage, e.g. btrfs fi us --raw /
  3. grep -r whatsatime /path/to/subvol/snapshot
  4. Check space usage (give it a minute for delayed allocation)
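A quick way to see the underlying effect without taking a snapshot is to check whether a plain read changes a file's recorded access time. The following is only a sketch: the helper name `read_changes_atime` is made up for illustration, and the result depends entirely on the mount options and timestamp granularity of wherever the temporary file lands, so no particular output is claimed.

```python
import os
import tempfile


def read_changes_atime(path: str) -> bool:
    """Return True if reading `path` changed its recorded access time."""
    before = os.stat(path).st_atime_ns
    with open(path, "rb") as f:
        f.read()  # a pure read; no data is written by us
    after = os.stat(path).st_atime_ns
    return after != before


if __name__ == "__main__":
    # Hypothetical scratch file; the result reflects the temp directory's mount options.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"whatsatime\n")
    try:
        print(read_changes_atime(tmp.name))
    finally:
        os.unlink(tmp.name)
```

If the access time changes, the read caused a metadata write, and that write is what a snapshot ends up retaining.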

This is not strictly a Btrfs problem. It happens with any COW-based setup with snapshots, including LVM thinly provisioned setups and Stratis. Pretty sure mutt no longer needs atime, but if it does, atime should be enabled on demand based on need.

Detailed background: "Atime and btrfs: a bad combination?"


Metadata Update from @ngompa:
- Issue tagged with: Utils

3 years ago

From man 5 btrfs: Btrfs does support chattr +A (no atime updates).

I'd rather see this become the kernel default, but for completeness, there is this obscure method.

We've been doing noatime by default for years for the armhfp images.

Surely we should set this as a mount option rather than try using chattr...?

I'm using: subvol=root,x-systemd.device-timeout=0,compress=zstd:1,discard=async,noatime

And: subvol=home,x-systemd.device-timeout=0,compress=zstd:1,discard=async,noatime
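For anyone checking what a given option string implies, here is a minimal Python sketch. The helper name `atime_mode` is made up for illustration, and it assumes that the kernel default is relatime when nothing is specified and that the last of several conflicting options wins.

```python
def atime_mode(options: str) -> str:
    """Return the atime policy implied by a comma-separated mount option string."""
    mode = "relatime"  # assumption: kernel default when no atime option is given
    for opt in options.split(","):
        opt = opt.strip()
        if opt in ("noatime", "relatime", "strictatime"):
            mode = opt  # assumption: a later option overrides an earlier one
    return mode


if __name__ == "__main__":
    opts = "subvol=root,x-systemd.device-timeout=0,compress=zstd:1,discard=async,noatime"
    print(atime_mode(opts))
```

It only looks at the option string, so it says nothing about per-file flags set with chattr +A.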

> Surely we should set this as a mount option rather than try using chattr...?

:-! Sorry! It's just superfluous information, not advocacy. I'd advocate in the following order:

  1. Upstream kernel. Historically they've resisted such changes, arguing that distros should set optimum kernel defaults. Yet this is a bad default.
  2. Fedora kernel spec file.
  3. Anaconda kickstart --fsoptions, which will set it in fstab.

(1) and (2) would apply to upgrades, (3) would not. And (1) and (2) are more universal (easily applied to all editions, spins, labs, archs, regardless of compose and install method), whereas (3) will lead to a fragmented approach.

@jforbes thoughts?

This should be set in --fsoptions. The expected place to see such an option referenced is in fstab, and there is no compelling reason for the Fedora kernel to deviate from upstream here.

If we're going to do --fsoptions, we should do it now and get a freeze exception, otherwise people who install F33 beta will permanently wind up with slower btrfs than everybody who installs it later....

It sounds like upstream is just not interested in setting good default mount options (this is a problem for discard=async too) and it seems a lot easier to change the defaults (allowing us to make future changes effective during distro upgrades) than to change /etc/fstab (which is probably not safe to edit automatically during upgrades).

If we're going to do that, we probably want Lorax and Anaconda to just do it rather than making people suffer through setting in kickstarts (because --fsoptions isn't available through the GUI).

This issue was discussed on Reddit today, by the way. I thought space_cache=v2 solved this. In my tests (building in mock, for example) I didn't notice any slowdowns with noatime or relatime. But maybe that is not the right test case for this topic; just my 2 cents.

My understanding is that the v2 free space cache is more for heavy I/O and extremely large filesystems. I've broken out that particular request into #24.

I find atime (via relatime) to be incredibly useful. I use it to clean up my download and temporary-workspace directories. (I use tmpwatch, but aiui systemd-tmpfiles-clean uses atimes as well.) Is the answer "well, if that's what you want, you should stick with ext4 or xfs"?

From tmpfiles.d(5):

The age of a file system entry is determined from its last modification timestamp (mtime), its last access timestamp (atime), and (except for directories) its last status change timestamp (ctime). Any of these three (or two) values will prevent cleanup if it is more recent than the current time minus the age field.

So it's fine with a noatime system.
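The quoted rule can be sketched in a few lines of Python. The function name `blocks_cleanup` and the numbers are made up for illustration; this follows only the man page text quoted above, not the actual systemd source.

```python
DAY = 24 * 60 * 60  # seconds


def blocks_cleanup(atime, mtime, ctime, now, age, is_dir=False):
    """True if any considered timestamp is more recent than `now - age`."""
    stamps = [mtime, atime]
    if not is_dir:
        stamps.append(ctime)  # ctime is not considered for directories, per the quoted text
    cutoff = now - age
    return any(t > cutoff for t in stamps)


if __name__ == "__main__":
    now = 100 * DAY
    written = now - 30 * DAY  # hypothetical file last written 30 days ago
    # With atime updates: the file was read yesterday, so it is kept.
    print(blocks_cleanup(now - DAY, written, written, now, age=10 * DAY))
    # With noatime: atime stays at the write time, so the same file is eligible for removal.
    print(blocks_cleanup(written, written, written, now, age=10 * DAY))
```

So the rule still works on a noatime system; it just decides on mtime and ctime alone, because atime never moves forward.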

tmpwatch(8) supports using mtimes with the --mtime switch, so it can work with a noatime system, just not by default.

No, it's super-annoying to have it based on mtime. Access time is the value I actually care about in this case. (I know this in practice because I tried to run with noatime back before relatime was invented.) Now, maybe the answer is "eh, mattdm, you are old and no one today cares about your obscure use-case", which is sometimes a fine answer, but if there's a better solution I'd like to have it documented.

> Is the answer "well, if that's what you want, you should stick with ext4 or xfs"?

Probably not. For ARM images, we set noatime because atime updates tend to crush SD cards, regardless of filesystem. Personally, I don't think moving from relatime to noatime is particularly useful unless you plan to do an impressive amount of snapshotting.

What may make sense is setting noatime for /usr and /etc with chattr -R +A /usr /etc. Most cases where atimes tend to be problematic are in those two locations anyway.

  • Ordinarily it's a small problem, not a big problem. i.e. it is not a case of use some other file system if you want to keep atime updates.
  • I was thinking of it for Fedora 34 feature proposal, so it gets the proper visibility. It's not mentioned at all in the original proposal. e.g. GNOME Shell uses .trashinfo to track file aging, not atime.
  • I don't think the I/O impact is noticeable, but there is a storage cost if there are snapshots and many files are regularly accessed. Grepping all of /var or /usr or /home may be so contrived it's not worth optimizing for.
  • Read-only snapshots at least won't get additional atime updates, so it's not a completely vicious problem.
  • And as Neil points out, this is a VFS mount option so it's per mountpoint. We could use noatime only on / and leave the default (relatime) on /home if there's some common use cases that need it that I wasn't aware of. Or persuade others that noatime should be the default, and enable relatime for the mounts they're needed on.

Better understanding of the problem should be the prerequisite before going to the trouble of either changing things or documenting them. Hence, I still think that puts us in Fedora 34 feature proposal territory.

>   • I don't think the I/O impact is noticeable, but there is a storage cost if there are snapshots and many files are regularly accessed. Grepping all of /var or /usr or /home may be so contrived it's not worth optimizing for.

This is something I do all the time. I would be very surprised if running grep has any storage cost at all. :(

It only has a storage cost if you have snapshots, which we aren't even doing by default right now. If we had modified our layout to add a /var subvolume, we could do noatime for / and leave relatime for /home and /var.

As it stands, we could still implement noatime for /usr and /etc via chattr, and that might be something we do as part of a process for setting up system snapshots, should a user want them.

> This is something I do all the time. I would be very surprised if running grep has any storage cost at all. :(

In fact, even booting and launching apps has a storage cost because every binary and library accessed turns into an atime update. atime/relatime turns reads into writes. Snapshots result in retention. The combination means storage cost. Doesn't matter what file system.
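To make "relatime turns reads into writes" concrete, here is a Python sketch of the check as I understand the kernel's relatime logic. The function name `relatime_needs_update` is made up, and the times are plain seconds rather than the kernel's timespec values.

```python
DAY = 24 * 60 * 60  # seconds


def relatime_needs_update(atime, mtime, ctime, now):
    """Sketch of the relatime rule: should this read update the stored atime?"""
    if mtime >= atime:
        return True  # file was modified since (or at) the last recorded access
    if ctime >= atime:
        return True  # inode was changed since (or at) the last recorded access
    if now - atime >= DAY:
        return True  # recorded atime is at least a day old
    return False


if __name__ == "__main__":
    # Hypothetical library file installed long ago and last read 25 hours ago.
    now = 1_000 * DAY
    print(relatime_needs_update(atime=now - 25 * 60 * 60,
                                mtime=now - 300 * DAY,
                                ctime=now - 300 * DAY,
                                now=now))
```

Under this rule, a file that is read regularly gets an atime write at most roughly once a day, or on the first read after it changes. Across every binary and library touched at boot, that still adds up once snapshots are retaining the old metadata.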

I'm wondering if programs needing access events should use inotify_add_watch(2) with IN_ACCESS mask - which can be set per file or per directory.

Please do not turn off atime for anything that needs "file aging", as configured in the sixth column of tmpfiles.d/ lines.

On my Fedora system there are tmpfiles.d/ snippets configured with "file aging" for at least the following paths:

/var/spool/cups/tmp
/var/lib/systemd/coredump
/tmp/.*-unix
/tmp
/var/tmp
/var/cache/man

(use systemd-tmpfiles --cat-config | grep -v ^# | sort -u to see a list for all tmpfiles.d/ lines on your system, and look for the 6th column)
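Picking out the sixth column by eye gets tedious, so here is a small Python sketch that lists the paths with an age set. The function name `aged_paths` and the sample lines are illustrative only (they are not copied from a real system), and it assumes plain whitespace-separated fields, so paths quoted to contain spaces are not handled.

```python
# Illustrative tmpfiles.d-style lines: Type Path Mode User Group Age [Argument]
SAMPLE = """\
# illustrative lines, not copied from a real system
q /tmp 1777 root root 10d
q /var/tmp 1777 root root 30d
d /var/lib/systemd/coredump 0755 root root 3d
d /run/example 0755 root root -
"""


def aged_paths(config_text: str) -> list:
    """Return (path, age) pairs for lines whose sixth column sets an age."""
    result = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split(None, 6)
        if len(fields) >= 6 and fields[5] != "-":
            result.append((fields[1], fields[5]))
    return result


if __name__ == "__main__":
    for path, age in aged_paths(SAMPLE):
        print(path, age)
```

To run it against a real system, feed it the output of systemd-tmpfiles --cat-config instead of the sample text.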

I think one can deduce from the list that the whole of /var and /tmp should keep atime on.

"File aging" tries to identify files that haven't been "used" for a while. systemd-tmpfiles defines that as the most recent of atime/btime/ctime/mtime being older than some specified age. Of the four times, atime is definitely the most relevant one, since it actually indicates that a file was read from.

"File aging" without atime available will kinda work, but of course might mean that the logic sometimes removes files that are still actively accessed for read, simply because we won't notice reads anymore.

I think turning off atime for /usr is OK. But for /var + /tmp we really should leave it on (and maybe even switch to regular atime instead of relatime, since relatime also makes "file aging" unreliable).

> tmpwatch(8) supports using mtimes with the --mtime switch, so it can work with a noatime system, just not by default.

Fedora doesn't install that anymore, does it? At least on my system it's not installed anymore; systemd-tmpfiles took over that role. tmpfiles.d/ doesn't have to be told --mtime or --atime or the like. As mentioned above, it implicitly looks at all 4 timestamps, determines the most recent of the four, and uses that, which should mean it makes the best of what it can get. (Note one tweak though: for directory inodes it only uses mtime/ctime/btime and ignores atime, since otherwise a simple recursive "find" run through /tmp would mean we'd stop cleaning it up.)

@lennart It's not installed by default but I still install it. I've been meaning to migrate to systemd-tmpfiles in my user session config but it hasn't jumped to high priority. It's clear that the same basic concern applies though. (And yes I use tmpwatch --dirmtime)

I think we do use regular atime on /tmp by default since by default it is a tmpfs. So /var (and possibly /home) are the main concern.

I'd really rather not make systemwide decisions based on how btrfs behaves wrt space usage with snapshots. Yes you can end up with more metadata if you do find / and cross into other snapshots, but we already do relatime by default so that should mitigate most of the issues. If it becomes an actual problem we can address it later.

My preference here would be for us to just make things better with relatime as well, so I'm fine with that.

> It sounds like upstream is just not interested in setting good default mount options (this is a problem for discard=async too) and it seems a lot easier to change the defaults (allowing us to make future changes effective during distro upgrades) than to change /etc/fstab (which is probably not safe to edit automatically during upgrades).

@catanzaro - so @josef can probably correct me here, but my understanding is that discard=async and space_cache=v2 are going to be the default in a future kernel. So apart from relatime vs noatime, which Josef already answered, it's just a matter of time and whether the changes are made in time for F33 (for discard=async and space_cache=v2 I believe the answer is no, because the initial assumption was that kernel 5.9 will be shipped at GA time?)

I think discard=async is perfectly fine to do right now. space_cache=v2 is being discussed elsewhere, but basically there are a few weird corners that we've discovered in testing that make me want to wait on that. They are not issues with the space cache itself, but more around the tooling and consistency with how we handle the mount options.

Both of these things are easily switched at any time, so users who install now can inherit them at some point in the future.

Sorry, I've encouraged us to discuss too many different issues in this topic on noatime. :)

If the plan is to change defaults in future kernel versions, then that's great, so we don't have to have anything special in our /etc/fstab.

I think at this point, we've conclusively decided that noatime is a bad idea for a default.

Metadata Update from @ngompa:
- Issue close_status updated to: Invalid
- Issue status updated to: Closed (was: Open)

5 months ago
