#54 Default disk partitioning layout for Workstation
Closed: Can't fix 2 years ago by chrismurphy. Opened 4 years ago by eischmann.

There is an open bug about revisiting the default disk partitioning layout for Fedora Workstation. I think the Workstation Working Group definitely has a say in this, so I'm opening a ticket here:

Description of problem:

Now that we have a lot of usage of Docker and Flathub, our default disk partitioning layout is problematic - having a separate home and a small root means running out of disk much more quickly. This has hit me on every single one of my machines. :(

Things are a lot different since we made the decision to have the current default disk layout - we have much more reliable upgrades in addition to a lot of new technology that eats up root disk space.

Can we revisit our current default disk layout and consider not having a separate /home?

The original bug: https://bugzilla.redhat.com/show_bug.cgi?id=1577971


Certainly for Silverblue, there will be the ability to try out different versions of Fedora within a single partition, hopefully also an even more robust upgrade story, and even more usage of containers and Flatpaks, so even more reason to use a single partition there.

I'd support migrating to a single partition for all Fedora Workstation. (Personally that's what I've done for my last few Fedora installs, after some hairy experiences resizing / vs. /home)

Ideally, if we are setting requirements for disk partitioning, this would be reflected in the installer (either when creating partitions manually, or using pre-existing partitions). For example, it should warn if / isn't large enough.

Metadata Update from @catanzaro:
- Issue tagged with: meeting

3 years ago

This needs to be considered in tandem with LVM. LVM is not supported by Workstation tools (e.g. GNOME Disks) and probably needs to go away as part of this change. The "feature" that the LVM volumes get created with names based on the hostname assigned by your ISP at install time is also unfortunate.

Is the only sane alternative here simply /boot, swap, and / (root)?

Does this have any implications for dual boot? It's important for a lot of new users, and we ought to make sure that it's as easy as possible.

Unfortunately it seems well-established at this point that dual boot will never work reliably, for technical reasons, regardless of what we desire. It will work fine for some new users, and for others it will be broken in mysterious ways, and that's probably the best we'll ever be able to do. Chris Murphy is our expert on this and would no doubt be happy to explain all the issues in more detail, if desired.

Is the only sane alternative here simply /boot, swap, and / (root)?

We should probably ditch the swap partition entirely, in favor of a swap file, like Ubuntu has already done.

So a few outcomes from the WG meeting today:

  1. Eliminating separate /home seems to have more plusses than minuses. Because disks are so large now, the default 5% reservation of space should well protect most users. 250G disks are the norm nowadays, which means > 12G of reserved space. That seems like it would cover even aggressive background package downloading.

  2. However, @misc noted that not using LVM could affect the usefulness of thin provisioning for things like docker. Most meeting attendees didn't know too much about this and we'd need to seek some advice on this. (Or @misc could explain more here.) One possible solution is to see whether the installer should offer to reserve space for an LVM PV for "developer stuff." This could be divvied up by the developer user and mounted to /var/lib/docker, /var/lib/libvirt, /var/lib/mysql, etc.
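The 5% figure in point 1 is ext4's default root-reserved blocks percentage (the `-m` option of `mkfs.ext4`, adjustable later with `tune2fs -m`). Here is a minimal sketch of that arithmetic. It assumes decimal gigabytes and treats the whole disk as one filesystem; the real filesystem is slightly smaller than the disk.

```python
GB = 10**9  # decimal gigabyte, as disk vendors count


def reserved_bytes(fs_bytes: int, reserved_percent: float = 5.0) -> int:
    """Bytes held back by ext4's reserved-blocks percentage (mkfs.ext4 -m, default 5)."""
    return int(fs_bytes * reserved_percent / 100)


if __name__ == "__main__":
    # Illustrative filesystem sizes only.
    for size_gb in (128, 250, 500, 1000):
        print(f"{size_gb:>5} GB -> {reserved_bytes(size_gb * GB) / GB:.1f} GB reserved")
```

Reserved blocks remain usable by root (or by a user or group set with `tune2fs -u`/`-g`), so they only protect against unprivileged processes filling the filesystem.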

The only clear action we arrived at in the meeting was for me to check with the Anaconda team if this second idea was practical.

From IRC today:

10:12 < misc> zbyszek: stupid question, would systemd work well with a separate /var/lib partition (I got the intuition that it wouldn't, but maybe I am wrong)
10:54 < zbyszek> misc: separate /var or separate /var/lib is OK
10:55 < zbyszek> Should just work without trouble, but it might be necessary to add RequiresMountsFor= or otherwise make sure that various units are not started before the partition is mounted

Dual boot will not be negatively impacted. The Windows bootloader is oblivious to our layout. And if anything breaks the broad user case, there's a release criterion that'd make it a blocking bug. Actually, fwupd helps here: bugs are fixed with firmware updates, but also getting Fedora users with a given make/model all on the same firmware revision helps too.

Default partitioning: the simplest solution is to boost the root volume max from 50G to e.g. 70G, changing nothing else. That'll pilfer the space from the home volume at installation time.

Alternatives get more complicated:

  • Boot: still needed for encrypted root; the alternative is GRUB and Anaconda work to use GRUB's built-in LUKS support and make it look somewhat pretty at boot time (now that we're hiding GRUB).

  • Swapfiles: can't directly be supported on COW storage, which includes Btrfs, XFS (reflinks), and anything on dm-thin, including Stratis. Supporting file-based swap on such storage is possible with a loop or nbd device. However, I'm not super clear on the advantage over a dedicated swap partition: an 8G swapfile vs. an 8G swap partition (or LV) is still gonna require an 8G reservation either way.

  • Unified root and home volumes: i.e. a big root volume with /home as a directory. If you drop LVM, I think you must combine root and home, because without LVM it's non-trivial, risky, and takes a long time (possibly hours) to shrink home and grow root. If you combine root and home, you could keep LVM, but it's sorta pointless. Anyone who wants it can do a custom install.

For reference:
Fedora-Workstation-netinst-x86_64-29_Beta: root volume uses ~5.1G
Fedora-Silverblue-ostree-x86_64-29: root volume uses ~3.8G (one tree, two trees might approach 5.1G depending on the diff between them)

50G is a decent chunk of space, not sure where it's all going.

Discussed and voted on at the Workstation WG meeting today:

#agreed For F30, change the default partitioning for workstation to no LVM, unified / (separate /boot, swap) (+5; 0; 0)

https://meetbot.fedoraproject.org/fedora-meeting-2/2018-11-05/workstation.2018-11-05-14.01.log.html

One thing that was noted is that LVM isn't used either by our default Docker storage driver (overlay2, unless I'm mistaken) or by the storage podman uses for containers.

Although this change is workstation-specific, it probably needs to go through the change process since it's a fairly big change.

Metadata Update from @catanzaro:
- Issue untagged with: meeting

3 years ago

In light of the related discussion on LUKS, which might require separate /home (but probably not LVM), we've agreed to defer these changes:

  • ACTION: otaylor form a subgroup to look at LUKS issue -- WG is not
    taking any specific actions until subgroup recommends something
  • LVM and /home partitioning issues will depend on the subgroup's
    recommendation for LUKS

FWIW at least for Silverblue I'd like to strongly push for /var as a separate partition. It's a much stronger and IMO better version than just /home, since it also includes flatpak, libvirt images, etc.

The main problem with /var as partition on classic (non-ostree) systems is /var/lib/rpm, but rpm-ostree sticks that in /usr/share/rpm which at some point we should probably do for classic too; see also http://lists.rpm.org/pipermail/rpm-maint/2017-October/006681.html

Are you pushing for it to be separate to preserve state across a reinstall, or for it to be a separate place with data that can be encrypted without encrypting the system partition? The overriding concern I have is that the more we split things up, the harder it is to get the split right. (thinp is a big hammer, but needs a lot of work to make it non-confusing.)

FWIW at least for Silverblue I'd like to strongly push for /var as a separate partition. It's a much stronger and IMO better version than just /home, since it also includes flatpak, libvirt images, etc.

I don't understand what problem this solves. It's just rearranging the deck chairs, so that whatever space problems people now have with sysroot, they'll have with a var volume instead. Also, good luck convincing the Anaconda team; historically they consider extra partitions confusing to users. I kinda have to agree.

Also based on the entire RPM list thread, I figured rpm-ostree would stick rpmdb in /usr/lib/sysimage/rpm since that's what RPM folks suggested and SUSE folks were going to implement.

Are you pushing for it to be separate to preserve state across a reinstall, or for it to be a separate place with data that can be encrypted without encrypting the system partition?

I see it as a continuation of (and fundamental improvement on) the Anaconda default "split /home" model (though note that's just Workstation/custom installs - both AH and Server use a small / LV by default).

The rationale for that originally I believe was indeed "preserve state across reinstall".

The overriding concern I have is that the more we split things up, the harder it is to get the split right. (thinp is a big hammer, but needs a lot of work to make it non-confusing.)

There is also Stratis of course which is intending to make this approach better; I believe it does automatic resizing for example. I haven't played with it much though myself.

I keep running out of space on my root partition due to /var/lib/flatpak. :( Seems time to change our defaults, but this is blocked on #82.

How about just changing the default 50GiB for vg/root to 60GiB as a stop gap? Changing the default size is obviously safe, so I think it's reasonable to do during freeze.

Usually this would be in product.img but there is no product.img on Fedora 30 Live. And this commit seems to confirm it but I'm not able to track it down.

Metadata Update from @petersen:
- Issue tagged with: meeting

3 years ago

In the meeting on April 8, we accepted a suggestion to go with 70GB.

Metadata Update from @catanzaro:
- Issue untagged with: meeting

3 years ago

Note that this is actually 70 GiB (about 75 GB).
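The GiB/GB distinction above is easy to lose track of. Here is a small Python sketch of the conversion between binary gibibytes (2^30 bytes) and decimal gigabytes (10^9 bytes); the helper names are purely illustrative.

```python
GIB = 2**30  # binary gibibyte
GB = 10**9   # decimal gigabyte


def gib_to_gb(gib: float) -> float:
    """Convert binary GiB to decimal GB."""
    return gib * GIB / GB


def gb_to_gib(gb: float) -> float:
    """Convert decimal GB to binary GiB."""
    return gb * GB / GIB


if __name__ == "__main__":
    # The root sizes discussed in this thread.
    for gib in (50, 60, 70):
        print(f"{gib} GiB = {gib_to_gb(gib):.2f} GB")
```

For comparison, `df -h` reports in powers of 1024 (its "G" means GiB), while `df -H` uses powers of 1000.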

UPDATE: This problem is substantially reduced by the Docker overlay2 driver, which Podman uses but calls "overlay". As near as I can tell, it's not possible to use the original, deprecated Docker overlay driver in Podman.


@otaylor your podman comment in #104 made me think of an old problem...

Even moderate use of containers with overlayfs dramatically increases inode consumption. Inode exhaustion is somewhat common with the default ext4 inode ratio, 1 per 16KiB, where typical inode consumption in container centric workflows is 1 per ~6KiB. On ext4 the ratio can only be set at mkfs time. This is one of the reasons many container centric projects have moved to XFS and Btrfs which have dynamic inode allocation and reflinks. And podman by default does use overlayfs.

CoreOS made this change almost four years ago, when they were using ext4, and Fedora CoreOS is now using XFS. I think it's a big enough hassle that we should just change the inode ratio for Fedora 31, because the workaround is to tell people to reinstall, and either pick another file system or modify /etc/mke2fs.conf before reinstalling to alter the default inode ratio so that Anaconda will pick it up at mkfs time.

mkfs.ext4 -i 4096 or mkfs.ext4 -T news will do it. And since the block size is going to be 4KiB, there's no point in going below one inode per 4KiB block.

Fix
https://github.com/coreos/scripts/pull/379
Problem
https://github.com/coreos/bugs/issues/264
https://github.com/kubernetes/minikube/issues/1443
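To make the inode argument concrete, here is a simplified Python model. It assumes the inode count is roughly the filesystem size divided by the `-i` ratio (16 KiB default, 4 KiB with `-i 4096` or `-T news`), ignores per-block-group rounding and metadata overhead, and uses the ~6 KiB-per-inode consumption figure quoted above. Under those assumptions, inodes run out once the used fraction reaches consumption / ratio.

```python
KIB = 1024


def inode_count(fs_bytes: int, inode_ratio: int) -> int:
    """Approximate inodes created by mkfs.ext4 -i <inode_ratio> (ignores per-group rounding)."""
    return fs_bytes // inode_ratio


def fill_at_inode_exhaustion(inode_ratio: int, bytes_per_inode_used: int) -> float:
    """Fraction of the filesystem in use when the last inode is consumed.

    There are size / inode_ratio inodes, and each one "costs" bytes_per_inode_used
    of data, so inodes run out at bytes_per_inode_used / inode_ratio of the size.
    A result >= 1.0 means blocks run out before inodes do.
    """
    return bytes_per_inode_used / inode_ratio


if __name__ == "__main__":
    size = 250 * 10**9  # a 250 GB filesystem (illustrative)
    for ratio in (16 * KIB, 4 * KIB):  # ext4 default vs. -i 4096 / -T news
        frac = fill_at_inode_exhaustion(ratio, 6 * KIB)
        limit = f"inodes exhausted at {frac:.1%} full" if frac < 1 else "blocks run out first"
        print(f"-i {ratio}: {inode_count(size, ratio):,} inodes, {limit}")
```

In this model, the default 16 KiB ratio with a ~6 KiB-per-inode container workload runs out of inodes at about 37.5% full. At 4 KiB, space runs out before inodes do.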

Even moderate use of containers with overlayfs dramatically increases inode consumption

This is where I reiterate that OverlayFS is trash (everyone knows it, even though not everyone wants to acknowledge it...), and I'm in favor of changing the defaults for ext4 and xfs for this case.

This will be especially important for the Silverblue variant, where containers actually matter.

In general, I find it hard to be terribly concerned about this problem for the regular Workstation use case, especially as container use-cases are very limited and Podman does not use OverlayFS in rootless mode by default.

Also based on the entire RPM list thread, I figured rpm-ostree would stick rpmdb in /usr/lib/sysimage/rpm since that's what RPM folks suggested and SUSE folks were going to implement.

Unfortunately, the RPM-OSTree effort to switch over from /usr/share/rpm to /usr/lib/sysimage/rpm has more or less completely stalled out. In the time since it was implemented in openSUSE (end of 2017), there has been essentially zero movement by RPM-OSTree to rationalize this. This has severely complicated things for both Fedora and openSUSE, as a number of folks were counting on RPM-OSTree moving quickly to fix some quirks in how libsolv deals with varying rpmdb paths.

I suspect we're only going to finally get that change once I get around to proposing we move the rpmdb for classic Fedora in specific circumstances. At that point, we'd be supporting three configurations for the rpmdb path in Fedora, which is way more insane than the two we have now.
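To illustrate why multiple rpmdb locations are a burden for tools, here is a hypothetical Python sketch of the probing a tool might fall back to when it can't simply ask rpm itself (on a running system, `rpm --eval '%{_dbpath}'` reports the configured path). The candidate list comes from this thread. The search order and the `find_rpmdb` helper are assumptions for illustration only.

```python
from pathlib import Path
from typing import Callable, Optional

# rpmdb locations mentioned in this thread; the search order is an assumption.
RPMDB_CANDIDATES = (
    "usr/lib/sysimage/rpm",  # openSUSE, and the path proposed on the RPM list
    "usr/share/rpm",         # rpm-ostree systems
    "var/lib/rpm",           # classic Fedora
)


def find_rpmdb(root: str = "/",
               exists: Callable[[Path], bool] = Path.is_dir) -> Optional[Path]:
    """Return the first candidate rpmdb directory under `root`, or None.

    `exists` is injectable so the lookup can be tested without a real sysroot.
    """
    base = Path(root)
    for rel in RPMDB_CANDIDATES:
        candidate = base / rel
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_rpmdb("/"))
```

Every additional supported layout is one more branch that each consumer (rpm, libsolv, rpm-ostree, installers) has to agree on.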

I see it as a continuation of (and fundamental improvement on) the Anaconda default "split /home" model (though note that's just Workstation/custom installs - both AH and Server use a small / LV by default).

The problem is that system state is rarely desired to be preserved across reinstalls. Indeed, it's often completely unwanted. One of the reasons why /home is no longer part of /usr and was never moved into /var is because it was never considered part of "system state", but instead a separate "user data" location. The fact that RPM-OSTree requires /var/home is a detail that probably should be fixed someday.

The only reason people seem to care about preservation of /var is because for some reason, we're installing Flatpaks there by default when we do not need to for the user context install (i.e. through Software in a user session).

System-wide Flatpaks are not helpful if you're trying to provide "safe" software installation. If one of the major value propositions is that applications can be installed and managed without privileges, why is this not enabled in the default workflow?

In doing so, we manage to maintain that the system side storage remains relatively small, while the user data portion grows as intended. This also permits our existing storage configurations to scale properly on most systems where people use Flatpaks.

System-wide Flatpaks are not helpful if you're trying to provide "safe" software installation. If one of the major value propositions is that applications can be installed and managed without privileges, why is this not enabled in the default workflow?

I agree this default doesn't make much sense for Workstation.

We can change it whenever we want.

(Sorry, my updated comment on overlay/overlay2 probably didn't cause a notification to get issued.)

Podman does not use OverlayFS in rootless mode by default.

It's using fuse-overlayfs by default, root or rootless, regardless of the file system.

The only reason people seem to care about preservation of /var is because for some reason

As far as I'm aware, it's the only location they can be shared from. If they're user-installed, then they aren't shareable. Offhand, I'm skeptical of asking the user at setup time (i.e. in g-i-s) whether they intend the system for single-user or multi-user use, and using that answer to set --user or --system as the default accordingly. The gotcha is: what if they change their mind, and a bunch of apps are now simply not visible to newly added users? Or even different personas as logins, e.g. the work login vs. the personal login?

System-wide Flatpaks are not helpful if you're trying to provide "safe" software installation.

Whether located on /var or /home, if the system supports cheap snapshots, they can provide true isolation as another layer of safety. Btrfs has cheap snapshots; Facebook makes use of them for exactly this purpose in their container workflow. While not as cheap, XFS can approximate snapshots with cp -r --reflink dir1/ dir2/, and Fedora 31 ships with xfsprogs enabling reflink support by default at mkfs time.

In doing so, we manage to maintain that the system side storage remains relatively small, while the user data portion grows as intended. This also permits our existing storage configurations to scale properly on most systems where people use Flatpaks.

I've long been a fan of dropping separate partitions, and using quotas if limits need to be set to specific areas.

I'm curious what working group members' partitioning looks like. The following will forward minimally revealing information to fpaste, and return a URL that you can post.

df -h --output='source','fstype','avail','target' | grep -v tmpfs | fpaste

Example output from my system:
https://paste.centos.org/view/3b0b2d09
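For anyone who can't run that pipeline as-is, here is a rough Python approximation of the same report. It is not a byte-for-byte clone of df. It reads /proc/self/mounts, skips tmpfs/devtmpfs mounts much like `grep -v tmpfs`, hides zero-size pseudo filesystems as df does by default, and prints available space in GiB. `shutil.disk_usage` reports `free` from `f_bavail`, which is what df's Avail column shows. The helper names are illustrative.

```python
import re
import shutil
import sys
from typing import List, NamedTuple


class Mount(NamedTuple):
    source: str
    target: str
    fstype: str


def _unescape(field: str) -> str:
    """Undo the octal escapes (e.g. \\040 for a space) used in /proc/self/mounts."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text: str) -> List[Mount]:
    """Parse /proc/self/mounts-style text: source, target, fstype, options, ..."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, target, fstype = (_unescape(f) for f in fields[:3])
        mounts.append(Mount(source, target, fstype))
    return mounts


def report(mounts: List[Mount], out=sys.stdout) -> None:
    """Print source, type, available space (GiB) and mount point, df-style."""
    print(f"{'Filesystem':<24} {'Type':<8} {'Avail':>7} Mounted on", file=out)
    for m in mounts:
        if "tmpfs" in m.fstype:
            continue  # roughly mirrors `grep -v tmpfs` (tmpfs and devtmpfs)
        try:
            usage = shutil.disk_usage(m.target)
        except OSError:
            continue  # mount point not accessible to this user
        if usage.total == 0:
            continue  # pseudo filesystems (proc, sysfs, ...) that df hides by default
        print(f"{m.source:<24} {m.fstype:<8} {usage.free / 2**30:6.1f}G {m.target}", file=out)


if __name__ == "__main__":
    with open("/proc/self/mounts") as f:
        report(parse_mounts(f.read()))
```

The output can be piped to fpaste in the same way as the df command.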

Unrelated complaint: I haven't been able to use fpaste since it switched from paste.fedoraproject.org to paste.centos.org:

$ df -hl --output='source','fstype','avail','target' | grep -v tmpfs | fpaste
Uploading (0.3KiB)...
You are not allowed to paste

It seems to be a bit strict about antispam. Anyway, pasting here is probably easier:

Filesystem              Type     Avail Mounted on
/dev/mapper/fedora-root ext4       22G /
/dev/nvme0n1p2          ext4      672M /boot
/dev/mapper/fedora-home ext4      538G /home
/dev/nvme0n1p1          vfat      181M /boot/efi

I made one change from the default: I manually deleted my swap partition. The swap partition created by anaconda was ridiculously large, since it tries to match RAM 1:1 and I have a workstation-class machine.

Ahh it's a good point about swap, the proposed command ignores this entirely. In my case it's swap on ZRAM at 1:1.

FWIW, in Fedora 32 the EFI System partition is being bumped to 600M, to account for staging large fwupd updates over the long haul as well as multiple bootloaders.

This comment from Colin Walters caught my attention:
(I'm also working to rebase Fedora Silverblue on Fedora CoreOS' toolchain and I'd like the same to happen for IoT, which would mean this proposal should more clearly call out that it's affecting Fedora images that use Anaconda)

Silverblue would be using coreos-assembler as its installer? What's the user experience going to look like? What's the partitioning layout? Does this impact #101? Is coordination indicated? Or are these separate things for the foreseeable future?

Silverblue would be using coreos-assembler as its installer? What's the user experience going to look like?

No, it'd use https://github.com/coreos/coreos-installer/ as the installer.

What's the partitioning layout?

https://github.com/coreos/fedora-coreos-tracker/issues/18
and
https://github.com/coreos/fedora-coreos-tracker/issues/94

Basically, the "installer" just does a "dd to disk". On boot (Ignition) we will support reconfiguring the rootfs storage.

The big value of this for CoreOS is that it works fully symmetrically in cloud and metal environments. Anaconda isn't really relevant for clouds like GCP, AWS, OpenStack etc. (One can make it work there, it's just not at all designed for it)

One consequence of a single ext4 is clean installs (reprovisioning) will require the user backing up /home and restoring it, because Anaconda enforces reformatting the system root volume. Does this concern anyone? Can you think of any LVM features you'd miss?

Metadata Update from @chrismurphy:
- Issue tagged with: installation

2 years ago

Metadata Update from @chrismurphy:
- Issue tagged with: meeting

2 years ago

Stratis design doc; this stands out as particularly relevant: Numerous problem reports throughout the years indicate that resizing filesystems is an area where users feel unease, due to potential data loss if a mistake is made. No real reason to require the user do this any more.

Even if there were graphical tools for fs resize on Workstation, the usual scenario is that /home is too big. Since /home is ext4, which can't shrink while online, either the user needs to boot from Live media and use e.g. GParted, blivet-gui or Cockpit to do the resize, or the login window needs to grow the ability to do fs resizing. (The last part is 50% comedy and 50% serious.)

Therefore I think it's uncontroversial to drop LVM (thick and thin). And just obviate the resize question/problem.

Therefore I think it's uncontroversial to drop LVM (thick and thin). And just obviate the resize question/problem.

Well the entire WG was (previously) in agreement to drop LVM. We have only delayed because we want to pair this change with whatever encryption change we come up with, so that we change our defaults only once instead of twice.

Metadata Update from @chrismurphy:
- Issue untagged with: meeting

2 years ago

Re: possible mitigation to avoid changing partitioning. One idea is to install flatpaks in ~/.var rather than /var which is the current default.

But what about the multiuser arrangement with sd-homed? Each user has their own private unshared space allocation. [1] In effect, it's adding even more partitions.

It's true that today /home typically has more free space than /, but it might be a flawed assumption that each ~/ will have more free space than either / or any other user's ~/. Defaulting app installation to ~/.var, there will be some propensity for app duplication wasting space. And some users will install more apps than others. I'm uncertain whether it should eat up that user's ~/ allocation - rather than a shared space. There are pros and cons either way.

Is something like e.g. /home/shared/.var a possibility? As in all users share this space for apps?

Note that sd-homed does have a resize subcommand to resize the entire stack (user file system, dm-crypt block device, the loop device, and the backing file). But there is no management: no live monitoring or policy. It could, maybe even should, grow this capability over time so that DEs don't have to reinvent this wheel. I have a few ideas about that, and also about how a hypothetical /home/shared/.var could be treated as a sort of virtual user that tends to accumulate most free space for itself, ensuring enough slack space is available for growing an active ~/ on demand.

[1] With homectl create --storage=luks which is a loop mounted file. Another option, fscrypt+ext4, shares space among users. It's not as secure, nor is it portable. But it should be considered. A neat thing about supporting sd-homed, is the flexibility to change the (Fedora Workstation) default while still having long term backward support for the prior default.

Well the entire WG was (previously) in agreement to drop LVM. We have only delayed because we want to pair this change with whatever encryption change we come up with, so that we change our defaults only once instead of twice.

We were? I don't remember agreeing to that.

I'm curious what working group members' partitioning looks like.

My workstation at work (which is currently powered off due to power outage...) uses a variation of the default setup with additional LVs for /var/lib/machines and /var/lib/libvirt/images.

My current work-from-home VM has this setup:

Filesystem     Type     Avail Mounted on
/dev/sda3      btrfs      74G /
/dev/sda3      btrfs      74G /home
/dev/sda1      ext4      647M /boot

A lot of my personal Linux setups are either LVM or Btrfs setups, depending on when they were set up and what they were set up for.

We were? I don't remember agreeing to that.

It was before you joined the WG. Actually, it was in this very issue, up above: https://pagure.io/fedora-workstation/issue/54#comment-539237

We never implemented this decision because we wanted to wait and see whether it would have implications for the encryption issue. I noticed in the last meeting that you seemed to like LVM, which surprised me. LVM has not historically had much support in the WG, given that our current tooling (GNOME Disks) doesn't support it well. I'm interested to know your thoughts on it.

Assume systemd-homed with luks-file-on-loop: there's a bit of a gotcha if we were to go with one big file system: whether and how to let the user optionally encrypt the system. Does it involve double encryption of ~/ ? I'm skeptical of this mainly for disaster recovery reasons, but also performance. It complicates things. And might imply some kind of partitioning to avoid double encrypting user homes.

Assume a short term context instead: a strong case can be made for one big file system, and use the existing "encrypt my data" checkbox. And possibly enable it by default, if there's consensus to swallow the bitter pills we've discussed so far.

I'm happy to continue engaging in the LVM conversation; I think it's badass in many ways, but I've set it aside entirely in favor of Btrfs. LVM is the realm of expert storage admins. One Anaconda developer referred to it as emacs for storage. The Stratis design doc describes it in legacy, monolithic terms, with strong backward-compatibility demands, and says it was not even considered as a basis for simplification, hence Stratis.

If GNOME Disks hypothetically supports LVM better in the future, is it an acceptable workflow to require the user reboot to install media just to move some free space from fedora/home to fedora/root and then reboot again? I'm persuaded by the Stratis argument No real reason to require the user do this any more.

We never implemented this decision because we wanted to wait and see whether it would have implications for the encryption issue. I noticed in the last meeting that you seemed to like LVM, which surprised me. LVM has not historically had much support in the WG, given that our current tooling (GNOME Disks) doesn't support it well. I'm interested to know your thoughts on it.

In general, I do not consider the lack of support by GUI tools to be a blocking reason for using a particular technology. However, I do consider it as an impediment to fully leveraging it.

The reason I like LVM is that it makes it easy to grow my storage organically without having to reformat/reinstall. It's not quite as simple or flexible as Btrfs, but not having to reformat or reinstall means that I don't have to spend time offloading data somewhere, getting new disks, and then moving it back.

That said, I primarily use LVM on systems today where I cannot use Btrfs. My preference remains with Btrfs, and I'd personally like to see if we could switch our default to that. My reasons for this are as follows:

  • Online safe resize (shrink and grow) of the filesystem
  • Online safe growing and shrinking of the storage (add/remove disks, RAID reconfigure, etc.)
  • Trivial automatic OS snapshotting to enable better recovery scenarios
  • Interest and support from a prominent downstream user (FB infra and engineering)
  • Upcoming support for integrity features in the filesystem
  • Solid performance for container-based workload scenarios (esp. with cgroupv2)
  • Safe conversion from ext4 is supported

Other contemporary distributions (openSUSE and Ubuntu) are making a great point of ensuring they have a solid story there by leveraging Btrfs and ZFS, and my personal experience over the past five years using Btrfs has indicated we should be doing this too.

At Facebook, on our Fedora devices we switched away from the default layout (50GB root) since it became a constraint; we're leaning towards using Btrfs, but are currently just using ext4, with /boot, swap and leaving everything else in /.

With btrfs we'd be able to have subvolumes for different kinds of data (home directories and /, to begin with, but also put VM images, database etc. in different subvolumes) and snapshot them separately to easily undo mistakes.

We need to do some work to validate this - it's been working fine for people who install manually, but automating a LUKS + Btrfs setup is not obvious via kickstart, and we might need to do some %pre magic.

Do you think it's at all problematic or controversial if Fedora Workstation edition were to drop LVM by default?

Hi, I suppose that you want to alter only the default partitioning (and not the support for LVM in general). From the Anaconda perspective, the change will be quite simple. We have recently extended the Anaconda configuration file with an option for the default partitioning scheme. See: https://github.com/rhinstaller/anaconda/commit/4b045b9

Therefore, we just need to add default_scheme = PLAIN to the product file for Fedora Workstation. See: https://github.com/rhinstaller/anaconda/blob/master/data/product.d/fedora-workstation.conf
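For readers unfamiliar with these product files: they are INI-style. As far as I know, `default_scheme` lives in the `[Storage]` section, but that is an assumption; the linked commit and fedora-workstation.conf are authoritative. The sketch below reads such a snippet with Python's standard `configparser` and checks the value against the scheme names I believe Anaconda accepts (PLAIN, LVM, LVM_THINP, BTRFS). The LVM fallback stands in for Anaconda's built-in default and is also an assumption.

```python
import configparser

# Scheme names believed to be accepted by Anaconda (assumption; verify upstream).
KNOWN_SCHEMES = {"PLAIN", "LVM", "LVM_THINP", "BTRFS"}

# Hypothetical excerpt of a Workstation product file with the proposed change.
WORKSTATION_SNIPPET = """
[Storage]
default_scheme = PLAIN
"""


def default_scheme(conf_text: str, fallback: str = "LVM") -> str:
    """Return the default partitioning scheme set in an INI-style config snippet."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback also covers a missing [Storage] section, not just a missing key
    scheme = parser.get("Storage", "default_scheme", fallback=fallback).strip()
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown partitioning scheme: {scheme!r}")
    return scheme


if __name__ == "__main__":
    print(default_scheme(WORKSTATION_SNIPPET))
```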

One consequence of a single ext4 is clean installs (reprovisioning) will require the user backing up /home and restoring it, because Anaconda enforces reformatting the system root volume.

This is definitely a drawback to consider. Anaconda probably won't be able to drop this enforcement.

I'm writing here not as an Anaconda developer but with my personal opinion on the issue. From the Anaconda PoV, what @vponcova says is correct, and it should be pretty easy to switch the default partitioning.

I don't think that dropping /home is really a good idea. For me, it gives you the possibility to save your data when your system goes crazy and eats itself. Another highly valuable point for me is that when I run out of space (for various reasons: Steam, music, photos), it won't kill my system, because all the core services and stuff are writing to and reading from / and not from my full /home.

I'm also not convinced that replacing LVM with standard partitioning is a good idea, especially if we take into account desktop computers which could have plenty of HDDs. With just plain partitions you can't really use these disks easily for games or flatpaks. It also feels to me like a regression in storage capabilities. I think this setup describes a big portion of the Workstation user base.

I totally agree with what @ngompa wrote here. Using LVM/BTRFS will give you plenty of possibilities which you will lose otherwise. I'm not sure whether we want to switch to BTRFS or not, and I don't want to decide this, but either of these two is better than the plain partitions solution.

Another highly valuable point for me is that when I run out of space (for various reasons: Steam, music, photos), it won't kill my system, because all the core services and stuff are writing to and reading from / and not from my full /home.

Problem is we kept running out of space under /var when we used 50 GB for the root partition. So we increased that to 70 GB. Now if you install with a 100 GB hard drive, you have 30 GB home and most of the 70 GB wasted. LVM doesn't solve this; we have to handle the situation automatically, without any user interaction.

Really, the user experience when running out of space is the root of the problem here. We can't automatically repartition and recover, LVM or not. (Maybe Stratis can do that?) So our best bet is to make running out of space in either place as unlikely as possible by merging them together, and then pray we don't run out. Admittedly, not a great result, but I think it's better than what we have now.

It would be wonderful if we could make Fedora more robust to running out of disk space, in the same way we're currently working to make it more robust to running out of memory.

I see your point @catanzaro and I agree with you. On the other hand, I think there is no reason to assume that having full root because of containers and flatpaks happens more often than having full home. Or is there a reason to support that statement?

What I want to point out is that merging these two together could make the situation worse. Users who have a problem with filling up their home can handle that "easily" right now, thanks to the separate /home. However, this could be painful if /home is on the same partition as root.

Anecdotally, I've run out of space under / many times, but never come anywhere close to running out under /home. Since / is fixed-size and all remaining space is allocated to /home, the likelihood of running out of space in one or the other is really dependent on how big your disk is. I imagine plenty of users are in the exact opposite situation, where running out of space under /home is more likely.

Attempt to summarize this issue:

Problem is that our previous default 50 GB root partition was too small; users kept running out of space. It happened to me several times. We increased the default root partition to 70 GB as a workaround, but now that's a lot of space used by / when our default install doesn't require anywhere near that much. This is frustrating when e.g. using a 100 GB disk, you wind up with maybe 25 GB /home and 60 GB empty space under /. The Working Group has approved having just one / partition without separate /home in order to reduce the likelihood of running out of space in either place, and removing LUKS to simplify things. But we are revisiting this decision. In particular, running out of space in / tends to be non-recoverable for nontechnical users, and having /home on the root partition is going to make that case more likely. This seems like a hard problem to solve.

We also have strong interest in using btrfs by default (it seems to work well enough nowadays). I've seen Stratis mentioned as well.

Is the only sane alternative here simply /boot, swap, and / (root)?

Yes, I have changed the default partitioning to this for some time now, because experience proved that separating / and /home only led to issues (one of the two ends up out of space, and it's inefficient: free space was wasted while the other partition was full).

One of the reasons why I switched to Btrfs five years ago was to deal with this problem. Ultimately, it's still a single pool of space and how you use it is completely up to you, making it so much easier to efficiently use it.

@catanzaro I ran out of space multiple times in both /home and /, alternately, and keeping the /home partition separate never ended up being useful to me.
Running out of space in either was always a "have to add/change disk" scenario in the end.

OTOH having LVM has been hugely beneficial a couple of times to move stuff to a new disk by adding the new disk to the LVM volume group and migrating stuff that way. It's also useful if you have a second HDD slot and you don't replace the disk but just grow the filesystem to span the second disk.
I've done both multiple times even with underlying mdraid arrays.

Btrfs

  • One big pool, but /home can be saved+reused. (Installer provides a unique exception for Btrfs: instead of requiring a reformat, it requires only that / be placed on a new subvolume.)
  • Provides for the same use case as LVM pvmove, via either btrfs replace or btrfs device add/remove. @simo
  • Anaconda and Fedora kernel teams have been clear that Btrfs is not a priority, that is to say they don't have the resources to do heavy lifting. We need to assess community interest and available resources, possibly also a call for recruitment.

Stratis

  • Isn't yet supported as rootfs (planned), and thus no installer support.
  • Based on dm-thin and XFS, with a pool of storage that is distinct from filesystems.
  • The pool's total/free/used space can show reality; whereas filesystems are virtual sized. Currently filesystems start at 1T, with no size option (developers are considering alternatives). The file system grows automatically based on demand; and by using discards, unused file system blocks are returned to the pool for use by any file system. It avoids the need to shrink/repartition home and root filesystems, but it complicates space reporting.
  • File systems and snapshots are rather expensive due to the XFS journal. A 1T file system has a 550M journal. Btrfs snapshots are ~16KiB. (Neither value includes the contents of the snapshot, this is just the "infrastructure" cost of the file system/snapshot.)
  • Currently it doesn't support shrinking or relocating the Stratis pool itself (once a device is added to the pool, that's it; planned feature but I don't know the time frame).
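To illustrate the space-reporting point, here is a toy thin-provisioning model. It is only a sketch of the general dm-thin idea (virtual filesystem size, shared physical pool, discards returning blocks); the `ThinPool` class and its method names are hypothetical and are not Stratis APIs. The 1 TiB virtual size mirrors the current Stratis default mentioned above, and the 256 GiB pool is an arbitrary example.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3


class ThinPool:
    """Toy thin pool: filesystems report a fixed virtual size, but blocks are
    only taken from the shared physical pool when data is actually written."""

    def __init__(self, physical_bytes: int):
        self.physical = physical_bytes
        self.allocated = {}             # filesystem name -> bytes backed by the pool

    def create_fs(self, name: str) -> None:
        self.allocated[name] = 0

    def pool_free(self) -> int:
        return self.physical - sum(self.allocated.values())

    def write(self, name: str, nbytes: int) -> None:
        if nbytes > self.pool_free():
            raise OSError("pool exhausted")   # every filesystem is affected at once
        self.allocated[name] += nbytes

    def discard(self, name: str, nbytes: int) -> None:
        # Deleted-and-discarded blocks go back to the pool for any filesystem.
        self.allocated[name] -= min(nbytes, self.allocated[name])

    def df(self, name: str, virtual_bytes: int = TIB) -> tuple:
        # What a per-filesystem `df` would roughly see: the virtual size, not the pool.
        return virtual_bytes, virtual_bytes - self.allocated[name]


pool = ThinPool(256 * GIB)
pool.create_fs("root")
pool.create_fs("home")
pool.write("home", 200 * GIB)
print(pool.df("root"))           # root still claims a full 1 TiB free
print(pool.pool_free() // GIB)   # 56
```

Neither filesystem ever needs shrinking, but only the pool-level number tells you how close you are to trouble, which is the reporting complication mentioned above.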

In particular, running out of space in / tends to be non-recoverable for nontechnical users

Can somebody elaborate on this more (or link to a comment that talks about it)? Some space is always reserved for root, so the system should boot fine. If the user can't log in because GNOME can't create necessary files, how is it then different from having a fully occupied /home?

One operation that comes to mind is dnf update, which can fail partway through the installation if there isn't enough space.

The default reserved space is 5%, so that would be 12.5GB on a 250GB disk. It's very unlikely to see such a large dnf update. Also, the disk size is checked in advance, so tools like gnome-software should (and I think will, but I'm not sure how readable the error message is) inform the user and she can remove some files (if she can log in in the first place, of course). The reserved space percentage is tunable.
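The 12.5 GB figure is simply 5% of 250 GB. A minimal sketch of that arithmetic follows; the function name is hypothetical. On a real ext4 filesystem the percentage can be changed afterwards with `tune2fs -m`.

```python
def ext4_reserved_bytes(fs_bytes: int, reserved_percent: float = 5.0) -> float:
    """Space held back for root on ext4 (the mke2fs default is 5% of the blocks)."""
    if not 0 <= reserved_percent <= 100:
        raise ValueError("reserved percentage must be between 0 and 100")
    return fs_bytes * reserved_percent / 100


GB = 10 ** 9
print(ext4_reserved_bytes(250 * GB) / GB)        # 12.5
print(ext4_reserved_bytes(250 * GB, 1.0) / GB)   # 2.5
```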

But when @catanzaro spoke about "non-recoverable for nontechnical users", I assumed it meant some immediate situation, not a potential future dnf update.

Honestly, I don't know how well full /home is handled because I don't experience that case very often. I think reserving space for root user doesn't help because normal applications need to create files too, e.g. under /var or wherever.

When my / fills up, applications just crash until I make some space under /, which nontechnical users are not going to be able to do. We display a notification to free space using baobab, but baobab is frankly nonfunctional trash that we need to drop ASAP; I don't think I've ever seen it work properly without errors. The plan is to replace it with GNOME Usage. Once that happens, users should at least have a chance to recover by deleting files, assuming that they're able to launch Usage without it crashing. Still, this scenario is definitely not handled well either.

Re: reserved space

The default partition scheme has /usr /etc /var on vg/root ext4. The reserved space is not a factor because everything written there is a privileged process anyway. Whereas on /home, the reserved space is actually unusable. It's probably useful in a "one big ext4" file system layout though. Also, this is a uniquely ext4 feature, Btrfs and XFS have no such concept as reserved space for root user.

There needs to be a reliable low-water-mark notification by the DE before there are problems. It either needs to check periodically, or fanotify needs new functionality for this.
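One reading of the "check periodically" option is a simple polling check. The sketch below uses Python's `shutil.disk_usage`; the 5% threshold and the `is_low` / `check_mounts` names are assumptions for illustration, not what GNOME actually does. A real implementation would run this from a timer and raise a desktop notification rather than return a list.

```python
import shutil
from collections import namedtuple

# Same fields as the named tuple returned by shutil.disk_usage.
Usage = namedtuple("Usage", "total used free")


def is_low(usage, min_free_fraction: float = 0.05) -> bool:
    """Low-water mark: free space below a fraction of the filesystem size."""
    return usage.free < usage.total * min_free_fraction


def check_mounts(mounts=("/", "/home"), min_free_fraction=0.05,
                 disk_usage=shutil.disk_usage):
    """Return the mount points that have crossed the low-water mark."""
    return [m for m in mounts if is_low(disk_usage(m), min_free_fraction)]


# Example with injected numbers instead of real filesystems.
fake = {"/": Usage(100, 97, 3), "/home": Usage(100, 40, 60)}
print(check_mounts(disk_usage=fake.__getitem__))   # ['/']
```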

Re: yet another layout idea

Following from the 2nd to last paragraph in this comment

With some additional work it might also be sane to stuff the real /etc and /var on this encrypted volume and bind mount them at login. Call it the 'etcvarhome' volume, it'd be LUKS encrypted. Users share that LUKS volume, and user homes are just plain directories.

Boot uses generic /etc /var since the goal is to get to a login window as fast as possible. The startup of many (most?) system services would be delayed, since real /etc /var aren't yet available. Login unlocks 'etcvarhome', now everything is assembled, startup resumes, and the desktop appears.

With even more work, the generic boot can more easily be made a (signed) dm-verity image; or, with a lot less work but more commitment, it could be a (signed) Btrfs image that handles updates/upgrades the same as now, since it'd behave as a conventional read-write file system. Yet it still supports a stateless/resettable system via the seed/sprout feature. The main idea is to get the stateful directories /etc /var /home both encrypted and on their own filesystem volume, so they can share a big pool. Whereas /usr doesn't have such a huge range associated with it; and less so the more flatpaks are used.

I'm also accounting for CoreOS+Silverblue usage. While rpm-ostree can reset its own state (wipe out local configuration), it can't reset file system state. This idea makes it possible. And it helps with our flatpak --system vs --user space consumption problem. They'd be on the 'etcvarhome' file system in any case. The only distinction is whether some app installs should be private or unprivileged (I still kinda like /home/shared/apps to share unprivileged flatpak installs).

TPM support isn't strictly needed. The two things needing TPM: the HMAC for authenticated Btrfs (or dm-integrity); and the HMAC for authenticated+encrypted swap for supporting hibernation with UEFI Secure Boot. The key for dm-verity images is considered public knowledge, often placed on the command line or stuffed in an initramfs, because the image is read-only and can't be modified. Whereas the Btrfs (or dm-integrity) HMAC key needs to be kept secret because anyone who has it can mount the filesystem read-write and modify it.
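The asymmetry between the two keys can be shown with Python's `hashlib` and `hmac`. This is a deliberately simplified sketch: a single SHA-256 digest stands in for a dm-verity root hash (real dm-verity uses a hash tree), a single HMAC tag stands in for authenticated Btrfs or dm-integrity checksums, and the function names and the key value are hypothetical.

```python
import hashlib
import hmac


def verity_root(image: bytes) -> str:
    # Public value: publishing it is fine, because it only lets you *check*
    # a read-only image; any change to the image changes the digest.
    return hashlib.sha256(image).hexdigest()


def verify_readonly(image: bytes, trusted_root: str) -> bool:
    return hmac.compare_digest(verity_root(image), trusted_root)


def auth_tag(key: bytes, data: bytes) -> str:
    # Secret key: whoever holds it can produce a valid tag for *new* data,
    # which a read-write filesystem needs and an attacker must never do.
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_readwrite(key: bytes, data: bytes, tag: str) -> bool:
    return hmac.compare_digest(auth_tag(key, data), tag)


image = b"generic /etc and /var used to reach the login window"
root = verity_root(image)                 # may sit on the kernel command line
tampered = image.replace(b"login", b"LOGIN")
print(verify_readonly(tampered, root))    # False: can't forge without a new root hash

key = b"hypothetical-secret-sealed-by-tpm"
data = b"/home contents, version 2"
print(verify_readwrite(key, data, auth_tag(key, data)))  # True: key holder can re-sign
```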

Anyway, this idea suggests some unconventional rearrangements, using a signed boot image for trust, encrypting all the areas we care about, without any loop files or file system resizing gymnastics.

Btrfs
[...]
Anaconda and Fedora kernel teams have been clear that Btrfs is not a priority, that is to say they don't have the resources to do heavy lifting. We need to assess community interest and available resources, possibly also a call for recruitment.

I have already been doing all the "heavy lifting" for Btrfs in the Fedora ecosystem for the past three years, as it were. Anaconda in Fedora 32 will let you produce a full Btrfs-only filesystem, and I've had a feature proposal in development to revamp the default Btrfs setup for Fedora 33.

I believe I can also get additional resources to assist with supporting Btrfs in Fedora if this is a path we want to go down. I firmly believe that we should consider switching to Btrfs by default in Fedora Workstation.

I've had a feature proposal in development to revamp the default Btrfs setup for Fedora 33.

@ngompa I'll be happy to collaborate on implementing this. At installation time our needs might not be identical (at work we need to enforce full-disk encryption, so we need Btrfs on top of LUKS) but we definitely need the same userspace integration with Snapper etc.

Before making btrfs a default you should get buy-in from some kernel folks who will handle any bugs in it when used on a desktop.
My understanding is that the current developers are focused only on server root partition, largely read only use cases, not "important data" use cases.

That's a completely different message from what we've been hearing. What's going on here?

It's quite possible different developers have different focus and we heard from different voices.

What I do know is that both SUSE and Facebook are working on Btrfs for all of these cases. Facebook heavily leans toward the "important data" case, while SUSE leans toward the "root partition" case. In general, though, they are working on the same problems upstream and care quite heavily about both server and desktop cases.

And it's important to note that openSUSE has been running with Btrfs as the filesystem of choice for everything by default since the end of 2014. And given @salimma's interest here, I'm fairly certain his team can help support us on the desktop cases within Fedora too.

Interesting, my understanding of Facebook's priorities is the exact opposite of yours ...

That said, how does btrfs fare on top of dm-crypt devices?

Facebook runs btrfs both on data drives and root drives. Right now a good chunk of our fleet is already running btrfs on /, and we plan to have the vast majority of it on btrfs by the end of the migration to CentOS 8 / Stream (where we've made btrfs the default for /). The desktop team is also running btrfs (on top of LUKS for encryption) on our Fedora laptop deployments.

It does as well as anything else does on dm-crypt. It's a bit more painful because you can't use btrfs' volume management pieces well if you're relying on devicemapper for your volumes. There's some work going on upstream to introduce native encryption into Btrfs alongside the native authentication/integrity work going on now.

My understanding is that Synology does it this way (LVM/DM, LUKS, dm-crypt, etc.) on their NASes as well for their secured data features. So it's a known method of operation.

Btrfs encryption
Native encryption is being worked on but there's no concrete timeline. For cgroups v2 resource control, a full implementation requires IO isolation, and that's only possible with Btrfs, and only directly on the drives. Using dm-crypt will prevent full IO isolation. But we'd still see important and necessary resource control improvements due to @benzea's work in Shell (and others' in KDE) leveraging systemd cgroups v2 functionality. A possibly interesting consequence of Btrfs by default is that it would reveal the relative difference and importance of IO isolation, and may push development elsewhere to catch up with Btrfs in this area.

In Fedora, /boot on ext4 is still required when sysroot is on LUKS.

Fedora community resources - kernel
Josef Bacik and I are on the fedora-kernel-btrfs@ bugzilla list, which meets the minimum requirements for Btrfs being a release blocking file system. Anything I can't triage, I take to linux-btrfs@ (the upstream list) and cc: Josef on directly. I know the Btrfs maintainers, they know me. But I don't scale and I'm not a kernel developer. There needs to be clear understanding what additional requirements and constraints there may be for Btrfs to be a default file system.

Fedora community resources - anaconda
In that same thread from last August, the Anaconda team opposed Btrfs even being a release-blocking file system. It's uncertain to what degree Anaconda (both the team and the code) is willing and able to accept Btrfs not only as release blocking, but as a default file system, and under exactly what parameters. Anaconda has substantial Btrfs support. It's perhaps the most Btrfs-aware and sophisticated installer, which raises near-term planning and long-term maintainability questions.

Scope - grandiose vs narrow
Btrfs can be a drop-in replacement for ext4+LVM. It has surprisingly few features that must be decided at mkfs time, or even decided at all. Whereas a more expansive implementation will impact the installer, QA test cases, documentation, and users quite a lot more.

Just to add some extra clarification and background, Facebook uses btrfs for literally everything.

It's on all of our root partitions.

We use send/receive to ship containers around, each container is thus in its own subvolume.

Our build servers use btrfs extensively, snapshotting the source tree, applying patches, running tests, destroying snapshot, rinse and repeat thousands of times an hour across thousands of machines.

Whatsapp is moving their offline storage to utilize btrfs, these are multi-terabyte file systems.

Some of our gluster clusters use btrfs for their backing store.

Facebook uses every single feature btrfs has to offer heavily, across a very large number of hosts. And a majority of these use cases were implemented without the kernel team's direct involvement. People simply tried it, it worked well, and they deployed it into production. Have there been problems? Of course. But at our scale we simply cannot afford to deploy technologies that have systemic issues, and btrfs has been solid across the board.

The desktop team is also running btrfs (on top of luks for encryption) on our [Facebook's] Fedora laptop deployments.

minor clarification: we're planning on doing it, and some of our users are running btrfs, but it's not the default yet. We plan to switch soon though, and hopefully contribute any necessary change to Anaconda.

I'm not aware of any desktop-specific issue that would be unique though, considering all the Btrfs deployments we have on servers.

@catanzaro @otaylor I haven't previously seen 'staffed support' stated as a requirement, until today. What does it mean exactly? Josef filled that staffed position up until 2012. But in 2014 the Workstation working group produced a technical spec that calls for Btrfs by default when it is ready, and FESCo approved that doc. The point is not revisiting stale history, but note the sequence: the working group and FESCo approved Btrfs by default knowing there was no staffed support, therefore is 'staffed support' a hard requirement? Or a preference?

@mclasen Why btrfs? It solves this issue's top complaint: root running out of space while leaving home with tons (or vice versa). One big plain ext4 volume solves this too, but as a consequence /home cannot be preserved during a reinstall. For Silverblue it means /var can't be saved. Whereas with Btrfs, Anaconda allows var and home subvolumes to be reused during a clean install. No other (technical) changes needed. Advanced users who like LVM for pvmove, vgextend, and snapshot operations can still do those things with Btrfs. I think it's fair to say considering any new default means evaluating the why, the benefits, and integration expectations/requirements and alternatives. That would be essential before a decision could happen. What prerequisite questions need answering to decide whether to consider it?

Skimming Open Decision Framework it seems like a useful starting point for a process.

Tangential tech note: few things must be decided at mkfs.btrfs time, quite a lot can easily and safely be changed after-the-fact. For example Ignition, and systemd-repart (also new in 245) could learn a few new tricks, making that work useful for other editions/spins, Silverblue, and even as it relates to doing upgrades and reprovisioning? In other words, maybe don't assume all things need to be in Anaconda or even the desktop.

Consideration: I think if the WG were to consider Btrfs by default, the process will necessarily draw interested parties to help present pros, cons, requirements, constraints, and prioritize issues, and to have ownership of the decision. If it turns out the WG has to do a bunch of heavy lifting to push this over the hump, then that might indicate there isn't enough community interest to support it.

It very well could turn out Workstation WG declines Btrfs by default, but the evaluation might reveal there's a better fit for some other Fedora edition or spin.

If there is someone from the community who is willing to provide patches to Anaconda and to maintain the given feature, then we don't have objections in general.

We are happy to help and take on maintenance of features related to our "priority list", but other than that we have this:
https://anaconda-installer.readthedocs.io/en/latest/contributing.html#pure-community-features
The section can be bigger if required.

Just saying ;).

Of course I can't talk on behalf of our storage library blivet. They could have different view.

I haven't previously seen 'staffed support' stated as a requirement, until today. What does it mean exactly? Josef filled that staffed position up until 2012. But in 2014 the Workstation working group produced a technical spec that calls for Btrfs by default when it is ready, and FESCo approved that doc. The point is not revisiting stale history, but note the sequence: the working group and FESCo approved Btrfs by default knowing there was no staffed support, therefore is 'staffed support' a hard requirement? Or a preference?

I would say it's a "consideration." Certainly it's something to be considered. It would be unfortunate if we wind up with a btrfs-related blocker bug and nobody at Red Hat has the required expertise to solve the issue; help from Facebook and other community members is much appreciated, but when it comes to potential release slips or urgent data loss bugs, relying on community support is not ideal.

On the other hand, I understand (or expect) that btrfs-related bug reports are relatively rare (yes?), because you wouldn't be proposing it if it was buggy. I also know that the Fedora community can pull off big changes and maintain critical components. This disk usage problem sucks, and if btrfs is the best way to solve that, I'm interested. Moreover, I suspect that Red Hat's interest in btrfs is likely to increase if Fedora starts using it by default. And yes, btrfs by default has been part of our tech spec from day one, so I'd sooner think of it as a long-since approved plan that's taken a long time to reach fruition, rather than anything surprising or unexpected.

We rely on community support for potential release slips in a lot of core things. For example: @ignatenkobrain and myself are the maintainers of pkgconf, and there are no Red Hat employees helping in Fedora with that today. A breakage there results in virtually every single package no longer building. Fortunately, that hasn't happened and will never happen as long as I can help it.

Also to note: Red Hat being "on the hook" does not guarantee help. GCC maintainers always say that they won't help everyone with bugs during GCC rebases. I've been lucky and most of the time I can get some, but they actively state to not expect help with breakages as a result of a GCC rebase.

Bugs in the filesystems are a different level, for sure, but when you have people who are committed to maintaining, enhancing, and fixing the code, that's the best you can ask for. Even with Red Hat folks, we have no "hotline", as it were, on the Fedora side to make them do anything. So having support from Red Hat is not only not a valid consideration in my book if someone else is willing to do it, it is actively harmful to improving the quality of participation from the community.

when it comes to potential release slips or urgent data loss bugs, relying on community support is not ideal.

I don't know whether any Btrfs kernel developers are looking to move, any more than ext4 or XFS devs. But yeah it can't be causing slips or data loss. Support needs to be consistent and timely.

On the other hand, I understand (or expect) that btrfs-related bug reports are relatively rare (yes?), because you wouldn't be proposing it if it was buggy.

If it were buggy, I wouldn't use it, wouldn't recommend it, and wouldn't bother considering it by default. Btrfs fails differently than other file systems. Simplistically, it's maybe harder to break and harder to fix, compared to other file systems. And it will complain sooner when detecting problems. Error reporting can be esoteric. I can elaborate on and qualify that, with examples, so everyone can make their own assessment.

BTW to motivate the need for unified partitioning: I just ran out of space on my 75 GB root filesystem, and I actually have all my flatpaks installed in my home directory to avoid this. Problem is /var/lib/mock can also grow huge.

I always configure mock to use /home/mock/var/lib/mock for this exact reason.

FWIW, after reading this entire issue I think it is about time Fedora moves to btrfs by default.

I always configure mock to use /home/mock/var/lib/mock for this exact reason.

I'm using the tmpfs plugin, if someone has enough RAM and wants a nice speed boost ;).

Sorry for the OT.
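For anyone wanting to try either approach, here is a minimal sketch of the relevant mock settings as I understand them. Mock config files are Python and `config_opts` is provided by mock itself, so this fragment is not runnable on its own. Putting it in `~/.config/mock.cfg` is an assumption about where you keep overrides; the `/home/mock/var/lib/mock` path is the one from the comment above, and the tmpfs numbers are example values only, so check the mock Tmpfs plugin documentation before relying on them.

```python
# Keep mock chroots and caches on the /home filesystem instead of /var/lib/mock.
config_opts['basedir'] = '/home/mock/var/lib/mock'

# Alternatively, build inside a tmpfs when the machine has RAM to spare.
config_opts['plugin_conf']['tmpfs_enable'] = True
config_opts['plugin_conf']['tmpfs_opts']['required_ram_mb'] = 4096   # example value
config_opts['plugin_conf']['tmpfs_opts']['max_fs_size'] = '8g'       # example value
```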

Resize root on default installation of Fedora 32 using F32 install media and blivet-gui

Setup:
image: Fedora-Workstation-Live-x86_64-32-1.6.iso
virtual drive: 1TB
Clean install using Automatic partitioning (75G root, ~950G home)

  1. Boot Live OS
  2. GNOME Software search for 'blivet' and install blivet-gui
  3. Launch blivet-gui
  4. Left side UI, choose LVM: fedora_localhost-live (name of VG)
  5. Main UI, Logical view, select the "fedora_localhost-live-home" LV
  6. Click the "Edit selected device" button (3rd from left), and choose "Resize"
  7. Choose a new size for home. Suggestion: reduce by 10-15G. e.g. if the current size of home is 949.4 GiB, change it to 939.4 GiB and click Resize
  8. Select the "fedora_localhost-live-root" LV
  9. Click the "Edit selected device" button (3rd from left), and choose "Resize"
  10. Choose a new size for root. Suggestion: use the slider control and move it to the far right to use all the free space created in the previous steps.
  11. Near the top of the blivet-gui UI is a checkbox, check this to complete the changes.
  12. Review the proposed changes, click OK when ready to commit them to disk. Make sure this is not interrupted!

Yes, this is a bit clunky (both the directions that use jargon, and the experience of going through the steps). But it does work. This could be adapted for #121 to help the user add a swap LV, if they later decide they want to enable hibernation. It's about 6 more steps.
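The blivet-gui steps boil down to moving extents from the home LV to the root LV through the volume group's free space. A toy model of that bookkeeping is below. The VG name matches the default install, but the `VolumeGroup` class and the 1000 GiB / 75 GiB numbers are hypothetical, and the model ignores the filesystem resize that has to accompany each LV resize (ext4 can't be shrunk while mounted, which is one reason to do this from the Live media).

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
EXTENT = 4 * MIB   # LVM's default physical extent size


class VolumeGroup:
    def __init__(self, name: str, size_bytes: int):
        self.name = name
        self.total_extents = size_bytes // EXTENT
        self.lvs = {}              # LV name -> size in extents

    def free_extents(self) -> int:
        return self.total_extents - sum(self.lvs.values())

    def create(self, lv: str, size_bytes: int) -> None:
        self.resize(lv, size_bytes, new=True)

    def resize(self, lv: str, size_bytes: int, new: bool = False) -> None:
        want = size_bytes // EXTENT
        current = 0 if new else self.lvs[lv]
        if want - current > self.free_extents():
            raise ValueError(f"{self.name}: not enough free extents to grow {lv}")
        self.lvs[lv] = want

    def size_gib(self, lv: str) -> float:
        return self.lvs[lv] * EXTENT / GIB


vg = VolumeGroup("fedora_localhost-live", 1000 * GIB)
vg.create("root", 75 * GIB)
vg.create("home", vg.free_extents() * EXTENT)                        # home takes the rest
vg.resize("home", 915 * GIB)                                         # step 7: shrink home by 10 GiB
vg.resize("root", (vg.lvs["root"] + vg.free_extents()) * EXTENT)     # step 10: grow root
print(vg.size_gib("root"), vg.size_gib("home"), vg.free_extents())   # 85.0 915.0 0
```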

Michael's summary is more complete, but I'll restate it as "I have all this free space over there, but I need it over here. Now what?"

I figure it's not obvious how Btrfs can solve that.

One common bit of Btrfs jargon, subvolumes, needs an explanation. Volumes have size. So, subvolumes have size, right? Actually, they don't. They are pretty much like directories. And like directories they have no size of their own. Subvolumes and directories share file system space. In this sense Btrfs is like the "one big ext4 volume" idea.

Subvolumes have some unique features that directories don't. They can be mounted independently using e.g. mount -o subvol=chris /home/chris but this is just a bind mount behind the scenes. Not exotic. This is what Anaconda leverages when reusing home or var. And in this sense a subvolume might seem like a separate file system.
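A toy model of "subvolumes have no size of their own": every subvolume draws from one pool, and a limit (Btrfs can impose one with quota groups) is just an optional cap that can be changed later, not an allocation. The `BtrfsPool` class is hypothetical and ignores metadata, compression and snapshot sharing.

```python
class BtrfsPool:
    def __init__(self, size: int):
        self.size = size
        self.used = {}          # subvolume -> bytes referenced
        self.limits = {}        # subvolume -> optional cap (qgroup-style)

    def create_subvolume(self, name: str) -> None:
        self.used[name] = 0     # nothing is reserved up front

    def free(self) -> int:
        return self.size - sum(self.used.values())

    def set_limit(self, name: str, limit) -> None:
        # Can be raised, lowered or removed (None) later without reformatting.
        if limit is None:
            self.limits.pop(name, None)
        else:
            self.limits[name] = limit

    def write(self, name: str, nbytes: int) -> bool:
        limit = self.limits.get(name)
        if limit is not None and self.used[name] + nbytes > limit:
            return False        # this subvolume hit its cap
        if nbytes > self.free():
            return False        # the shared pool is full
        self.used[name] += nbytes
        return True


fs = BtrfsPool(100)
for sv in ("root", "home"):
    fs.create_subvolume(sv)
print(fs.write("root", 80), fs.free())   # True 20
fs.set_limit("home", 10)
print(fs.write("home", 15))              # False
fs.set_limit("home", None)
print(fs.write("home", 15), fs.free())   # True 5
```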

More of the above? Explain one Btrfs thing, in two paragraphs?

Yeah, short and sweet explanations of what specific problems btrfs solves for Workstation seem like a good way to make the case for it. More like that seems good.

That said, having read the above, I wonder: what is the main advantage of btrfs subvolumes over just a vanilla single-partition / on ext4?

Home reuse, first bullet point here. It's a use case that's been brought up a few times, also by Anaconda folks, even though the clean install doesn't happen often. On ext4, the user will have to do a backup and restore.

Others I use frequently:

Subvolumes can be snapshotted. A snapshot is a pre-populated subvolume. I take one before every update and upgrade, and roll back if needed.

Snapshots can be "sent" to other Btrfs file systems. Incremental send is substantially faster than rsync, due to not needing deep traversal for comparisons. I use it for backup and restore.
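Why incremental send can skip a deep traversal can be sketched with a simplified model. Assumption: every tree node records the generation (transaction) at which it was last written, and copy-on-write bumps the generation of every ancestor of a changed file, so a subtree whose generation matches the parent snapshot is still shared and can be skipped whole. Real send compares Btrfs metadata trees and also handles renames, which this ignores; `changed_paths` is a hypothetical name.

```python
def changed_paths(parent, child, prefix=""):
    """Nodes are {"gen": int, "children": {...}} (dirs) or {"gen": int, "data": ...} (files)."""
    if parent is not None and parent["gen"] == child["gen"]:
        return []                       # shared with the parent snapshot: skip it entirely
    if "children" not in child:
        return [prefix or "/"]          # a new or modified file
    old = parent.get("children", {}) if parent else {}
    out = []
    for name, node in sorted(child["children"].items()):
        out += changed_paths(old.get(name), node, f"{prefix}/{name}")
    for name in sorted(set(old) - set(child["children"])):
        out.append(f"{prefix}/{name} (deleted)")
    return out


photos = {"gen": 1, "children": {"a.jpg": {"gen": 1, "data": "..."}}}
v1 = {"gen": 1, "children": {"photos": photos, "notes.txt": {"gen": 1, "data": "old"}}}
v2 = {"gen": 2, "children": {"photos": photos, "notes.txt": {"gen": 2, "data": "new"}}}
print(changed_paths(v1, v2))   # ['/notes.txt']
```

An rsync-style comparison would have to look inside `photos` to learn that nothing changed; here the matching generation answers that without descending.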

The same Anaconda 'reformat not required' behavior lets me install multiple Fedoras on the same Btrfs file system, into their own root subvolume. e.g. Fedora 32 and Rawhide.

Podman can use btrfs subvolumes/snapshots for containers instead of overlayfs. If both are available, there might be optimizations that take advantage of both. It's not my area of focus, but I know they're compatible.

I also imagine uses complementary to rpm-ostree. While it handles root tree versioning using a similar concept (different implementation of course), it's expressly not responsible for doing snapshots or rollbacks of /etc /var or /home (or ~/). I see ways Btrfs can make Workstation and Silverblue seem more alike, but also extend Silverblue in ways it can't do with rpm-ostree alone.

That's most of what's up with subvolumes.

So to put it simply, four major advantages for Btrfs right now are:

  1. "Lightweight" partitioning via subvolumes. Subvolumes, unlike real partitions, do not mandate storage allocation splits. Instead, they are merely "buckets" of content that are tracked independently on the same filesystem. This means that your resource usage can be a lot more dynamic. For example, if you build lots of packages like I do, /var/lib/mock takes more space than /home/ngompa does. But you have effectively free use of all the space, regardless of your setup. The subvolumes let us configure each "bucket" separately for things like compression, snapshotting, etc. You could also choose to impose quotas to restrict subvolumes on how much space they can use, like how partitions can, but you can change them later in a completely non-destructive manner.

  2. Cheap storage expansion to recover from running out of space. Btrfs has features similar to LVM in that it actually manages disk resources directly and can configure them as part of the filesystem (this is what makes Btrfs a so-called "volume-managing filesystem"). In the event that you run out of space, you are basically screwed on a regular file system like ext4 or XFS and will have to get a new, bigger disk and move everything by hand. With Btrfs, you can add another disk and tell Btrfs to expand the filesystem over to the other disk. This instantly gives you much more free space to work with, and lets you continue working without having to take a day to figure out some kind of strategy to delete and move files around so you can get enough space to keep doing work.

  3. Nearly zero-cost snapshotting. While snapshots are not a replacement for backups, they do come in handy for a lot of use cases: OS data snapshotting, configuration snapshotting, or user data snapshotting. All of those can be managed independently and can help in specific recovery scenarios. These features are also helpful with containers, as Podman and other similar tools can write container images as Btrfs subvolumes and instantiate them as snapshots, making it very easy to do complex layering without all the problems that occur with OverlayFS and without the performance penalties of device-mapper. For example, random filesystem integrity errors and rpmdb corruption inside of containers simply don't happen on Btrfs like they do with OverlayFS.

  4. Cheap offsite backup/restore. Remember when I said snapshots aren't a replacement for backups? Well, Btrfs can send subvolumes or snapshots to other Btrfs filesystems, and that can happen over a network too. It's a much more efficient form of mass data transfer, and can make data migration or disaster recovery much easier. Also, remember when I talked about expanding storage easily? You can also use this feature to send the whole Btrfs volume to a new disk if you want to move to a single larger disk. Or keep an archival disk to send to, and then snapshot on the archival disk. :smile:
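To make point 3 concrete, here is a toy Python model of copy-on-write snapshots. It is a sketch under simplifying assumptions (a flat path-to-block map, old blocks are never freed, and the names `BlockStore`, `Tree` and `used_blocks` are made up for illustration), not how Btrfs actually lays out its trees on disk. It only shows why taking a snapshot copies references rather than data, and why later writes cost space only for what changed.

```python
# Toy model of copy-on-write (CoW) snapshots. NOT the real Btrfs on-disk
# format: it only shows why a snapshot copies references, not data.
from dataclasses import dataclass, field


@dataclass
class BlockStore:
    blocks: dict = field(default_factory=dict)  # block id -> contents
    next_id: int = 0

    def write(self, data: bytes) -> int:
        """Store data in a freshly allocated block and return its id."""
        block_id = self.next_id
        self.blocks[block_id] = data
        self.next_id += 1
        return block_id


@dataclass
class Tree:
    """A subvolume or snapshot: a mapping of path -> block id."""
    store: BlockStore
    files: dict = field(default_factory=dict)

    def write_file(self, path: str, data: bytes) -> None:
        # CoW: never overwrite an existing block in place; allocate a new
        # one and repoint only this tree's reference.
        self.files[path] = self.store.write(data)

    def read_file(self, path: str) -> bytes:
        return self.store.blocks[self.files[path]]

    def snapshot(self) -> "Tree":
        # Copies the path -> block mapping only; no file data is copied.
        return Tree(self.store, dict(self.files))


def used_blocks(*trees: Tree) -> int:
    """Count distinct data blocks referenced by any of the given trees."""
    return len({block for tree in trees for block in tree.files.values()})


store = BlockStore()
root = Tree(store)
root.write_file("/etc/hostname", b"fedora")
root.write_file("/usr/bin/app", b"v1")
snap = root.snapshot()
print(used_blocks(root, snap))         # 2: the snapshot added no data blocks
root.write_file("/usr/bin/app", b"v2")
print(used_blocks(root, snap))         # 3: only the changed file got a new block
print(snap.read_file("/usr/bin/app"))  # b'v1': the snapshot still sees old data
```

The same reference sharing is what lets container tools instantiate an image as a snapshot instead of copying it.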

Home reuse (first bullet point here) is a use case that's been brought up a few times, including by Anaconda folks, even though clean installs don't happen often. On ext4, the user has to do a backup and restore.

Just a note. While completely true, this whole "you can't keep your /home on a fresh install if you don't have it on a separate partition" thing feels like a very artificial Anaconda limitation that could easily be lifted, if people wanted. There's nothing difficult about removing all files except /home from a filesystem and then using it as the installation target.

Just to play devil's advocate, features 1-3 Neal mentioned above seem to be achievable with LVM thin provisioning and XFS volumes too...
As for 4 I am not sure it matters a lot for the general case.
It requires another system with a compatible version of btrfs and enough space to contain the live filesystem... that is not what users typically have available for backup anyway (if you are lucky they have a managed NAS with an opaque filesystem and just send tarballs or rsync there).

I also agree with kparal that the "separate volume is needed for re-installs" is basically a red herring.

@vponcova @kparal

Just a note. While completely true, this whole "you can't keep your /home on a fresh install if you don't have it on a separate partition" thing feels like a very artificial Anaconda limitation that could easily be lifted, if people wanted. There's nothing difficult about removing all files except /home from a filesystem and then using it as the installation target.

Vendula addresses this up-thread, here. I can't find a single anaconda-devel email that explains all the concerns, but Anaconda development has consistently required the filesystem for / to be (a) empty, (b) consistent, and (c) using an up-to-date on-disk format. My recollection is that Anaconda folks don't want to get into the business of deleting directories: it adds complexity and non-zero risk. There are many changes in ext4 and XFS that have become default features over the years and that require a new mkfs to get. Btrfs uses feature flags for this instead, so a reformat isn't needed to get a new default feature. If anything, the oddball is the Btrfs exception, rather than the required reformat on ext4/XFS. Btrfs gets an exception because a subvolume provides a new, clean file tree, and therefore doesn't ask Anaconda to get into the business of deleting things. In effect, Btrfs and Anaconda give each other a wide berth.

@simo

Just to play devil's advocate, features 1-3 Neal mentioned above seem to be achievable with LVM thin provisioning and XFS volumes too...

I'm happy to go fully down this rabbit hole. But the tl;dr? I consider the integration challenges on desktops impractical to overcome. Btrfs has integration opportunities, but so far I've found no integration work that is required. LVM thinp, on the other hand, would have integration requirements related to used and free space reporting (which is a total fantasy on thinp).

Also, it doesn't prevent Michael's problem: he still has to resize the fs manually, and without the benefit of GUI tools. I don't see it as an improvement at all; it's just more confusing.

Thinp snapshots aren't cheap because they pin the filesystem journal. For example, a 1T XFS filesystem has a 550M journal, so each new snapshot costs 550M. Might be solvable, not sure.

As for 4 I am not sure it matters a lot for the general case.

And it's not required. They can use rsync or whatever else they want, just as they have been.

However, people don't use what they don't know about or don't have access to. I used rsync before I used btrfs send/receive. The concept of btrfs send/receive is less complicated than rsync.

It requires another system with a compatible version of btrfs and ...

I don't follow this. There aren't versions of btrfs. I use send/receive between Fedora and Arch, with different kernel and btrfs-progs versions. There might be confusion with (new) SELinux labels that are valid on the send side but don't exist on the receive side.

enough space to contain the live filesystem ....

I don't follow this either. Btrfs send does not send the full file system; only the contents of the snapshot are sent.
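A toy sketch of the same idea taken one step further: with an incremental send (`btrfs send -p <parent> <snapshot>`), only the differences from a parent snapshot that already exists on the receiving side are sent. The Python below is hypothetical and models snapshots as plain dicts of path to contents; the real send stream is generated from Btrfs metadata and carries many more operation types (renames, permission changes, xattrs and so on).

```python
# Toy model of an incremental send/receive: only what changed between a
# parent snapshot and a newer snapshot is shipped. Snapshots are plain
# dicts (path -> contents) here; real btrfs send walks metadata trees.
Snapshot = dict[str, bytes]


def incremental_stream(parent: Snapshot, child: Snapshot) -> list[tuple]:
    """Sender: list the operations that turn `parent` into `child`."""
    ops: list[tuple] = []
    for path in sorted(child):
        if parent.get(path) != child[path]:  # new or modified file
            ops.append(("write", path, child[path]))
    for path in sorted(parent):
        if path not in child:  # deleted file
            ops.append(("unlink", path))
    return ops


def apply_stream(base: Snapshot, ops: list[tuple]) -> Snapshot:
    """Receiver: replay the operations on a copy of its parent snapshot."""
    result = dict(base)
    for op in ops:
        if op[0] == "write":
            _, path, data = op
            result[path] = data
        elif op[0] == "unlink":
            result.pop(op[1], None)
    return result


parent = {"/a": b"1", "/b": b"2", "/c": b"3"}
child = {"/a": b"1", "/b": b"changed", "/d": b"new"}
ops = incremental_stream(parent, child)
print(ops)  # [('write', '/b', b'changed'), ('write', '/d', b'new'), ('unlink', '/c')]
assert apply_stream(parent, ops) == child
```

Unchanged data (here `/a`) never crosses the wire, which is why the receiving side does not need space for a second copy of everything.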

I also agree with kparal that the "separate volume is needed for re-installs" is basically a red herring.

A red herring is a distraction. Requiring an empty volume for / is actual, present-day, long-standing Anaconda behavior, and it has real consequences for the clean install UX. That makes it definitely not a distraction. It needs to be considered.

It's fair to ask Anaconda folks for a better explanation than I've given, maybe even a re-evaluation.

A way I like to think of FOSS is a bit like "centers of gravity" where different ecosystems interact by pulling on each other in different ways. Fedora and RHEL orbit each other in a way, and are connected to other distributions and language communities. The individual people involved often create their own gravity and try to pull on those larger ecosystems.

And I'm glad we have the BTRFS cheerleading squad here trying to push on the future orbit of Fedora Workstation ...but...two points:

First, Fedora interacts with RHEL and the decisions even in sub-components of Fedora have wide-ranging impact; I think Fedora can be different but it'd be difficult to imagine this working group making such a consequential decision without some sort of approval or at least ack from the RHEL side.

And yes, sometimes this can be frustrating because it can create a kind of deadlock where Fedora blocks on RHEL and RHEL blocks on Fedora.

So in my opinion rather than trying to change the default, what would be a lot more effective is continuing the work in e.g. Anaconda to support it as an option, write Fedora Magazine articles about how to use it, the benefits etc.

The second related point is that those BTRFS features come at a cost, and people end up using nodatacow for things like VMs and the systemd journal, which complicates a lot of the story around the benefit of snapshots. I believe this type of thing is why openSUSE ended up (...looking at the current installer) defaulting to a nodatacow /var.
And so the larger point here isn't "BTRFS by default" - I think we have always supported a variety of partitioning setups, just be sure that Anaconda offers a range of sane setups for BTRFS that you think work well. Making one the default is then a whole other story.

So, we on the Anaconda team discussed the possibility of removing content instead of requiring a reformat of the '/' volume. The result of this discussion is that we don't want to support this solution. It's not a trivial change, and it's pretty fragile.

However, we don't want to block your initiative. If someone is willing to create and maintain this, you are free to create an Anaconda addon for it. The addon would just back up /home and mark the backup device as protected. After the installation is done, it would copy the backup back.

Also, I was thinking about this issue. The original problem in this ticket is that flatpaks take too much space, so why not change the behavior of the Software center to install flatpaks to '/home' instead of '/var'?

Also, I was thinking about this issue. The original problem in this ticket is that flatpaks take too much space, so why not change the behavior of the Software center to install flatpaks to '/home' instead of '/var'?

I don't think that it's just Flatpaks. I never had problems due to Flatpaks, but I was affected by this by using mock. I constantly had to clean /var/lib/mock to be able to do my work (RHEL/Fedora maintainer - so using several chroots across various RHEL/Fedora releases).

Also, I was thinking about this issue. The original problem in this ticket is that flatpaks take too much space, so why not change the behavior of the Software center to install flatpaks to '/home' instead of '/var'?

The real issue is that free space is not shared between / and /home. Flatpak issue is just one of the symptoms.

Even cleaning /var/lib/mock is only enough for me to do small mockbuilds; it doesn't free enough space on my 75 GB / partition to successfully do large mockbuilds. For these, I submit unnecessary scratch builds and check back later to see how they failed. I hope we agree the default is poor and needs fixing. Solutions are (a) btrfs or (b) a single ext4 / partition without a separate /home partition.
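A minimal sketch of why the split hurts, assuming a hypothetical 500 GB disk with a 75 GB / as in the comment above. The two classes are made-up illustrations, not Anaconda or Btrfs code: with fixed partitions a large mock build fails on / even though /home has plenty of free space, while a shared pool (Btrfs subvolumes, or option (b), a single / filesystem) succeeds.

```python
# Toy comparison of fixed partitions vs. one shared pool (Btrfs subvolumes,
# or a single / filesystem). All sizes are hypothetical, in GB.
class FixedPartitions:
    def __init__(self, sizes: dict[str, int]):
        self.size = dict(sizes)
        self.used = {mount: 0 for mount in sizes}

    def write(self, mount: str, gb: int) -> bool:
        # A partition can only use its own free space.
        if self.used[mount] + gb > self.size[mount]:
            return False
        self.used[mount] += gb
        return True


class SharedPool:
    def __init__(self, total: int, mounts: list[str]):
        self.total = total
        self.used = {mount: 0 for mount in mounts}

    def write(self, mount: str, gb: int) -> bool:
        # Every subvolume draws on the same free space.
        if sum(self.used.values()) + gb > self.total:
            return False
        self.used[mount] += gb
        return True


# Hypothetical 500 GB disk: 75 GB / plus the rest for /home, vs. one pool.
split = FixedPartitions({"/": 75, "/home": 425})
pool = SharedPool(500, ["/", "/home"])
for layout in (split, pool):
    layout.write("/", 60)       # OS, Flatpaks, containers, mock chroots
    layout.write("/home", 100)  # user data
print(split.write("/", 30))  # False: / is full although /home has 325 GB free
print(pool.write("/", 30))   # True: free space is shared
```

Quotas on subvolumes would reintroduce a limit, but unlike a partition size it can be changed later without reformatting.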

Can we agree that we need to implement one of (a) or (b)? Any alternative proposals?

The real issue is that free space is not shared between / and /home. Flatpak issue is just one of the symptoms.

BTW, it's important to bear in mind that for ostree-based systems it's /var/home - as a system administrator, all of "your data" is in /var and so it works really well to make /var your one big separate partition distinct from /. (I've been thinking lately an incremental step we could take is switching that even for non-ostree based systems)

@walters

First, Fedora interacts with RHEL and the decisions even in sub-components of Fedora have wide-ranging impact; I think Fedora can be different but it'd be difficult to imagine this working group making such a consequential decision without some sort of approval or at least ack from the RHEL side.

Definitely Fedora does not need any approval or ack from the RHEL side. However, we need to make sure that those people are aware of such a change (that's what the change process is about). RHEL already uses xfs by default while Fedora uses ext4. The world did not explode.

If the change has benefits for Fedora and does no harm to RHEL (in this case it doesn't, because we are not throwing away xfs support or anything like that), there is nothing wrong with the change itself. Fedora should not be a "free, newer (bleeding edge) RHEL". On the contrary, we should innovate in Fedora, and RHEL can either accept those changes or, if there is some political or other problem with a decision, differ from Fedora in some ways. That is neither good nor bad.

And yes, sometimes this can be frustrating because it can create a kind of deadlock where Fedora blocks on RHEL and RHEL blocks on Fedora.

RHEL should not block Fedora. Full stop.

There might be some compromises that could be done in some areas, but not in the way "block Fedora".

So in my opinion rather than trying to change the default, what would be a lot more effective is continuing the work in e.g. Anaconda to support it as an option, write Fedora Magazine articles about how to use it, the benefits etc.

Well, I suppose if the Workstation WG changes the defaults, they will make sure to keep ext4 as an option in Anaconda. And while I agree that having articles on Fedora Magazine would be nice, they are definitely not necessary for the change to become reality.

The second related point is that those BTRFS features come at a cost, and people end up using nodatacow for things like VMs and the systemd journal, which complicates a lot of the story around the benefit of snapshots. I believe this type of thing is why openSUSE ended up (...looking at the current installer) defaulting to a nodatacow /var.

From what I have been told, this only affects databases, and even there the performance drop is negligible on SSDs (we are in 2020). And this hardly applies to Workstation, because not many people run databases there, especially high-performance ones.

Regarding partitioning, I would have / and /home as subvolumes, and then, after we move /var/lib/rpm somewhere else, we could add /var as a subvolume too. Of course, I have not done a deep dive into this topic, so this is based just on my opinions (which are based on some experience and conversations with different people).

And so the larger point here isn't "BTRFS by default" - I think we have always supported a variety of partitioning setups, just be sure that Anaconda offers a range of sane setups for BTRFS that you think work well. Making one the default is then a whole other story.

I have to disagree here because (I believe) the Workstation WG wants the "standard desktop use case" to work best. If a new partitioning scheme makes Fedora work better for 90% of the audience and worse for the rest, I would go for it. This is all about balance.

Hello @catanzaro @mclasen @aday @ngompa @kalev @chrismurphy @petersen @tpopela @otaylor @langdon once more. I have been approached by several members of Fedora community asking me how to proceed further with this.

With my FESCo hat on, I would much appreciate it if you could state your technical arguments against "BTRFS as default on Fedora Workstation" here (not "this needs to be the same as in RHEL"). If there is nothing strong against it, anybody in favor is welcome to submit a change proposal so that it gets wider discussion (on devel@) and is then voted on by FESCo.

Thanks and have a nice weekend :roller_coaster: !

Based on the discussion in this thread, I think we should switch.

RHEL already uses xfs by default while Fedora uses ext4.

Note while ext4 is the Anaconda default (for whatever reason), Fedora Server uses XFS as an explicit choice to be closer to RHEL, and Fedora CoreOS uses XFS for a similar reason but also because reflinks are good for containers (w/overlayfs).

Saying "Fedora uses ext4" is hence not at all accurate. It is accurate to say Fedora Workstation uses ext4 though.

Finally I'll just note that ext4 and xfs are explicitly supported filesystems in RHEL, btrfs isn't. Now I would 100% support the statement that Fedora should not block on RHEL, otherwise we have gridlock. But this is a pretty profound change to be suggesting without (to my knowledge) anyone from the RHEL storage side even involved at all.

And related to the above, Workstation isn't isolated from the rest of Fedora either; it relates to other Fedora projects too, such as Server and CoreOS mentioned above. (Also Silverblue, where ostree currently doesn't quite work with BTRFS as /, but I'd like to support it.)

@walters

Saying "Fedora uses ext4" is hence not at all accurate. It is accurate to say Fedora Workstation uses ext4 though.

Sure. Since we are in the Workstation tracker, I was implying this.

Finally I'll just note that ext4 and xfs are explicitly supported filesystems in RHEL, btrfs isn't. Now I would 100% support the statement that Fedora should not block on RHEL, otherwise we have gridlock. But this is a pretty profound change to be suggesting without (to my knowledge) anyone from the RHEL storage side even involved at all.

This is something that should naturally happen when a change proposal is submitted. It is announced on devel@ and relevant people have a chance to speak up. For example, I don't even know who the "RHEL storage side" is.

And related to the above, Workstation isn't isolated from the rest of Fedora either; it relates to other Fedora projects too, such as Server and CoreOS mentioned above. (Also Silverblue, where ostree currently doesn't quite work with BTRFS as /, but I'd like to support it.)

Those are free to choose any other filesystem.

I am against switching to a file system that we don't have people working on. ext4 and xfs would both be fine; I don't think it's time to switch to btrfs by default.

Data safety is paramount and we should be using the best tested file system here.

I've been burned enough times by using experimental file systems and not interested in supporting a proposal to switch to a fringe file system here.

@kalev

we don't have people working on

Fedora is not an employer. It is an open-source project that has many different contributors, including people from Red Hat. Only a few roles are officially paid (committed) by RH, and those do not include maintainers and such.

So as long as btrfs is developed by somebody else, you can't claim that there is nobody out there who is working on btrfs. If you were to say that it is developed and used by a very limited number of people, I would point you to SUSE and Facebook; they use btrfs almost everywhere. openSUSE has used it since 2014.

Data safety is paramount and we should be using the best tested file system here.

I don't disagree, but

by using experimental file systems

It has not been experimental since 2013.

fringe file system

I was asking for technical arguments.

So as long as btrfs is developed by somebody else, you can't claim that there is nobody out there who is working on btrfs.

Where did I claim that? I think it's pretty obvious there are people out there working on it.

I was asking for technical arguments.

Okay dude. If you haven't noticed, this isn't an "ignatenkobrain asking other people questions" ticket system, it's the Workstation WG. You are welcome to provide your opinion here in the ticket, but not to start telling other people to go away when they don't say what you want to hear.

Okay dude. If you haven't noticed, this isn't an "ignatenkobrain asking other people questions" ticket system, it's the Workstation WG. You are welcome to provide your opinion here in the ticket, but not to start telling other people to go away when they don't say what you want to hear.

Sorry, I did not mean to offend anybody. I am merely trying to help resolve this ticket. It has been open for about 2 years and the discussion keeps going in circles. Just repeating the same thing about supportability is not helping, as you can see, which is why I wanted to hear technical arguments for and against that could be used when this is submitted as a change proposal.

Sorry for keeping quiet here too long.
First of all here are my partitions on my Silverblue system:

$ sudo df -hl --output='source','fstype','size','avail','target' | grep -v tmpfs
Filesystem              Type      Size Avail Mounted on
/dev/mapper/fedora-root ext4      226G   13G /sysroot
/dev/nvme0n1p1          ext4      976M  753M /boot

I don't think this is particularly special, but I could not get Anaconda to give me this despite multiple attempts, so in the end I edited it manually using these steps:
https://twitter.com/juhp/status/1119592530121674754
It was stressful but worked perfectly. (13GB of free space is about high tide for me;o)

I am seeing a lot of good arguments here for moving forward with Btrfs for Workstation.

I think it would be good to start testing Btrfs in Rawhide Workstation.
So I suggest writing a solid Change proposal for Workstation and organizing some simple test days, maybe one for installs and one for a couple of simple, common tweaks.
This will give users and QA a good chance to try out and "stress test" Btrfs in Fedora, and gain more confidence in it. Making it easier to choose the default fs in Anaconda might be a plus.
During this same time we can still have discussion with storage experts and btrfs proponents.
We have been discussing this for so long, so I suggest we put it to a vote.

I am going to try out Btrfs in my next Rawhide VM install.

I did have one technical question which I didn't see covered here (though I may have overlooked it):
What would be the current story for disk encryption with Btrfs at this time in Fedora?

What would be the current story for disk encryption with Btrfs at this time in Fedora?

Anaconda creates BTRFS on top of LUKS, and that works just fine.

I did have one technical question which I didn't see covered here (though I may have overlooked it):
What would be the current story for disk encryption with Btrfs at this time in Fedora?

Today, Btrfs does not have native encryption, so you need to use LUKS there, which works fine. There's some upstream work going on to provide a native, transparent encryption layer with the same kind of flexibility that exists for compression, CoW, and other features.

I am merely trying to help resolve this ticket. It has been open for about 2 years and the discussion keeps going in circles.

I would still argue that this ticket could be simply fixed by switching our default partitioning layout - and I'm not sure that adding "switch the filesystem" makes it easier to resolve :-)

In terms of btrfs -

RHEL is not a blocker here. The goal of Fedora Workstation is to be the best operating system for a range of users, and particularly for developers - not to be a prototype for RHEL. On the other hand, developers at Red Hat are typically a key resource for Fedora, and if Fedora is going off in a direction where we don't have that resource, we really need to understand why we are doing that, and what the plan is if problems arise. And we need to communicate the reasons and plans internally inside Red Hat as well as externally, so that nobody is surprised after the fact.

For me, the key thing to have in order to be able to advocate for switching to btrfs is an understanding of exactly what benefits we expect for users and on what timescale. What are the 3-5 key improvements that are going to make a difference for the target user of Fedora Workstation (a developer using a laptop)? What work in addition to switching the filesystem is needed to get to those improvements? Are there UI integrations that are needed to make these improvements accessible to non-expert users?

This probably sounds like a high bar - I understand we often do move to things in Fedora simply because they are newer, more actively developed, and more interesting - and that approach generally works - it keeps Fedora in a good place to take advantage of improvements as they occur. We don't always require proof that new things are better in their current form. But the default filesystem is not just any component - it's the most important component for keeping the users' data safe, and if we change it, we effectively can't revert that change - users will have installed with that filesystem and won't be changing it.

I don't think the change proposal process is a great model for coming to a decision - to me the change proposal process is usually a way for the community to comment on, and for FESCo to approve, changes that the relevant developers have already decided on. If there is a lack of consensus among developers, FESCo tends to just postpone decisions until there is consensus.

But I do think that having a document for changing the default filesystem for Fedora Workstation would be a good idea - the appropriate place for my "3-5 key reasons" (not in this ticket please!) - and all the other background information needed. And probably the change proposal template is as good a basis for that as any. So as far as I'm concerned, it would be fine if a change proposal was started.

(Doing this change for Fedora 33 sounds super aggressive for a timescale, but maybe steps could be identified for Fedora 33 to make doing a switch for Fedora 34 smoother and better tested.)

@walters

people end up using nodatacow for things like VMs and the systemd journal, which complicates a lot of the story around the benefit of snapshots.

How does nodatacow complicate snapshots?

... and Fedora CoreOS uses XFS for a similar reason but also because reflinks are good for containers (w/overlayfs).

How good are reflinks? I'm curious about the container use case on Workstation and Silverblue, and if reflinks are good there too? Of course, btrfs has both reflinks and snapshots for container usage; ext4 has neither.

it's important to bear in mind that for ostree-based systems it's /var/home

I mention it here, in the system encryption issue. If /var is a separate volume and is to be encrypted with a key sealed in a TPM; and user home is to be encrypted with a user passphrase (possibly via sd-homed):

  • should /var/home be a separate volume;
  • or double-encryption of user home?

Double encryption considerations here and here. Double encryption may be OK, but it needs evaluation, especially as it relates to recovery.

(Also Silverblue, where ostree currently doesn't quite work with BTRFS as /, but I'd like to support it.)

I'm only aware of this bug at the moment. It's trivially fixed post-install, but I'm not sure what knowledge of subvolumes rpm-ostree needs to support this, or whether it's better to use flat directories.

Compression is a possible selling point, and a simple interface for integration might be a single checkbox in Properties named "compression".

$ sudo compsize /var/lib/flatpak/
Processed 253606 files, 59717 regular extents (169570 refs), 168382 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       44%      2.3G         5.2G          13G       
none       100%      463M         463M         989M       
zstd        38%      1.8G         4.7G          12G       
$ 

This is configurable per filesystem, subvolume, directory, or file. It could be useful for podman containers too. In these cases write performance is limited by the internet download speed anyway, so why not compress them? The working group could choose specific ancillary directories for compression by default, in a curated process.
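For reference, the "Perc" column in that compsize output is just disk usage divided by uncompressed size. A quick Python check using the numbers printed above (the sizes are copied from the output, 463M is converted assuming 1024M per G, and compsize rounds its figures, so the recomputed values are approximate):

```python
# Recompute compsize's "Perc" column (Disk Usage / Uncompressed) from the
# numbers printed above. compsize rounds its sizes, so results are approximate.
# Sizes in G as printed; 463M is converted assuming 1024M per G.
ROWS = {
    "TOTAL": (2.3, 5.2),
    "none": (463 / 1024, 463 / 1024),
    "zstd": (1.8, 4.7),
}


def perc(disk_usage: float, uncompressed: float) -> int:
    """Compressed size as a percentage of the uncompressed size."""
    return round(100 * disk_usage / uncompressed)


percentages = {name: perc(*sizes) for name, sizes in ROWS.items()}
saved = {name: round(raw - disk, 1) for name, (disk, raw) in ROWS.items()}
print(percentages)  # {'TOTAL': 44, 'none': 100, 'zstd': 38}
print(saved)        # about 2.9 G saved in total; 'none' saves nothing
```

In other words, zstd roughly halves the on-disk footprint of /var/lib/flatpak here, which is the kind of saving a single "compression" checkbox could expose.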

These are the bug/integration issues I've got so far. I only later noted the first two are not in fact Btrfs specific, even though they were exposed first by Btrfs. Open to ideas where to better track these, maybe a new issue or BZ? Anyway the list is kinda short so I'll just include them.

updatedb does not index /home when /home is a bind mount. This can also affect rpm-ostree installations, including Silverblue.

GNOME Usage: Incorrect numbers when using multiple btrfs subvolumes. This isn't btrfs specific; it happens with "one big ext4" volume as well.

GNOME Boxes, RFE: create qcow2 with 'nocow' option when on Btrfs /home. This is btrfs specific, and is a recommended optimization.

containers/libpod: automatically use btrfs driver if on btrfs

Metadata Update from @chrismurphy:
- Issue tagged with: meeting

2 years ago

This ticket is well beyond the 'can't read backlog' limit, and not actionable imo. Let's just close it and start over with separate tickets that have a narrow focus and can be acted on, unlike this mess.

Agreed. This issue is superseded by #152.

Metadata Update from @chrismurphy:
- Issue close_status updated to: Can't fix
- Issue status updated to: Closed (was: Open)

2 years ago

Metadata Update from @chrismurphy:
- Issue untagged with: installation, meeting

2 years ago

