We're currently planning on switching over to btrfs as the default filesystem for F33. btrfs is fairly sensitive to hardware corruption - largely because it has comprehensive checksums. In the face of hardware errors, possibilities include:
We don't have any data about how often these occur across typical Fedora Workstation hardware, but one might guess that somewhere between hundreds and thousands of Fedora users will experience one of the above.
We should develop test cases for all of these - or at least the first two - and understand what the current experience is. What does the user see when the failure occurs? Are they guided to useful documentation?
I'm attaching an image of the current experience when the root filesystem entirely fails to mount, to motivate my concern here.
<img alt="mount-failure.png" src="/fedora-workstation/issue/raw/files/66b0c0e70f75ff740e2f4d9e15769a60e8a58dee334f482c62e6adde26d55723-mount-failure.png" />
(cc: @dcavalca, @josef, @chrismurphy)
Metadata Update from @aday: - Issue tagged with: btrfs
I believe this is something that @dcavalca and @josef are working on right now.
Metadata Update from @ngompa: - Issue assigned to dcavalca
Metadata Update from @ngompa: - Issue assigned to josef (was: dcavalca)
I'm working on making it more resilient, and @dcavalca is putting together documentation. The actual userspace stuff is going to require input from the systemd guys at least, and it would likely need to extend to the desktop environments, as users may miss the "HEY WE'RE BOOTING READ ONLY" message at boot time.
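For the desktop side, a minimal sketch of how a session component might notice that root came up read-only, assuming it can read the kernel mount table (`/proc/self/mounts` on Linux). The function names here are hypothetical and not part of systemd or any desktop environment; what to do with the result (notification, banner, etc.) is left open.

```python
def root_mount_options(mounts_text):
    """Return the option list of the last entry mounted at '/', or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == "/":
            # A later entry for '/' shadows an earlier one, so keep the last.
            options = fields[3].split(",")
    return options


def root_is_read_only(mounts_text):
    """True if the root filesystem is listed with the 'ro' mount option."""
    options = root_mount_options(mounts_text)
    return options is not None and "ro" in options
```

On a running system this would be fed the contents of `/proc/self/mounts`, e.g. `root_is_read_only(open("/proc/self/mounts").read())`.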
cc: @zbyszek
Metadata Update from @ngompa: - Issue assigned to otaylor (was: josef)
Changed to @otaylor to coordinate UX stuff with @josef, @dcavalca, etc. per meeting discussion.
One update here is that after some experimentation and discussion with Josef, it turns out that the second case, where the filesystem mounts read-only, basically doesn't happen. Either the filesystem entirely fails to mount, or it mounts read-write. A forced switch to read-only typically happens when there is an attempt to modify an already-mounted filesystem and that modification would require changing data structures that appear corrupted.
The idea of a read-only booted system is more of an option to allow the user to boot up into a familiar system, upload their most important photos to the cloud, read docs, write a USB stick, then try to recover further.
(The user doc draft says "There is work in progress to make btrfs automatically mount the filesystem as read-only when corruption or other serious issues are detected. This will make it easier to fix the problem, or copy data off if necessary. This work is slated to land by the end of July." - so perhaps this will change.)
From observation, the current state of affairs when booting onto a read-only root is that the system ends up in a netherworld of 50% [FAILED] services; see the "booting successfully with read-only file system" thread on devel@fedoraproject.org. Improving this would involve either fixing services one-by-one, or just using overlayfs to simulate a read-write filesystem.
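To make the overlayfs idea concrete, here is a rough sketch that only builds (and prints) the mount commands for layering a tmpfs-backed writable layer over a read-only tree. All paths are hypothetical placeholders, and this is not how any Fedora component currently does it. It relies on the standard overlay mount options `lowerdir`, `upperdir` and `workdir`; the upper and work directories have to live on the same filesystem, and anything written to the tmpfs layer is lost on reboot.

```python
import shlex


def overlay_mount_commands(lower, scratch, target):
    """Build argv lists that overlay a tmpfs-backed writable layer on `lower`.

    Assumes none of the paths contain commas, which would break the
    comma-separated overlay option string.
    """
    upper = f"{scratch}/upper"
    work = f"{scratch}/work"
    opts = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return [
        ["mount", "-t", "tmpfs", "tmpfs", scratch],  # in-memory scratch space
        ["mkdir", "-p", upper, work],                # same filesystem for both
        ["mount", "-t", "overlay", "overlay", "-o", opts, target],
    ]


# Hypothetical paths, for illustration only; nothing is executed here.
for argv in overlay_mount_commands("/run/rootfs-ro", "/run/rootfs-scratch", "/sysroot"):
    print(shlex.join(argv))
```

Services would then see a writable root, while the corrupted filesystem underneath is never written to.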
Proposal: close this issue and open it under https://pagure.io/fedora-btrfs/project/issues, so it's no longer tracked on the WG issue tracker. Chris, is this OK?
👍 yes
Please continue discussion in https://pagure.io/fedora-btrfs/project/issue/32
Metadata Update from @catanzaro: - Issue close_status updated to: Deferred to upstream - Issue status updated to: Closed (was: Open)