#157 User experience with corrupted filesystem
Closed: Deferred to upstream a year ago by catanzaro. Opened a year ago by otaylor.

We're currently planning on switching over to btrfs as the default filesystem for F33. btrfs is fairly sensitive to hardware corruption - largely because it has comprehensive checksums. In the face of hardware errors, possibilities include:

  • The filesystem cannot be mounted at all.
  • The filesystem mounts read-only.
  • The filesystem is forced read-only at run time.

We don't have any data about how often these occur across typical Fedora Workstation hardware - but one might guess that somewhere between hundreds and thousands of Fedora users will experience one of the above.

We should develop test cases for all of these - or at least the first two - and understand what the current experience is. What does the user see when the failure occurs? Are they guided to useful documentation?

I'm attaching an image of the current experience when the root filesystem entirely fails to mount, to motivate my concern here.

mount-failure.png


Metadata Update from @aday:
- Issue tagged with: btrfs

a year ago

I believe this is something that @dcavalca and @josef are working on right now.

Metadata Update from @ngompa:
- Issue assigned to dcavalca

a year ago

Metadata Update from @ngompa:
- Issue assigned to josef (was: dcavalca)

a year ago

I'm working on making it more resilient, and @dcavalca is putting together documentation. The actual userspace stuff is going to require input from the systemd guys at least, and it would likely want to extend to the DEs, as users may miss the "HEY WE'RE BOOTING READ ONLY" message at boot time.
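
To illustrate the kind of check a desktop component could run so the read-only state is not missed, here is a minimal sketch that looks for the `ro` option on the root entry in `/proc/mounts`-style text. The function names `mount_options` and `is_read_only` are hypothetical, and this is only an assumption about how such a check might look, not what systemd or any DE actually does.

```python
from typing import List, Optional


def mount_options(mounts_text: str, mount_point: str = "/") -> Optional[List[str]]:
    """Return the options of the last entry mounted at mount_point, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones for the same mount point.
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text: str, mount_point: str = "/") -> bool:
    """True if mount_point is listed and carries the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options
```

On a running Linux system, the text could come from reading `/proc/mounts`, and a positive result could trigger a persistent notification rather than a one-off boot message.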

Metadata Update from @ngompa:
- Issue assigned to otaylor (was: josef)

a year ago

Changed to @otaylor to coordinate UX stuff with @josef, @dcavalca, etc. per meeting discussion.

One update here: after some experimentation and discussion with Josef, it turns out that the second case, where the filesystem mounts read-only, basically doesn't happen. Either the filesystem entirely fails to mount, or it mounts read-write. A filesystem is typically forced read-only when an attempt to modify an already mounted device would require modifying data structures that seem corrupted.

The idea of a read-only booted system is more of an option to allow the user to boot up into a familiar system, upload their most important photos to the cloud, read docs, write a USB stick, then try to recover further.

(The user doc draft says "There is work in progress to make btrfs automatically mount the filesystem as read-only when corruption or other serious issues are detected. This will make it easier to fix the problem, or copy data off if necessary. This work is slated to land by the end of July." - so perhaps this will change.)

From observation, the current state of affairs of booting onto a read-only root is that the system ends up in a netherland of 50% [FAILED] services. Improving this would involve either fixing services one-by-one, or just using overlayfs to simulate a read-write filesystem. See the devel@fedoraproject.org thread "booting successfully with read-only file system".
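
As a rough sketch of the overlayfs idea, the helper below builds the argument list for a `mount -t overlay` call that layers a writable directory over a read-only one. The function name `overlay_mount_command` and the example paths are hypothetical, and nothing here is taken from the mailing list thread. It only shows the standard `lowerdir`/`upperdir`/`workdir` options.

```python
from typing import List


def overlay_mount_command(lower: str, upper: str, work: str, target: str) -> List[str]:
    """Build a mount(8) argument list for an overlayfs mount.

    lower  - the read-only filesystem to expose
    upper  - writable directory that receives all changes
    work   - empty scratch directory on the same filesystem as upper
    target - where the merged, apparently read-write view appears
    """
    options = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]


# Hypothetical layout: read-only root under /sysroot, writable layer on a tmpfs in /run.
command = overlay_mount_command(
    "/sysroot", "/run/overlay/upper", "/run/overlay/work", "/sysroot"
)
```

If the upper layer lives on a tmpfs, services see a writable filesystem but every change is lost at reboot, which fits the "copy your data off, then recover" use case rather than normal operation.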

Proposal: close this issue and open it under https://pagure.io/fedora-btrfs/project/issues, so it's no longer tracked on the WG issue tracker. Chris, is this OK?

Metadata Update from @catanzaro:
- Issue close_status updated to: Deferred to upstream
- Issue status updated to: Closed (was: Open)

a year ago

