#18 Support simple configuration of system snapshotting with full system rollback support
Opened 5 years ago by ngompa. Modified 2 months ago

Right now, it's easy enough to set up automatic snapshotting with Snapper triggered by the DNF Snapper plugin.

However, that in itself is not the complete story for this feature to be useful. Some work will be needed to align the capability with the bootloader team's efforts to unify on the Bootloader Spec.

Finally, it also needs to be straightforward to enable, as it is unlikely to be shipped active out of the box.


Talking to @dbrandonjohnson today led me down the path again on this and I remembered that we can probably use Boom for this. We need a DNF plugin to trigger snapshot creation similar to what Snapper does, though.

Brandon expressed interest in developing this within the context of CentOS Hyperscale and also bringing this into Fedora too.

See centos-sig-hyperscale/sig#111

Full system restores, automatic snapshot management, and booting into a specific snapshot (from the GRUB2 menu) are already feasible with little effort on Fedora 36.

Look at point (6) in the above link regarding some negative aspects this setup has.

  • Setting up grub-btrfs for the boot-to-snapshot facility should involve just this:
$ git clone https://github.com/Antynea/grub-btrfs.git
$ cd grub-btrfs
$ sudo make install

$ sudo vim /etc/default/grub-btrfs/config
---snip---
GRUB_BTRFS_SHOW_TOTAL_SNAPSHOTS_FOUND="true"
GRUB_BTRFS_GRUB_DIRNAME="/boot/grub2"
GRUB_BTRFS_MKCONFIG=/usr/sbin/grub2-mkconfig
GRUB_BTRFS_SCRIPT_CHECK=grub2-script-check
---snip---

$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
$ sudo systemctl enable --now grub-btrfs.path

This bug is probably a blocker for this feature:
https://bugzilla.redhat.com/show_bug.cgi?id=2120845

Anyone who runs grub2-mkconfig on a system with BLS snippets that use rootflags=subvol=$ROOT to select which root to boot will have this boot parameter overwritten by whatever is in /etc/kernel/cmdline.

The CentOS Stream bug "grub2-mkconfig does not apply settings to BLS entries" resulted in commit 3e40727, "Skip machine ID check when updating BLS", which means side-by-side installations of Fedora will have their BLS snippets' rootflags=subvol=$ROOT entry (wrongly) rewritten.

Anaconda calls grub2-mkconfig here, which unconditionally steps on the /boot/loader/entries/*.conf files, even if my Anaconda PR "Don't remove existing BLS entries" were accepted.
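
To check whether a given system is affected, it may help to record each BLS entry's rootflags before and after running grub2-mkconfig. The following is only a minimal sketch: `bls_rootflags` is a hypothetical helper name, and it assumes each snippet has a literal `options` line rather than one built from GRUB variables.

```shell
# Hypothetical helper: print the rootflags= value(s) found on the
# "options" line of a single BLS snippet file passed as $1.
# Assumes the options line is literal text, not a GRUB variable reference.
bls_rootflags() {
    awk '$1 == "options" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^rootflags=/) {
                sub(/^rootflags=/, "", $i)
                print $i
            }
    }' "$1"
}
```

Running it over every file in /boot/loader/entries/ before and after grub2-mkconfig, then comparing the two outputs, should show whether the subvolume selection was rewritten.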

@chrismurphy does that GRUB bug also affect Silverblue or does Silverblue have its own workaround?

I'm not sure, I'd have to test it. There are some differences where Silverblue stores BLS snippets so it's possible it's not affected.

Looks like the Kubuntu Focus folks wrote a graphical tool for this for their laptops: https://github.com/kfocus/kfocus-source/tree/NN-2024-Q3/package-rollback

It might be interesting to look into this.

There's a new "snapm" tool from @bmr that we should look into.

We've discussed this with @bmr recently, and it looks like we're going to make some progress on making this a reality within the next cycle or so.

I started sketching something out based on the previous feature proposal - the current draft is here:

https://gist.github.com/bmr-cymru/1f8ff5dc46038d9530fd23dc9b720212

It's a bit rough but it includes the major blocking issues for snapm (btrfs plugin for snapm and event-triggered snapset creation/dnf plugin). Any suggestions/corrections welcome.

We will need a libdnf plugin (unless we can get PackageKit ported to libdnf5) and a libdnf5 plugin. Will we need to adopt a SUSE-style nested layout? Or are we just going to need to change Anaconda to have a /var subvolume in addition to /boot and /home? I guess there's also a question of whether the root user home directory needs to be subvolumed too.

Metadata Update from @ngompa:
- Issue assigned to bmr

4 months ago

I think we can keep it as is, with the /var, /boot, /root and /home changes you mentioned, but I'd like to prototype that and get a feel for it before committing. It would also be good if we could make this a bit more prescriptive for other storage layouts at the same time (we currently have a very general statement about considering rollback boundaries and laying things out appropriately, but we leave a lot of the detail to the user right now).

I'll look into the plugin question - I've got an open issue for snapm to work on that. The snapm changes should be fairly straightforward (famous last words...), but I've no experience with dnf5 plugins so far.

Should users expect software installed outside of the system package manager to also be rolled back? If not, /usr/local should be either a mount point or a symlink into /var like in Silverblue. And what are the semantics for /opt?

Another issue is whether automatic snapshot creation will make ENOSPC
more likely. Do the SUSE folks have any data on this?

If we don't control the number of snapshots, it will definitely be a problem. My expectation is that we will restrict retention to only a few snapshots (maybe 5 at most by default). This should seriously mitigate the problem.
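
As a rough illustration of that retention policy (keep the newest few, drop the rest whenever a new snapshot is made), here is a minimal sketch. `snapshots_to_prune` is a hypothetical helper, not part of Snapper, Boom, or snapm. It assumes snapshots are identified by increasing numbers, and the default of 5 mirrors the figure suggested above.

```shell
# Hypothetical helper: read snapshot numbers on stdin (one per line, any
# order) and print the ones that fall outside the newest $1 (default 5).
# Those are the candidates for deletion.
snapshots_to_prune() {
    keep=${1:-5}
    sort -n | awk -v keep="$keep" '
        { ids[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print ids[i] }'
}
```

With seven snapshots numbered 1 to 7 and a limit of 5, this prints 1 and 2. With five or fewer snapshots it prints nothing.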

@ngompa, would removing them at boot time, when space is low, be better? That way, the user can make use of their storage, but also not encounter an out-of-space error while using the system after that boot.

That makes it too complicated. It's easier to just remove them as new snapshots are made.

@ngompa, I know that we have similar problems with not always retaining enough old kernels, though.

The current partition layout includes /etc in the root subvolume. This means that rolling back the root subvolume could delete users, groups, and other local configuration. Is that the intended behavior?

At the moment, yes. Right now we generally roll back everything system-wise.

As we figure out our snapshotting and system rollback system, we'll need to adjust our layout for it.
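
When reasoning about rollback boundaries, it can be useful to see which mount points sit on their own Btrfs subvolume, since anything else rolls back together with the root. A minimal sketch, assuming an fstab-style file where subvolumes are selected with `subvol=` mount options (`fstab_subvols` is a hypothetical helper name):

```shell
# Hypothetical helper: print "mountpoint subvolume" for every btrfs entry
# in an fstab-style file ($1, defaulting to /etc/fstab) that selects a
# subvolume via subvol=. Entries using subvolid= are not handled.
fstab_subvols() {
    awk '$1 !~ /^#/ && $3 == "btrfs" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^subvol=/) {
                sub(/^subvol=/, "", opts[i])
                print $2, opts[i]
            }
    }' "${1:-/etc/fstab}"
}
```

Paths such as /etc or /usr/local that do not show up in the output live inside whichever subvolume contains them and follow its rollbacks.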
