#12 Integrated backup and restore
Opened 2 years ago by chrismurphy. Modified 4 months ago

@sgallagh and I briefly discussed how to provide "a properly-integrated and easy to configure backup solution".

  • Fedora Workstation (source) to Fedora Server (destination)
    • Btrfs on the source can be a prerequisite so long as the solution is broadly useful, easy to configure, reliable, and likely to be used.
    • Configuration is likely to be limited to enabled/disabled, choosing the destination to hold the backups, backup frequency and timing, and error handling/notifications.
    • An advantage of the destination having Btrfs: each received snapshot is a complete, browsable subvolume, so it's easy to restore single files from the backup via sshfs, NFS, SMB, or rsync.
    • Bonus: it can be any Fedora to any Fedora. e.g. either one way, or mutual backups.
  • Fedora to any block device
    • Straightforward to depend on Btrfs for both source and destination.
    • There are many other solutions in this space; how does this one compete?
    • Unquestionably makes it more "likely to be used" than requiring a 2nd Fedora system.
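
To make the "limited configuration" idea concrete, here is a minimal sketch of what the settings could look like. Every name in it (BackupConfig, the field names, the allowed frequency values) is hypothetical and only illustrates the four knobs listed above; it is not an existing Fedora interface.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical set of allowed schedule values, for illustration only.
ALLOWED_FREQUENCIES = ("hourly", "daily", "weekly")


@dataclass
class BackupConfig:
    """Hypothetical settings mirroring the knobs discussed in this ticket."""
    enabled: bool = False
    destination: Optional[str] = None  # e.g. a Fedora Server host or a block device
    frequency: str = "daily"           # how often to back up
    start_time: str = "02:00"          # when to back up (HH:MM, local time)
    notify_on_error: bool = True       # error handling/notifications

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means usable."""
        problems = []
        if self.enabled and not self.destination:
            problems.append("destination is required when backups are enabled")
        if self.frequency not in ALLOWED_FREQUENCIES:
            problems.append("unknown frequency: " + self.frequency)
        return problems
```

A disabled default configuration validates cleanly, while enabling backups without choosing a destination reports one problem.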

Summary of btrfs send/receive function:

  • smallest granular unit is the subvolume snapshot (must be a read-only snapshot)
  • btrfs send creates a metadata+data stream of a subvolume snapshot, or the difference between two snapshots
  • btrfs send -p is used to produce an incremental stream of the difference between two snapshots.
  • btrfs receive accepts a stream and creates a subvolume snapshot from it; if it's an incremental send, the parent subvolume snapshot must already exist on both the source and the destination
    • cheap to compute incremental stream: requires no deep traversal on either the source or destination; e.g. change a single 1MiB file in a 20T filesystem, the stream will be at most 1MiB and take a few seconds to compute without needing to first communicate with the destination.
    • when moving or renaming large numbers of files, the incremental stream functionally contains just 'mv' commands to move/rename them upon being received
  • source file system must be btrfs
  • destination file system needs to be btrfs to receive and create subvolume snapshots; a non-btrfs file system can still accept and store the send stream as a file, but that calls for a few changes to the backup/restore strategy: a stored incremental stream file can't stand alone, whereas an incremental stream processed by btrfs receive creates a complete, standalone subvolume snapshot.
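
The send/receive flow summarized above can be sketched as follows. This is only an illustration, not a proposed implementation: the helper names are hypothetical, the snapshot paths are supplied by the caller, and actually running the pipeline needs root and real read-only snapshots. The only btrfs options used are `-p` (parent snapshot) and `-f` (write the stream to a file) of btrfs send.

```python
import subprocess
from typing import List, Optional


def build_send_cmd(snapshot: str,
                   parent: Optional[str] = None,
                   stream_file: Optional[str] = None) -> List[str]:
    """Build the argv for a full send, or an incremental one if parent is given."""
    cmd = ["btrfs", "send"]
    if parent is not None:
        cmd += ["-p", parent]       # incremental: only the difference from parent
    if stream_file is not None:
        cmd += ["-f", stream_file]  # store the stream as a file (non-btrfs destination)
    cmd.append(snapshot)            # must be a read-only snapshot
    return cmd


def build_receive_cmd(dest_dir: str) -> List[str]:
    """Build the argv that recreates the snapshot under dest_dir (must be on btrfs)."""
    return ["btrfs", "receive", dest_dir]


def send_to_local_btrfs(snapshot: str, dest_dir: str,
                        parent: Optional[str] = None) -> bool:
    """Pipe btrfs send into btrfs receive; returns True if both exit with 0."""
    send = subprocess.Popen(build_send_cmd(snapshot, parent), stdout=subprocess.PIPE)
    recv = subprocess.Popen(build_receive_cmd(dest_dir), stdin=send.stdout)
    send.stdout.close()             # let send see a broken pipe if receive exits early
    recv_rc = recv.wait()
    send_rc = send.wait()
    return send_rc == 0 and recv_rc == 0
```

For a remote destination the receive side would typically be run over ssh instead of locally; that part is left out of the sketch.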

Btrfs send/receive is going to get an overhaul to support btrfs/fscrypt work that's in progress (native encryption using the kernel's fs/crypto API). Possible changes:

  • support both compressed and encrypted streams
  • add checksumming to the data stream
  • bonus: authentication won't be required to back up encrypted subvolumes

Various questions:

  • Is there a role for systemd-homed? User-level backup preferences/configuration aren't accessible in an encrypted /home or ~/, but systemd-homed might be able to hold this information in the unencrypted user record?
  • Is there a tie in with snapper? Are there possible conflicts? The plus side of cheap snapshots is, they can each create and manage their own snapshots, and put them wherever they want.
  • Retention policy default? Configurable?
  • What to back up?
    • Just /home or ~/ ?
    • Option to back up everything?
    • Option that shows subvolumes, and make them user selectable?
  • Server side option could include a deduplication agent; e.g. bees.

Project references:

man btrfs subvolume
man btrfs send
man btrfs receive

btrbk - backup tool for btrfs subvolumes

bees - deduplication agent

Metadata Update from @chrismurphy:
- Issue set to the milestone: Future Release
- Issue tagged with: Desktop, Server, Utils

2 years ago

A liability/limitation for this idea is SELinux will prevent btrfs receive from restoring security labels it doesn't recognize.

Example: Fedora 36 has a new security label system_u:object_r:NetworkManager_dispatcher_script_t:s0 which Fedora 35 is not aware of. If a Fedora 35 Server is the destination for backups from a Fedora 36 desktop, the receive fails by default. While we have a way to suppress the error so that receive succeeds, the received subvolume snapshot then does not have the "Received UUID" set, which is a piece of metadata needed for incremental send/receive. So although the data has been received, that snapshot cannot be used as a source (or parent) in a subsequent incremental receive. This breaks incremental receive, which is the key feature of send/receive: the incremental computation on btrfs is cheap, with no deep traversal needed on either source or destination.
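
A backup tool could check for this condition on the destination before attempting an incremental send. The sketch below parses the text printed by btrfs subvolume show, assuming it contains a "Received UUID:" line whose value is "-" when unset (worth verifying against the installed btrfs-progs). The function name is hypothetical.

```python
from typing import Optional


def received_uuid(show_output: str) -> Optional[str]:
    """Return the Received UUID from `btrfs subvolume show` text, or None if unset.

    Assumes one "Key: value" pair per line and "-" as the placeholder for
    an unset Received UUID.
    """
    for line in show_output.splitlines():
        # Split on the first colon only, so values containing colons stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Received UUID":
            value = value.strip()
            return None if value in ("", "-") else value
    return None
```

If this returns None for a snapshot on the destination, that snapshot cannot serve as the parent of an incremental receive, and the tool would have to fall back to a full send.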

Relaxing the rule that requires full replication of all data and metadata before the Received UUID can be set is a slippery slope. It would take some evaluation to make sure we don't cause other problems by allowing this metadata to be dropped while still setting the Received UUID.

Perhaps it's possible to backport these labels to earlier versions of Fedora?

Still another thought is that these new security labels tend to crop up only for /usr, /etc, and /var, not /home. So if the feature were constrained to backing up just the user's /home, that might be a suitable workaround? Is there a use case for users creating and setting arbitrary security labels?

See also: send|receive ERROR: lsetxattr failed, SELINUX_ERR op=setxattr invalid_context
