#26 rpm-ostreed service won't start
Closed a year ago Opened a year ago by jasonbrooks.

Today I installed a package with rpm-ostree install and then ran rpm-ostree ex livefs because I didn't want to reboot right away. As is often the case with package layering, I was notified that some packages would be replaced and that --replace hadn't been specified, so I ran rpm-ostree ex livefs --replace. That didn't work (I don't recall the specific output), and after rebooting I found that my layered package wasn't installed.

I rebooted and selected my second/previous(?) boot entry in grub, and the system wouldn't boot at all. I went back to the other boot entry, tried to get my rpm-ostree status and noticed that the service wasn't running. The error message is:

Aug 08 12:31:45 sfjbrooks.localdomain systemd[1]: Starting RPM-OSTree System Management Daemon...
Aug 08 12:31:45 sfjbrooks.localdomain rpm-ostree[4987]: Reading config file '/etc/rpm-ostreed.conf'
Aug 08 12:31:45 sfjbrooks.localdomain rpm-ostree[4987]: error: Couldn't start daemon: Error setting up sysroot: readlinkat: No such file or directory
Aug 08 12:31:45 sfjbrooks.localdomain systemd[1]: rpm-ostreed.service: Main process exited, code=exited, status=1/FAILURE
Aug 08 12:31:45 sfjbrooks.localdomain systemd[1]: rpm-ostreed.service: Failed with result 'exit-code'.
Aug 08 12:31:45 sfjbrooks.localdomain systemd[1]: Failed to start RPM-OSTree System Management Daemon.

@walters suggested I post the result of this command:

# ls -ld /boot/loader*
lrwxrwxrwx. 1 root root    8 Aug  8 07:13 /boot/loader -> loader.0
drwxr-xr-x. 3 root root 4096 Aug  8 07:13 /boot/loader.0

I wonder if this is Silverblue-specific, or more of a bug in rpm-ostree?

Hmm. Did you have [Experimental] StageDeployments=true in /etc/rpm-ostreed.conf?

What's the output of ls -ald /ostree/boot.*/*/*/* ?
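
The first question can also be answered with a quick grep instead of opening the file. This is only a minimal sketch: the helper name stage_deployments_enabled is made up for illustration, and it does a loose line match rather than checking that the key really sits under the [Experimental] section.

```shell
#!/bin/sh
# Hypothetical helper (not part of rpm-ostree): succeeds if the given config
# file has an uncommented StageDeployments=true line.
# Assumption: a loose line match is good enough; the [Experimental] section
# header itself is not checked.
stage_deployments_enabled() {
    conf="${1:-/etc/rpm-ostreed.conf}"
    grep -Eq '^[[:space:]]*StageDeployments[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$conf" 2>/dev/null
}

if stage_deployments_enabled; then
    echo "StageDeployments=true is set"
else
    echo "StageDeployments=true is not set (or the file is missing)"
fi
```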

I do have [Experimental] StageDeployments=true in /etc/rpm-ostreed.conf

$ ls -ald /ostree/boot.*/*/*/*
lrwxrwxrwx. 1 root root 108 Aug  8 12:18 /ostree/boot.0.0/fedora-workstation/4e3b649dce917f9210948b8fd12433f664294c86c0660a09ef70f74cad1fd8a2/0 -> ../../../deploy/fedora-workstation/deploy/d3051fd505ecb778e2a0951f000b37f8b5f80944ac330122cc327bdc8c317a21.0
lrwxrwxrwx. 1 root root 108 Aug  8 12:18 /ostree/boot.0.0/fedora-workstation/4e3b649dce917f9210948b8fd12433f664294c86c0660a09ef70f74cad1fd8a2/1 -> ../../../deploy/fedora-workstation/deploy/d3051fd505ecb778e2a0951f000b37f8b5f80944ac330122cc327bdc8c317a21.1
lrwxrwxrwx. 1 root root 108 Aug  8 07:13 /ostree/boot.0.1/fedora-workstation/321d724b4aaed18afcfa63f1e9d7a7f34a7dc2720a07c892f752bdf08dd9138d/0 -> ../../../deploy/fedora-workstation/deploy/4e7e3d537d9a5758cc5ea248c36eb1124624e8e5d966d64f4f58679f4dc602aa.0
lrwxrwxrwx. 1 root root 108 Aug  8 07:13 /ostree/boot.0.1/fedora-workstation/4e3b649dce917f9210948b8fd12433f664294c86c0660a09ef70f74cad1fd8a2/0 -> ../../../deploy/fedora-workstation/deploy/d3051fd505ecb778e2a0951f000b37f8b5f80944ac330122cc327bdc8c317a21.0
lrwxrwxrwx. 1 root root 108 Aug  8 12:18 /ostree/boot.0/fedora-workstation/4e3b649dce917f9210948b8fd12433f664294c86c0660a09ef70f74cad1fd8a2/0 -> ../../../deploy/fedora-workstation/deploy/d3051fd505ecb778e2a0951f000b37f8b5f80944ac330122cc327bdc8c317a21.0
lrwxrwxrwx. 1 root root 108 Aug  8 12:18 /ostree/boot.0/fedora-workstation/4e3b649dce917f9210948b8fd12433f664294c86c0660a09ef70f74cad1fd8a2/1 -> ../../../deploy/fedora-workstation/deploy/d3051fd505ecb778e2a0951f000b37f8b5f80944ac330122cc327bdc8c317a21.1
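
Since the daemon fails in readlinkat, one thing that may be worth checking in a listing like this is whether any symlink under /boot or /ostree points at a target that no longer exists. Below is a minimal sketch, assuming a POSIX shell and a find that supports -maxdepth (GNU and BusyBox do). The function name list_dangling_symlinks is illustrative only, and a dangling link is just one possible cause here, not a confirmed diagnosis.

```shell
#!/bin/sh
# Illustrative helper: print every symlink (up to 4 levels deep) under the
# given directories whose target does not exist.
# The depth limit keeps the search out of the deployment trees themselves.
list_dangling_symlinks() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 4 -type l 2>/dev/null | while IFS= read -r link; do
            # -e follows the link, so it fails when the target is missing.
            [ -e "$link" ] || printf '%s -> %s\n' "$link" "$(readlink "$link")"
        done
    done
}

list_dangling_symlinks /boot /ostree
```

No output means every symlink it looked at resolves; it says nothing about links that are missing altogether.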

OK, I think this is probably related to https://github.com/ostreedev/ostree/pull/1672 and https://github.com/projectatomic/rpm-ostree/pull/1456

Will I be able to recover from this?

Yeah, almost certainly, but it's a bit predicated on us figuring out exactly what's wrong. It's still not clear to me what's broken - I think it's something in the /ostree/boot.* symlinks but when I try to reproduce this starting from

ostree://fedora-atomic:fedora/28/x86_64/atomic-host
                   Version: 28.20180722.0 (2018-07-23 00:38:05)

I get instead:
Replacing /usr... error: No such metadata object 22e3a432406d2d9df2babffc800081913bc34e108f299a66754a1240041a76f2.commit
which is the expected problem. It might

Anyways...does ostree admin deploy fedora-atomic:fedora/28/x86_64/workstation work? If you reboot into that you should be in the base.

OK yep reproduced from
ostree://fedora-atomic:fedora/28/x86_64/atomic-host Version: 28.20180804.0 (2018-08-04 19:52:51)
Looking...

# ostree admin deploy fedora-atomic:fedora/28/x86_64/workstation
error: readlinkat: No such file or directory

OK, I'm still not sure why, but indeed there's an "orphaned" bootloader entry which is pointing at the rollback deployment. For me, just mv /boot/loader/entries/ostree-fedora-atomic-1.conf{,.bak} fixed it.
Before doing this, run cat /proc/cmdline and look at the ostree=/ostree/boot.0/$stateroot/$somechecksum/$v arg - don't rename/delete a bootloader entry that contains that.
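
The check described above can be sketched as a small script. It reads the ostree= argument from the kernel command line and lists the bootloader entries that do not contain it. The function names are made up for illustration, and it does a literal string match on the entry files, so treat the output as candidates to inspect by hand rather than as a list of files that are safe to rename.

```shell
#!/bin/sh
# Illustrative helpers; the function names are not part of ostree or rpm-ostree.

# Print the value of the ostree= argument from a kernel command line string.
booted_ostree_arg() {
    for arg in $1; do
        case "$arg" in
            ostree=*) printf '%s\n' "${arg#ostree=}"; return 0 ;;
        esac
    done
    return 1
}

# Print the entries in a directory that do NOT mention the given ostree path,
# i.e. entries that are not the one the system was booted from.
entries_not_booted() {
    entries_dir="$1"
    booted="$2"
    for entry in "$entries_dir"/*.conf; do
        [ -f "$entry" ] || continue
        grep -qF "ostree=$booted" "$entry" || printf '%s\n' "$entry"
    done
}

if booted=$(booted_ostree_arg "$(cat /proc/cmdline 2>/dev/null)"); then
    echo "Booted deployment: $booted (leave entries containing it alone)"
    entries_not_booted /boot/loader/entries "$booted"
else
    echo "No ostree= argument on the kernel command line"
fi
```

Nothing here renames or deletes anything; the mv step is still done by hand once you have looked at the listed entries.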

At this point let's take this to rpm-ostree upstream; I'll file an issue there.

That appears to have done the trick, thanks!

Metadata Update from @otaylor:
- Issue status updated to: Closed (was: Open)

a year ago
