#38 Install fails in recent Rawhide compose due to missing grub2 bits
Closed 4 years ago Opened 4 years ago by adamwill.

In the recent Rawhide compose, install of the Workstation OStree image fails:

https://openqa.stg.fedoraproject.org/tests/153130

The error is:

12:32:15,608 INF program: Running... grub2-install --no-floppy /dev/vda
12:32:16,071 INF program: Installing for i386-pc platform.
12:32:16,072 INF program: grub2-install: error: cannot rename the file /boot/grub2/grubenv.new to /boot/grub2/grubenv: No such file or directory.
12:32:16,072 DBG program: Return code: 1

This is, I think, because some grub2 bits are missing from whatever environment anaconda's bootloader installation steps run in, in the Workstation ostree case. For a regular install they run inside the installed system chroot, but ostree seems to be a bit different here: the rpmostreepayload.py payload runs iutil.setSysroot, which changes the value returned by iutil.getSysroot, which the bootloader install command uses as its root:

        rc = iutil.execWithRedirect("grub2-install", grub_args,
                                    root=iutil.getSysroot(),
                                    env_prune=['MALLOC_PERTURB_'])

rpmostreepayload sets it like this:

iutil.setSysroot(deployment_path.get_path())

but I'm not entirely sure what the implications of that are. Anyhow...
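To make the sysroot mechanism described above concrete, here is a minimal sketch. It is not anaconda's code: set_sysroot, get_sysroot and build_chrooted_command are hypothetical stand-ins, the default path is an assumption, and it assumes that "root" means the command is run chrooted into that directory.

```python
# Module-level state, standing in for what iutil.setSysroot/getSysroot manage.
_sysroot = "/mnt/sysimage"  # assumed default, for illustration only


def set_sysroot(path):
    """Change the directory that later commands treat as their root."""
    global _sysroot
    _sysroot = path


def get_sysroot():
    """Return whatever root was set most recently."""
    return _sysroot


def build_chrooted_command(command, args, root):
    """Return the argv that would run command inside root (nothing is executed)."""
    if root in (None, "/"):
        return [command] + list(args)
    return ["chroot", root, command] + list(args)


# Hypothetical ostree deployment path; the real value comes from the payload.
set_sysroot("/mnt/sysimage/ostree/deploy/example/deploy/abc123.0")
print(build_chrooted_command("grub2-install", ["--no-floppy", "/dev/vda"], get_sysroot()))
```

The point of the sketch is only that the bootloader command sees whatever tree the payload last set, so if that tree lacks the grub2 files, grub2-install fails there even if they exist elsewhere.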

What's happened is that the grub2 package has been rejigged, and there is no grub2 package any more. There's a grub2-(something) package on various arches - grub2-pc on x86_64 and i686, grub2-ppc64 on ppc64, etc - that Provides: grub2. However, Provides aren't followed in some parts of the compose process, and that's likely the case here.

So I think grub2-pc is likely missing from the relevant environment. I can't tell where exactly we need to change something without grokking the OStree bits a bit more...
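A rough sketch of why a name-only lookup would lose grub2 after the split, under the assumption that the relevant compose step matches literal package names. The REPO metadata and both resolver functions are hypothetical and only illustrate the difference; they are not how the compose tools are implemented.

```python
# Hypothetical repository metadata: package name -> set of Provides.
REPO = {
    "grub2-pc": {"grub2-pc", "grub2"},
    "grub2-common": {"grub2-common"},
    "grub2-tools": {"grub2-tools"},
}


def resolve_by_name(requested, repo):
    """Keep only requests that match a literal package name."""
    return sorted(name for name in requested if name in repo)


def resolve_with_provides(requested, repo):
    """Also accept packages whose Provides include the requested name."""
    found = set()
    for want in requested:
        for name, provides in repo.items():
            if want == name or want in provides:
                found.add(name)
    return sorted(found)


requested = ["grub2", "grub2-tools"]
print(resolve_by_name(requested, REPO))        # "grub2" silently drops out
print(resolve_with_provides(requested, REPO))  # "grub2" resolves to grub2-pc
```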


Hey guys, sorry I've been AFK this week. This looks like https://bugzilla.redhat.com/show_bug.cgi?id=1479960 which should have been fixed in grub2-2.02-8.fc27. Maybe this is a similar bug?

This is of interest from https://openqa.stg.fedoraproject.org/tests/155761/file/_do_install_and_reboot-anaconda.log:

13:20:37,681 INF progress: Installing boot loader
13:20:37,681 INF installation: Task started: Install bootloader (17/18)
13:20:37,682 INF bootloader: boot loader stage1 target device is vda1
13:20:37,682 INF bootloader: boot loader stage2 target device is vda2
13:20:38,505 INF bootloader: bootloader.py: used boot args: rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet 
13:20:40,188 DBG exception: running handleException
13:20:40,194 CRT exception: Traceback (most recent call last):

  File "/usr/lib64/python3.6/site-packages/pyanaconda/threading.py", line 252, in run
    threading.Thread.run(self)

  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)

  File "/usr/lib64/python3.6/site-packages/pyanaconda/installation.py", line 365, in doInstall
    installation_queue.start()

  File "/usr/lib64/python3.6/site-packages/pyanaconda/installation_tasks.py", line 304, in start
    item.start()

  File "/usr/lib64/python3.6/site-packages/pyanaconda/installation_tasks.py", line 304, in start
    item.start()

  File "/usr/lib64/python3.6/site-packages/pyanaconda/installation_tasks.py", line 472, in start
    self.run_task()

  File "/usr/lib64/python3.6/site-packages/pyanaconda/installation_tasks.py", line 438, in run_task
    self._task(*self._task_args, **self._task_kwargs)

  File "/usr/lib64/python3.6/site-packages/pyanaconda/bootloader.py", line 2490, in writeBootLoader
    writeBootLoaderFinal(storage, payload, instClass, ksdata)

  File "/usr/lib64/python3.6/site-packages/pyanaconda/bootloader.py", line 2463, in writeBootLoaderFinal
    storage.bootloader.write()

  File "/usr/lib64/python3.6/site-packages/pyanaconda/bootloader.py", line 1769, in write
    self.install()

  File "/usr/lib64/python3.6/site-packages/pyanaconda/bootloader.py", line 1779, in install
    self.add_efi_boot_target()

  File "/usr/lib64/python3.6/site-packages/pyanaconda/bootloader.py", line 1730, in add_efi_boot_target
    self._add_single_efi_boot_target(self.stage1_device)  # pylint: disable=no-member

  File "/usr/lib64/python3.6/site-packages/pyanaconda/bootloader.py", line 1722, in _add_single_efi_boot_target
    "-l", self.efi_dir_as_efifs_dir + self._efi_binary,  # pylint: disable=no-member

TypeError: must be str, not method
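For reference, "must be str, not method" is what Python 3.6 reports when a str is concatenated with a bound method on the right-hand side. A minimal sketch of that failure mode follows; the class, its attribute values and the choice of which operand is the method are assumptions for illustration, not anaconda's actual code or its actual fix.

```python
class FakeEFIBootLoader:
    """Hypothetical stand-in, not the real pyanaconda bootloader class."""

    efi_dir_as_efifs_dir = "\\EFI\\fedora"

    def _efi_binary(self):
        # A plain method; callers must call it to get the string.
        return "\\shim.efi"

    def broken_loader_path(self):
        # str + bound method: raises TypeError (wording varies by Python version).
        return self.efi_dir_as_efifs_dir + self._efi_binary

    def working_loader_path(self):
        # Calling the method yields a str, so the concatenation succeeds.
        return self.efi_dir_as_efifs_dir + self._efi_binary()


loader = FakeEFIBootLoader()
try:
    loader.broken_loader_path()
except TypeError as exc:
    print("TypeError:", exc)
print(loader.working_loader_path())
```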

I do notice that in f26 we install grub2-efi, grub2-efi-modules, grub2 and grub2-tools, whereas in Rawhide we install grub2-efi-x64-cdboot, grub2-efi-ia32-cdboot, grub2-common and grub2-tools-efi.

It indeed looks like there's grub2-2.02-6.fc27 whereas we need -8. Hm, but this is rawhide, and it appears to have been built, so why isn't it in the tree?

Ah... https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20170908.n.0/logs/x86_64/ostree/ostree-4/create-ostree-repo.log

error: No package matches 'atomic'

The package list pain is incredible. Which is... https://pagure.io/fedora-comps/pull-request/144

@dustymabe that traceback is a fairly new anaconda bug which I just fixed:

https://bugzilla.redhat.com/show_bug.cgi?id=1489144

but this issue predates that one.

Hmm...so I was just going to drop environment-modules, but first I wanted to reproduce.

I verified that I can reproduce the "No package matches 'atomic'" failure by pointing a compose at the 20170902 repodata, but the repodata above (20170910) seems to include environment-modules, and things at least depsolve here (local f26 dev container, git master rpm-ostree).

Ah nevermind, it was https://pagure.io/workstation-ostree-config/pull-request/40 confusing me between rawhide vs 27. Reproduced locally with the 27 repodata.

So environment-modules is a problem, and you are working on resolving that?

See https://pagure.io/workstation-ostree-config/pull-request/41 (do you get PR notifications for this repo?)

Apparently not. Can you make me an admin on this repo? Should we configure notifications to go to the atomic@fp.o mailing list?

@adamwill Is wstree being tested in openqa? If so I can't find it.

@walters sure, when the image is actually present - e.g. https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=27&build=Fedora-27-20170912.n.0&groupid=1 (look for Workstation-dvd_ostree-iso). For the 0914.n.0 composes, both 27 and Rawhide, it seems like the image wasn't present, I don't know why not yet.

It was renamed into a separate "group" or whatever, see https://pagure.io/pungi-fedora/c/e8ea94577611b3d054d1040ad6a379779186728a?branch=master - is that why it dropped out of openqa?

@walters it didn't drop out of openQA (back then), it dropped out of the compose - i.e. images were not appearing in the compose at all. openQA can't test images that don't exist.

The change you linked to, FWIW, only affects the volume ID of the ISO file, which few things care much about. (We have to abbreviate various things to keep the volume IDs below a certain length. IIRC it's some ISO spec limit or something, but I can't remember what the limit is or why it exists.)

yeah, the commit you linked to wasn't about the variant change, which is why I got confused :)

So as the variant name changed, indeed, things needed updating. I thought I'd fixed this three days ago, but just realized I only fixed half of it - I fixed the subvariant name in the list of 'wanted' images but didn't update the flavor name in the openQA configuration; the flavor names are generated from a formula including the subvariant name, so the scheduler was trying to schedule jobs for a non-existent flavor.
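A small sketch of how the half-fix described above plays out. Everything here is an assumption for illustration: the formula (subvariant, image type and format joined by hyphens) is only inferred from the "Workstation-dvd_ostree-iso" name mentioned earlier, "Renamed Workstation" is a made-up subvariant name, and flavor_name and plan_jobs are hypothetical helpers, not the real scheduler.

```python
def flavor_name(subvariant, image_type, image_format):
    """Build a flavor name from image properties (assumed formula)."""
    return "-".join([subvariant, image_type, image_format])


def plan_jobs(images, wanted_subvariants, configured_flavors):
    """Split wanted images into flavors that can run and flavors that are missing."""
    runnable, missing = [], []
    for subvariant, image_type, image_format in images:
        if subvariant not in wanted_subvariants:
            continue  # image is not on the 'wanted' list at all
        flavor = flavor_name(subvariant, image_type, image_format)
        (runnable if flavor in configured_flavors else missing).append(flavor)
    return runnable, missing


# Hypothetical new subvariant name containing a space.
NEW = "Renamed Workstation"
images = [(NEW, "dvd_ostree", "iso")]

# Half fix: 'wanted' list updated, flavor config still has the old name.
half_fixed = plan_jobs(images, {NEW}, {"Workstation-dvd_ostree-iso"})
# Full fix: flavor config updated as well.
fully_fixed = plan_jobs(images, {NEW}, {flavor_name(NEW, "dvd_ostree", "iso")})
print(half_fixed)
print(fully_fixed)
```

With only the 'wanted' list updated, the generated flavor has no matching entry in the configuration, which matches the symptom of jobs being requested for a flavor that does not exist.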

I've now updated the flavor name in the openQA config too; happily openQA seems OK with a flavor name that contains a space, and things actually seem to be working now, the image is really getting tested again. Now we can see if it works. :)

So it looks like install and boot now work, but it crashes from gnome-initial-setup back to GDM. Welp, that's a new bug. So we can probably close this.

https://openqa.stg.fedoraproject.org/tests/180846

Metadata Update from @walters:
- Issue status updated to: Closed (was: Open)

4 years ago
