#67 Silverblue fails to build in Rawhide
Opened 5 months ago by adamwill. Modified 5 months ago

I just noticed that Silverblue has not built in Rawhide since (it looks like) 2018-12-01. The current failure looks like this (full logs):

verifying the installroot
libgstapp-1.0.so.0, needed by /usr/bin/gnome-help, not found
libgstbase-1.0.so.0, needed by /usr/bin/gnome-help, not found
libgstreamer-1.0.so.0, needed by /usr/bin/gnome-help, not found
libgstpbutils-1.0.so.0, needed by /usr/bin/gnome-help, not found
libgstaudio-1.0.so.0, needed by /usr/bin/gnome-help, not found
libgsttag-1.0.so.0, needed by /usr/bin/gnome-help, not found
libgstvideo-1.0.so.0, needed by /usr/bin/gnome-help, not found
libgstgl-1.0.so.0, needed by /usr/bin/gnome-help, not found
libgstfft-1.0.so.0, needed by /usr/bin/gnome-help, not found
libmetacity.so.1, needed by /usr/bin/metacity, not found
libvorbisfile.so.3, needed by /usr/bin/metacity, not found
libvorbisfile.so.3, needed by /usr/bin/metacity, not found
libvorbisfile.so.3, needed by /usr/bin/canberra-boot, not found
libgstapp-1.0.so.0, needed by /usr/bin/yelp, not found
libgstbase-1.0.so.0, needed by /usr/bin/yelp, not found
libgstreamer-1.0.so.0, needed by /usr/bin/yelp, not found
libgstpbutils-1.0.so.0, needed by /usr/bin/yelp, not found
libgstaudio-1.0.so.0, needed by /usr/bin/yelp, not found
libgsttag-1.0.so.0, needed by /usr/bin/yelp, not found
libgstvideo-1.0.so.0, needed by /usr/bin/yelp, not found
libgstgl-1.0.so.0, needed by /usr/bin/yelp, not found
libgstfft-1.0.so.0, needed by /usr/bin/yelp, not found

These are things that runtime-cleanup.tmpl is removing, I believe. But I think there's a wrinkle here: I think they're not intended to be removed, but are being removed because lorax thinks ${libdir} is lib when it should be lib64. Note this from the logs:

removefrom libvorbis --allbut /usr/lib/libvorbisfile.* /usr/lib/libvorbis.*: removed 12/12 files, 2731kb/2731kb

But the line in runtime-cleanup.tmpl is:

removefrom libvorbis --allbut /usr/${libdir}/libvorbisfile.* /usr/${libdir}/libvorbis.*

Tagging @bcl on this, I'll look into it a bit.


Looking in the pylorax logs, I see:

self.arch.buildarch = i686
self.arch.basearch = i386
self.arch.libdir = lib

so that seems to be the problem here: for some reason, lorax got the wrong buildarch set (basearch and libdir are derived from buildarch). Somehow the build still used x86_64 packages, but then lorax did the stripping wrong...

thanks @adamwill - from that last comment, do we know where the bug is?

Not precisely yet, but I'm working on it :) I'm digging through logs and lorax code to see how it gets set. So far, it looks from runroot.log like Koji does not pass --buildarch to lorax, so I think we hit this chunk of lorax, which looks like it attempts to kinda autodetect the buildarch based on the arch of the first (non-source) anaconda-core package it finds in the repos available to it...that's as far as I've got so far.

So, the lorax command looks like this:

lorax --product=Fedora --version=Rawhide --release=20190107.n.0 '--source=http://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190107.n.0/work/$basearch/repo' '--source=http://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190107.n.0/work/$basearch/comps_repo_Silverblue' --variant=Silverblue --nomacboot --volid=Fedora-SB-ostree-x86_64-rawh --add-template=/mnt/koji/compose/rawhide/Fedora-Rawhide-20190107.n.0/work/x86_64/Silverblue/lorax_templates/ostree-based-installer/lorax-configure-repo.tmpl --add-template=/mnt/koji/compose/rawhide/Fedora-Rawhide-20190107.n.0/work/x86_64/Silverblue/lorax_templates/ostree-based-installer/lorax-embed-repo.tmpl --add-template-var=ostree_install_repo=https://kojipkgs.fedoraproject.org/compose/atomic/repo/ --add-template-var=ostree_update_repo=https://kojipkgs.fedoraproject.org/atomic/repo/ --add-template-var=ostree_osname=fedora-workstation --add-template-var=ostree_oskey=fedora-30-primary --add-template-var=ostree_install_ref=fedora/rawhide/x86_64/silverblue --add-template-var=ostree_update_ref=fedora/rawhide/x86_64/silverblue --logfile=/mnt/koji/compose/rawhide/Fedora-Rawhide-20190107.n.0/logs/x86_64/Silverblue/ostree_installer-4/lorax.log --rootfs-size=8 /mnt/koji/compose/rawhide/Fedora-Rawhide-20190107.n.0/work/x86_64/Silverblue/ostree_installer

From that note the two repos specified:

Looking at those, the second seems to be there solely to provide comps data, its primary.xml is tiny, so the first is the main package repo for the lorax run. If we grab the primary.xml from that repo and examine it, we see that it contains both i686 and x86_64 anaconda-core packages:

[adamw@xps13k tmp]$ grep anaconda-core-30.*rpm 422579daa3eb6c6854a225758c2c653b9af9202c02e5bb0dd4cf7447a65397fd-primary.xml
  <location xml:base="file:///mnt/koji/" href="packages/anaconda/30.13/1.fc30/data/signed/cfc659b9/i686/anaconda-core-30.13-1.fc30.i686.rpm"/>
  <location xml:base="file:///mnt/koji/" href="packages/anaconda/30.13/1.fc30/data/signed/cfc659b9/x86_64/anaconda-core-30.13-1.fc30.x86_64.rpm"/>

So I kinda suspect that this is part of the problem. I'm not sure what changed here, though, and unfortunately we no longer have a Rawhide compose where Silverblue succeeded available for examination, which makes it harder to tell. It could be that this 'work' repo didn't used to include any packages from a different arch, and that changed somehow. Or it could be that the q.available() that lorax's get_buildarch() does used to filter out non-native arch packages and no longer does (that seems unlikely, but it's possible). Or it could I guess be that there always were both i686 and x86_64 packages available but get_buildarch for some reason used to hit the x86_64 one first, but now hits the i686 one first. Or it could be something else I'm not seeing yet...

The work repo for x86_64 has always included multilib packages.

Yeah - so my next discovery is: I think Silverblue builds didn't used to use that repo. That's presumably what's changed. The F29 composes haven't been garbage-collected yet, so I looked at one of those, and look at its runroot.log: the lorax args are different. It's just this:

lorax --product=Fedora --version=29 --release=20181028.n.0 --source=http://kojipkgs.fedoraproject.org/compose/branched/Fedora-29-20181028.n.0/compose/Everything/x86_64/os ...

There's no comps repo, and it uses the Everything x86_64 repo, not the work repo.

So now I'm trying to find the source of that change.

That change is a refactoring in Pungi: instead of waiting for the Everything repo the task can start sooner. This change has been ready for a long time now, but the patches were reverted due to a bug in rpm-ostree/dnf/somewhere that caused issues with source packages. That has been fixed since.

https://pagure.io/pungi/issue/890

Maybe I need to revert the change again :/ However not being able to deal with a repo with multilib packages is definitely a bug in whatever tools fails to use it.

Yeah, I just found that commit. I'm not sure that's the whole story, though, as that change seems to have been put back into the Fedora package on 2018-11-12 and we got a Silverblue in Rawhide as late as 2018-12-01, according to nightlies. It definitely explains the difference between that F29 I looked at and current Rawhide, though.

I can certainly write a patch for lorax to make it handle multilib repos in this case, I really kinda want to spend a bit more time trying to identify what apparently changed on 2018-12-02, though...

So @lsedlar and I have a nice consistent story now: it was that change that broke things, and they broke on 2018-12-02 because, while the pungi change was committed on 2018-11-12, there wasn't a build until 2018-11-26, and that build wasn't installed on the builders until 2018-12-01. So that all adds up.

Fixing get_buildarch is easy to do in a dumb way but seems a bit harder to do in any 'smart' way (at least on half an hour of research), and we both think that the most sensible fix is just to have pungi pass --buildarch explicitly to lorax and avoid running into this questionable autodetection entirely. It already does that for regular installer builds anyway, it seems it was just overlooked for ostree installer builds. @lsedlar is going to do that.

I'll file a lorax bug for the autodetection failure, but perhaps we can simply remove the autodetection entirely and make passing --buildarch mandatory, or something. We'll see what @bcl thinks.

Commit da1ea835 relates to this ticket

thanks @adamwill and @lsedlar - that was excellent investigation!

Login to comment on this ticket.

Metadata
Related Pull Requests
  • #1115 Merged 5 months ago