#7598 Downgrade to qemu-2.12.0-4.fc29 and libfdt-1.4.6-7.fc29.ppc64le on ppc64le Imagebuilders
Closed: Fixed 5 years ago by kevin. Opened 5 years ago by sinnykumari.

  • Describe what you need us to do:
    For image builds to succeed, we need to try downgrading qemu and libfdt to the previous versions qemu-2.12.0-4.fc29 and libfdt-1.4.6-7.fc29.ppc64le on the ppc64le image builders (buildvm-ppc64le-01/02).

More details in https://pagure.io/dusty/failed-composes/issue/1523#comment-556753

  • When do you need this? (YYYY/MM/DD)
    ASAP
  • When is this no longer needed or useful? (YYYY/MM/DD)
    Always needed
  • If we cannot complete your request, what is the impact?
    ppc64le image builds will fail and will block FAH releases

Metadata Update from @smooge:
- Issue assigned to smooge

5 years ago

Those packages are no longer available. I will have to find them somewhere

Sinny, we never had qemu-2.12.0 in fc29 as far as I can tell; the released version was 3.0.0-1. This looks like it was in Fedora 28, which would need a reinstall of the server.

In the past @kevin managed to downgrade to qemu-2.12.0-4.fc29 on the F29 ppc64le imagebuilder. There is no tagged build, but I see a qemu-2.12.0-4.fc29 build available in koji: https://koji.fedoraproject.org/koji/buildinfo?buildID=1110025 . Maybe Kevin used the same one. I don't know how, but we can probably tag that build into the f29-infra tag and downgrade to it on the required builders.

If I tag it into the f29-infra tag it will get installed on all systems. So Kevin has a different method of doing this, and I will work with him to get it done today.

If I tag it into the f29-infra tag it will get installed on all systems.

Not if f29 or f29-updates have higher EVR. Infra repos don't override Fedora repos, EVR is still taken into account.

Sorry, I envisioned it having a higher epoch so that systems would upgrade to it. If it has a lower EVR and we put a version lock on these systems, then it would not be pulled in elsewhere.

@smooge Higher epoch in package would indeed make the infra version get priority.
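
To make the EVR point above concrete, here is a minimal sketch of the ordering being discussed: the epoch is compared first, and only when the epochs tie does the version-release part decide. This is a simplification that leans on GNU `sort -V` rather than rpm's real comparison algorithm, and the function names (`evr_epoch`, `evr_vr`, `evr_max`) are made up for illustration.

```shell
# Simplified EVR ordering: epoch first, then version-release.
# NOTE: an approximation based on GNU sort -V, not rpm's own algorithm.

# Print the epoch of an EVR string ("0" when no epoch is present).
evr_epoch() {
    case "$1" in
        *:*) printf '%s\n' "${1%%:*}" ;;
        *)   printf '0\n' ;;
    esac
}

# Print the version-release part of an EVR string (epoch stripped).
evr_vr() {
    printf '%s\n' "${1#*:}"
}

# Print whichever of the two EVR strings would be preferred.
evr_max() {
    e1=$(evr_epoch "$1")
    e2=$(evr_epoch "$2")
    # A higher epoch wins regardless of the version that follows it.
    if [ "$e1" -gt "$e2" ]; then printf '%s\n' "$1"; return; fi
    if [ "$e2" -gt "$e1" ]; then printf '%s\n' "$2"; return; fi
    # Same epoch: let sort -V order the version-release strings.
    top=$(printf '%s\n%s\n' "$(evr_vr "$1")" "$(evr_vr "$2")" | sort -V | tail -n 1)
    if [ "$top" = "$(evr_vr "$1")" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}
```

Assuming both qemu builds carry epoch 2, `evr_max 2:2.12.0-4.fc29 2:3.0.0-3.fc29` picks `2:3.0.0-3.fc29`, so the Fedora repos would still win over an older build in the infra tag. A hypothetical rebuild with epoch 3, `evr_max 3:2.12.0-4.fc29 2:3.0.0-3.fc29`, would be preferred despite its older version.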

Thank you all for looking at this issue. Once we have these packages available on the ppc64le imagebuilders, can we do another fedora-29-updates compose run to see how the compose goes? If it needs a releng ticket, let me know and I will create one.

I have downgraded qemu and the following items are now unable to upgrade in the future because they seem to require something not available:

imagefactory-1.1.9-9.fc29.noarch
imagefactory-plugins-ovfcommon-1.1.9-8.fc29.noarch
libguestfs-1:1.39.8-1.fc29.ppc64le
oz-0.16.0-5.fc29.noarch
oz-0.16.0-6.fc29.noarch
oz-0.16.0-7.fc29.noarch

Systems have been restarted so please test.

Thanks @smooge! Can you please confirm that the builders have libfdt version 1.4.6-7.fc29.ppc64le? The latest libfdt 1.4.7* doesn't seem to work.

Right now they have libfdt 1.4.7 installed.

buildvm-ppc64le-02.ppc.fedoraproject.org | CHANGED | rc=0 >>
qemu-2.12.0-4.fc29.ppc64le
libfdt-1.4.7-2.fc29.ppc64le

buildvm-ppc64le-01.ppc.fedoraproject.org | CHANGED | rc=0 >>
qemu-2.12.0-4.fc29.ppc64le
libfdt-1.4.7-2.fc29.ppc64le

@mizdebsk Thanks! Can we please downgrade to 1.4.6-7.fc29.ppc64le? Version 1.4.7 caused issues for me locally, and the build proceeded only after I downgraded to 1.4.6-7.fc29.ppc64le.

I'll let @smooge downgrade it. We have Koji maintenance scheduled for today and I don't want to interfere with that.

All packages have been downgraded to the ones above. I have also installed dnf-versionlock on these two systems and locked these rpms so they will not be updated until that lock is removed.
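
For reference, locking with the versionlock plugin could look roughly like the sketch below. It assumes the plugin package is named `python3-dnf-plugin-versionlock` on F29 and that the downgraded qemu and libfdt builds are already installed, so that `dnf versionlock add` pins whatever is currently on the system. The `RUN` variable and the `lock_packages` function are hypothetical helpers, not what was actually run on the builders.

```shell
# Set RUN="echo" for a dry run that only prints the commands.
RUN=${RUN:-}

# Packages this ticket asks to keep at their downgraded versions.
LOCKED_PKGS="qemu libfdt"

lock_packages() {
    # The versionlock plugin ships separately from dnf itself
    # (package name is an assumption for F29).
    $RUN dnf -y install python3-dnf-plugin-versionlock
    # Pin the packages; LOCKED_PKGS is left unquoted on purpose so that
    # each name becomes its own argument.
    $RUN dnf versionlock add $LOCKED_PKGS
    # Show the resulting lock entries for review.
    $RUN dnf versionlock list
}
```

Running `RUN=echo; lock_packages` prints the three commands without touching the system, which is a cheap way to review them before running as root.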

Will keep this open to make sure image builds work fine for ppc64le.

Thanks all! The Fedora 29 updates and updates-testing composes look good on ppc64le. We have a successful compose for both.

Please note that F30+ image builds might still fail because they will hit another issue related to nested virt on ppc64le: https://bugzilla.redhat.com/show_bug.cgi?id=1676475 . So far there are no known fixes that I am aware of other than moving the builders to P9.

Closing this issue

Metadata Update from @sinnykumari:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago

Earlier I looked at the compose status, which showed FINISHED, and hence I thought everything had finished successfully.
I was looking at the AMI upload status today and noticed that no AMI upload had occurred in the past 2 days, which made me look at the F29 updates and updates-testing compose artifacts. It seems the ostree and image artifacts run was disabled 2 days ago for the bodhi updates run: https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=98e00004c50f2d9273fbbae27a1ba58aa349f8df .
Reopening this issue because we don't know yet if this issue is fixed.

@mohanboddu Any specific reason for disabling ostree and artifacts during the bodhi updates run? When can we expect this to get back to normal? This will block our FAH Two Week release, which we were planning to do this week.

Metadata Update from @sinnykumari:
- Issue status updated to: Open (was: Closed)

5 years ago

Now the ostree runroot task is failing as well on the ppc64le imagebuilder:

DEBUG util.py:643:  Executing command: ['/usr/bin/dnf', '--installroot', '/var/lib/mock/f29-build-15408747-1108982/root/', '--setopt=install_weak_deps=0', 'install', 'pungi', 'lorax', 'ostree', '--setopt=tsflags=nocontexts'] with env {'TERM': 'vt100', 'SHELL': '/bin/bash', 'HOME': '/builddir', 'HOSTNAME': 'mock', 'PATH': '/usr/bin:/bin:/usr/sbin:/sbin', 'PROMPT_COMMAND': 'printf "\\033]0;<mock-chroot>\\007"', 'PS1': '<mock-chroot> \\s-\\v\\$ ', 'LANG': 'C.UTF-8', 'LC_MESSAGES': 'C.UTF-8', 'LD_PRELOAD': '/var/tmp/tmp.mock.7lipgh2a/$LIB/nosync.so'} and shell False
DEBUG util.py:556:  Last metadata expiration check: 0:00:01 ago on Mon Mar  4 15:39:21 2019.
DEBUG util.py:554:  BUILDSTDERR: Error: 
DEBUG util.py:554:  BUILDSTDERR:  Problem: package libguestfs-tools-c-1:1.40.1-2.fc29.ppc64le requires libguestfs.so.0()(64bit), but none of the providers can be installed
DEBUG util.py:554:  BUILDSTDERR:   - package libguestfs-tools-c-1:1.40.1-2.fc29.ppc64le requires libguestfs(ppc-64) = 1:1.40.1-2.fc29, but none of the providers can be installed
DEBUG util.py:554:  BUILDSTDERR:   - package libguestfs-1:1.40.1-2.fc29.ppc64le requires libvirt-daemon-qemu >= 0.10.2-3, but none of the providers can be installed
DEBUG util.py:554:  BUILDSTDERR:   - package pungi-4.1.32-3.fc29.noarch requires libguestfs-tools-c, but none of the providers can be installed
DEBUG util.py:554:  BUILDSTDERR:   - package libvirt-daemon-qemu-4.7.0-1.fc29.ppc64le requires qemu, but none of the providers can be installed
DEBUG util.py:554:  BUILDSTDERR:   - conflicting requests
DEBUG util.py:554:  BUILDSTDERR:   - package qemu-2:3.0.0-3.fc29.ppc64le is excluded
DEBUG util.py:556:  (try to add '--skip-broken' to skip uninstallable packages)
DEBUG util.py:698:  Child return code was: 1

@smooge Did any package get removed unintentionally on the ppc64le imagebuilder while downgrading qemu?

Nothing got removed. Those items require the broken qemu to be installed, and whichever versions worked before were updated over long ago.

As per IRC: the problem was that the versionlock also takes effect inside the imagebuilder, which caused the install to fail. I removed the versionlock and stopped dnf-automatic from updating the packages instead.
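
A rough sketch of that change, for anyone repeating it: drop the lock entries (the log above shows qemu-2:3.0.0-3.fc29 reported as "excluded" during the dnf install) and switch off unattended updates so the packages stay put. The timer name `dnf-automatic.timer` is an assumption, since the ticket does not say which dnf-automatic unit was enabled, and `RUN` and `undo_pinning` are hypothetical helpers.

```shell
# Set RUN="echo" for a dry run that only prints the commands.
RUN=${RUN:-}

undo_pinning() {
    # Remove every versionlock entry so dependency resolution can see
    # the newer qemu again.
    $RUN dnf versionlock clear
    # Stop unattended updates; pass a different timer unit as $1 if the
    # host uses one (the default here is an assumption).
    $RUN systemctl disable --now "${1:-dnf-automatic.timer}"
}
```

`systemctl list-unit-files 'dnf-automatic*'` shows which units are actually present before disabling anything.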

Finally, the F29 AH cloud image build seems to work fine locally on a ppc64le box. It seems kernel-4.20* plays a role in the last issue, in which imagefactory gets stuck forever during the image customization process. We will need to get rid of any 4.20* kernel installed on the ppc64le image builders.
What I did locally was:

# rpm -qa kernel*
kernel-modules-4.19.4-300.fc29.ppc64le
kernel-modules-4.20.13-200.fc29.ppc64le
kernel-4.19.4-300.fc29.ppc64le
kernel-4.20.13-200.fc29.ppc64le
kernel-core-4.19.4-300.fc29.ppc64le
kernel-core-4.20.13-200.fc29.ppc64le

# rpm -e kernel-modules-4.20.13-200.fc29.ppc64le kernel-4.20.13-200.fc29.ppc64le kernel-core-4.20.13-200.fc29.ppc64le

# reboot

Note that just booting an older kernel doesn't help if kernel 4.20* is still installed on the system.

Tested on two different F29 systems; it works with both kernel 4.18.16-300.fc29.ppc64le and 4.19.4-300.fc29.ppc64le.
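
The manual steps above can be sketched as a small filter that picks out the kernel packages to remove. It assumes "too new" means version 4.20 or later (only 4.20.13 was tested here) and relies on GNU `sort -V` for the comparison. The function names `version_ge` and `kernels_to_remove` are hypothetical.

```shell
# Succeed when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read package names on stdin, one per line as printed by
# `rpm -qa 'kernel*'`, and print those whose version is 4.20 or newer.
kernels_to_remove() {
    while IFS= read -r pkg; do
        # Strip the release ("-200.fc29.ppc64le"), then the package
        # name ("kernel-core-"), leaving only the version ("4.20.13").
        nv=${pkg%-*}
        ver=${nv##*-}
        if version_ge "$ver" "4.20"; then
            printf '%s\n' "$pkg"
        fi
    done
}
```

Usage would be along the lines of `rpm -qa 'kernel*' | kernels_to_remove`, reviewing the list before handing it to `rpm -e` and rebooting.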

@kevin @smooge or anyone who has access : can you please remove 4.20* kernels from ppc64le imagebuilder?

Does it have to be 4.19.4-300? Or does any 4.19.x kernel work?

I must say all these downgrades on ppc64le are troubling... I sure hope we can get things fixed so they don't need special downgrades.

Sinny is AFK for today. I don't know if that exact version is required but if we can get that version I'd prefer it because that was what she tested.

ok done and seems to work for ppc64le. Do we also need this for ppc64?

(Or for that matter should we stop making fedora 28 cloud images now to avoid that?)

ok done and seems to work for ppc64le. Do we also need this for ppc64?

we only use ppc64le for atomic host

(Or for that matter should we stop making fedora 28 cloud images now to avoid that?)

I'm happy with stopping all F28 cloud image production unless someone else needs it. We already don't produce f28 atomic host images any longer because we aren't doing releases for them (we do create the OSTrees though, which is a subtle difference)

Yeah, right now we have:

https://koji.fedoraproject.org/koji/taskinfo?taskID=33150092
(f28 cloud base image)
that has ppc64 stuck the same way ppc64le seemed to be...

so, I'm +1 to disabling it, but not sure who we should run it by... server working group?

f28 cloud base image would be the cloud WG for now (although I think there are plans to merge us into one). I can speak for the cloud WG in this case. We don't use them for anything. I think QA or some fedora engineers may use them for something, but maybe this is a shoot first ask questions later scenario.

Does it have to be 4.19.4-300? Or does any 4.19.x kernel work?

Having any kernel < 4.20 should work. So yes, 4.19.x and 4.18.x will work.

I must say all these downgrades on ppc64le are troubling... I sure hope we can get things fixed so they don't need special downgrades.

Yes, it is very troubling and painful, but it seems the latest upstream development is mostly happening on Power9 hardware. So we will have to deal with it until the builders migrate to Power9 :/

ok done and seems to work for ppc64le. Do we also need this for ppc64?
(Or for that matter should we stop making fedora 28 cloud images now to avoid that?)

Did we already try an F29 AH cloud image run on ppc64le? If yes, can I have the link for it, if it is available somewhere?

We have successful ppc64le AH cloud images in Fedora-29-updates-20190306.0 .

Thanks @kevin @smooge

Metadata Update from @sinnykumari:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago

@sinnykumari - do you know if anyone is working on a fix so that we don't need to carry the older versions of packages ? Is https://bugzilla.redhat.com/show_bug.cgi?id=1676475 the right bug to follow?

I am not aware of anyone working on this issue. I won't be surprised if it is a low priority to fix, because it's a Power8-specific issue while I think the latest development focus is Power9.

@sharkcz thoughts?

I don't think anyone is actively working on KVM PR for nested virt on Power8. Deploying Power9 builders via ticket https://pagure.io/fedora-infrastructure/issue/7475 is the way to go. Last time I checked they were blocked on getting proper "RHEL for Power9" subscription.

We hit issues again recently where ppc64le images started failing.

Note that just booting an older kernel doesn't help if kernel 4.20* is still installed on the system.

This was the problem again this time. We had two kernels installed on the system, and even though we were booted into the older one, the newer kernel being installed caused an issue. @kevin uninstalled the newer kernel.

There is an actual bug somewhere here though. I think @kevin said it was because the guestfs tools are using information from the newest kernel and not necessarily the booted kernel. Is there an open bug somewhere to track this?

This issue is currently closed. Are you wanting to re-open it?

Not exactly. The immediate issue (ppc64le atomic host images building) is taken care of. But the underlying issue (multiple kernels causing images to fail building) still exists unless we actively prevent it from happening. It seems like there should be a BZ somewhere filed against the component that is not properly choosing the booted kernel.

It's going to take a bit more debugging to determine where to file that. All I know now is:

oz calls guestfs toward the end of its work.
When newer kernels are installed on ppc64le, that guestfs process hangs.

I don't know if oz is doing something wrong, or guestfs is. It also seems ppc64le specific.

@sinnykumari can you figure out which one is to blame and file a bug?
I could also try and do it, but not sure when I would have time to get to it.

Will dig more into how oz calls stuff. Meanwhile, let's make sure that we don't update the ppc64le image builders until we either know of fixes or move to P9.

Metadata Update from @sinnykumari:
- Issue status updated to: Open (was: Closed)

5 years ago

F29 AH ppc64le image builds are failing again for both F29-updates and F29-updates-testing. Looks like qemu got upgraded on the ppc64le imagebuilder: I see qemu-3.0.0-4.fc29 installed on buildvm-ppc64le-02.ppc.fedoraproject.org . I checked this by running rbac-playbook manual/get-system-packages.yml -l buildvm-ppc64le-02.ppc.fedoraproject.org
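
A check like this could be scripted so that drift gets noticed before a compose fails. The sketch below compares a list of installed packages against the builds requested in this ticket and prints any that are missing. `EXPECTED` and `missing_pins` are hypothetical names, and the input is assumed to be one package per line in the default `rpm -q` output format.

```shell
# Builds requested in this ticket.
EXPECTED="qemu-2.12.0-4.fc29.ppc64le libfdt-1.4.6-7.fc29.ppc64le"

# Read installed packages on stdin (e.g. from `rpm -q qemu libfdt`)
# and print every expected build that is not in that list.
missing_pins() {
    installed=$(cat)
    for want in $EXPECTED; do
        # -x matches whole lines only, -F treats the name as a literal.
        if ! printf '%s\n' "$installed" | grep -qxF -- "$want"; then
            printf '%s\n' "$want"
        fi
    done
}
```

Empty output from `rpm -q qemu libfdt | missing_pins` means both packages are still at the expected builds; anything printed has drifted.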

I have moved the image builders to a power9 box and got that working, so we don't need to worry about this anymore (thank goodness!).

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago

Yay! Thanks Kevin, awesome news.
