#2548 Fedora-Rawhide-20210531.n.0 DOOMED
Closed 2 years ago by humaton. Opened 3 years ago by releng.

pungi.global.log

[IMAGE_BUILD     ] [ERROR   ] [FAIL] Image build (variant Cloud, arch *, subvariant Cloud_Base) failed, but going on anyway.
[IMAGE_BUILD     ] [ERROR   ] ImageBuild task failed: 69025314. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/imagebuild-Cloud-Cloud_Base-tar-gz.x86_64.log for more details.
[IMAGE_BUILD     ] [ERROR   ] [FAIL] Image build (variant Cloud, arch *, subvariant Cloud_Base) failed, but going on anyway.
[IMAGE_BUILD     ] [ERROR   ] ImageBuild task failed: 69025330. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/imagebuild-Cloud-Cloud_Base-vagrant-libvirt-vagrant-virtualbox.x86_64.log for more details.
[IMAGE_BUILD     ] [ERROR   ] [FAIL] Image build (variant Labs, arch *, subvariant Scientific) failed, but going on anyway.
[IMAGE_BUILD     ] [ERROR   ] ImageBuild task failed: 69025376. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/imagebuild-Labs-Scientific-vagrant-libvirt-vagrant-virtualbox.x86_64.log for more details.

Variant: Labs, subvariant: Scientific task failed. Pinging maintainers: @scitech

[IMAGE_BUILD     ] [ERROR   ] [FAIL] Image build (variant Labs, arch *, subvariant Python_Classroom) failed, but going on anyway.
[IMAGE_BUILD     ] [ERROR   ] ImageBuild task failed: 69025373. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/imagebuild-Labs-Python_Classroom-vagrant-libvirt-vagrant-virtualbox.x86_64.log for more details.

Variant: Labs, subvariant: Python_Classroom task failed. Pinging maintainers: @python-sig @churchyard

[LIVE_MEDIA      ] [ERROR   ] [FAIL] Live media (variant Labs, arch *, subvariant Astronomy_KDE) failed, but going on anyway.
[LIVE_MEDIA      ] [ERROR   ] Live media task failed: 69025300. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/livemedia-Labs-Astronomy_KDE.x86_64.log for more details.

Variant: Labs, subvariant: Astronomy_KDE task failed. Pinging maintainers: @lupinix

[LIVE_MEDIA      ] [ERROR   ] [FAIL] Live media (variant Labs, arch *, subvariant Scientific_KDE) failed, but going on anyway.
[LIVE_MEDIA      ] [ERROR   ] Live media task failed: 69025306. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/livemedia-Labs-Scientific_KDE.x86_64.log for more details.

Variant: Labs, subvariant: Scientific_KDE task failed. Pinging maintainers: @scitech

[LIVE_MEDIA      ] [ERROR   ] [FAIL] Live media (variant Labs, arch *, subvariant Jam_KDE) failed, but going on anyway.
[LIVE_MEDIA      ] [ERROR   ] Live media task failed: 69025317. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/livemedia-Labs-Jam_KDE.x86_64.log for more details.

Variant: Labs, subvariant: Jam_KDE task failed. Pinging maintainers: @eeickmeyer

[LIVE_MEDIA      ] [ERROR   ] [FAIL] Live media (variant Labs, arch *, subvariant Robotics) failed, but going on anyway.
[LIVE_MEDIA      ] [ERROR   ] Live media task failed: 69025325. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/livemedia-Labs-Robotics.x86_64.log for more details.

Variant: Labs, subvariant: Robotics task failed. Pinging maintainers: @robotics-sig @rmattes

[LIVE_MEDIA      ] [ERROR   ] [FAIL] Live media (variant Spins, arch *, subvariant i3) failed, but going on anyway.
[LIVE_MEDIA      ] [ERROR   ] Live media task failed: 69025360. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/livemedia-Spins-i3.x86_64.log for more details.

Variant: Spins, subvariant: i3 task failed. Pinging maintainers: @defolos @odilhao @x3mboy @jflory7 @nasirhm

[ERROR   ] [FAIL] Image build (variant Container, arch aarch64, subvariant Container_Minimal_Base) failed, but going on anyway.
[IMAGE_BUILD     ] [INFO    ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Minimal_Base) (task id: 69025358)
[ERROR   ] [FAIL] Image build (variant Container, arch x86_64, subvariant Container_Minimal_Base) failed, but going on anyway.
[IMAGE_BUILD     ] [INFO    ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Minimal_Base) (task id: 69025358)
[ERROR   ] [FAIL] Image build (variant Container, arch ppc64le, subvariant Container_Minimal_Base) failed, but going on anyway.
[IMAGE_BUILD     ] [INFO    ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Minimal_Base) (task id: 69025358)
[ERROR   ] [FAIL] Image build (variant Container, arch armhfp, subvariant Container_Minimal_Base) failed, but going on anyway.
[IMAGE_BUILD     ] [INFO    ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Minimal_Base) (task id: 69025358)
[ERROR   ] [FAIL] Image build (variant Container, arch x86_64, subvariant Container_Base) failed, but going on anyway.
[IMAGE_BUILD     ] [INFO    ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Base) (task id: 69025337)
[ERROR   ] [FAIL] Image build (variant Container, arch ppc64le, subvariant Container_Base) failed, but going on anyway.
[IMAGE_BUILD     ] [INFO    ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Base) (task id: 69025337)
[ERROR   ] [FAIL] Image build (variant Container, arch aarch64, subvariant Container_Base) failed, but going on anyway.
[IMAGE_BUILD     ] [INFO    ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Base) (task id: 69025337)
[ERROR   ] [FAIL] Image build (variant Container, arch armhfp, subvariant Container_Base) failed, but going on anyway.
[IMAGE_BUILD     ] [INFO    ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Base) (task id: 69025337)
[IMAGE_BUILD     ] [ERROR   ] [FAIL] Image build (variant Labs, arch *, subvariant Python_Classroom) failed, but going on anyway.
[IMAGE_BUILD     ] [ERROR   ] ImageBuild task failed: 69025379. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/aarch64-armhfp/imagebuild-Labs-Python_Classroom-raw-xz.aarch64-armhfp.log for more details.

Variant: Labs, subvariant: Python_Classroom task failed. Pinging maintainers: @python-sig @churchyard

[IMAGE_BUILD     ] [ERROR   ] [FAIL] Image build (variant Spins, arch *, subvariant KDE) failed, but going on anyway.
[IMAGE_BUILD     ] [ERROR   ] ImageBuild task failed: 69025415. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/aarch64-armhfp/imagebuild-Spins-KDE-raw-xz.aarch64-armhfp.log for more details.

Variant: Spins, subvariant: KDE task failed. Pinging maintainers: @rdieter @svahl

[IMAGE_BUILD     ] [ERROR   ] [FAIL] Image build (variant Spins, arch *, subvariant SoaS) failed, but going on anyway.
[IMAGE_BUILD     ] [ERROR   ] ImageBuild task failed: 69025440. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/aarch64-armhfp/imagebuild-Spins-SoaS-raw-xz.aarch64-armhfp.log for more details.

Variant: Spins, subvariant: SoaS task failed. Pinging maintainers: @chimosky @aperezbios

  • Compose run failed because of task 69025304:
[ERROR   ] Compose run failed: ImageBuild task failed: 69025304. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/aarch64-ppc64le-s390x-x86_64/imagebuild-Cloud-Cloud_Base-qcow2-raw-xz.aarch64-ppc64le-s390x-x86_64.log for more details.

Compose Total time: 6:44:32
Compose phase INIT time: 0:00:25.
Compose phase PKGSET time: 0:25:29.
Compose phase WEAVER: FAILED.
Compose phase BUILDINSTALL time: 1:56:52.
Compose phase GATHER time: 2:10:50.
Compose phase OSTREE time: 0:35:18.
Compose phase OSTREE_INSTALLER time: 1:00:06.
Compose phase CREATEREPO time: 0:15:01.
Compose phase CREATEISO time: 0:19:15.
Compose phase REPOCLOSURE time: 0:02:08.
Compose phase IMAGE_BUILD: FAILED.
Compose phase LIVE_MEDIA time: 0:34:04.


So, this is likely fallout from my upgrading builders to f34.

The error on the cloud base / x86_64 image is:

"internal error: process exited while connecting to monitor: 2021-05-31T08:12:57.129210Z qemu-system-x86_64: error: failed to set MSR 0x345 to 0x2000\nqemu-system-x86_64: ../target/i386/kvm.c:2701: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed."}}

No idea what that might mean though. ;(

@crobinso could you look, or pass along to someone who could?

Some info on setup:

rhel8.4 virthost, with nested_virt enabled, and a fedora 34 vm on it running oz to build the cloud image.
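
A minimal way to sanity-check that layout (a sketch, assuming Intel hosts, hence the kvm_intel module):

    # on the RHEL 8.4 virthost (L0): confirm nested virtualization is enabled
    cat /sys/module/kvm_intel/parameters/nested
    # inside the Fedora 34 VM (L1): confirm the guest sees VMX and can run KVM itself
    grep -c vmx /proc/cpuinfo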

https://github.com/kubevirt/kubevirt/issues/5068 suggests this could be qemu too new for the kernel. https://wiki.qemu.org/Features/Migration/Troubleshooting#.28x86.29_failed_to_set_MSR_0xXXXX_to_0xYYYY has some info on this type of error also. Messing with the -cpu arg or possibly the machine version may help?
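
For context, MSR 0x345 is IA32_PERF_CAPABILITIES, a PMU-related register, which fits the nested-virt angle: the L1 kernel may not expose it the way the f34 QEMU expects. It can be inspected from the L1 guest (a sketch, assuming the msr-tools package is installed):

    # load the msr module and read IA32_PERF_CAPABILITIES (0x345) on CPU 0
    sudo modprobe msr
    sudo rdmsr -p 0 0x345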

RHEL bug https://bugzilla.redhat.com/show_bug.cgi?id=1790308 sounds similar. @bonzini fixed that one, but the fixing commits are already in the f34 qemu. Maybe try changing the cpu model to 'qemu64', but I'm not very familiar with oz.
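
A minimal sketch of that experiment, run inside the L1 VM; no disk is needed since the failure happens at vCPU setup, before any guest code runs (the 512M size is arbitrary):

    # should reproduce the MSR failure under nested virt with CPU passthrough
    qemu-system-x86_64 -machine q35,accel=kvm -cpu host -m 512 -display none -monitor stdio
    # should start cleanly if the generic CPU model avoids the problematic MSR
    qemu-system-x86_64 -machine q35,accel=kvm -cpu qemu64 -m 512 -display none -monitor stdio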

Do you have a link to the full libvirt log file or qemu command line? /var/log/libvirt/qemu/VMNAME.log

The workaround at the libvirt level would be to add <pmu state='off'/> inside <features>. I had never seen this issue before; I'll look into it.
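
In domain-XML terms that workaround is small; a sketch for a regular libvirt guest (VMNAME is a placeholder; an oz-launched domain would need the same element injected into the XML oz generates):

    # open the guest definition for editing
    virsh edit VMNAME
    # then inside the <features> element add:
    #   <pmu state='off'/>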

That log confused me because it contains -cpu host but I didn't see that anywhere in oz upstream. Fedora has a downstream oz patch to force it though (grep 'host-passthrough')

@bonzini does that need to be set at L1 (fedora34) or L2 (oz launched VM)?

In L2 (i.e. in oz). For reproducing the bug, what is the L0 kernel and QEMU version?

> That log confused me because it contains -cpu host but I didn't see that anywhere in oz upstream. Fedora has a downstream oz patch to force it though (grep 'host-passthrough')
>
> @bonzini does that need to be set at L1 (fedora34) or L2 (oz launched VM)?

We pulled in this PR: https://github.com/clalancette/oz/pull/283
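
For anyone following along: libvirt's host-passthrough CPU mode is what becomes -cpu host on the QEMU command line, which would explain the flag in the log. A quick way to confirm (VMNAME is a placeholder):

    # show the CPU mode in the oz-launched guest's domain XML
    virsh dumpxml VMNAME | grep -A2 '<cpu'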

The oz PR is correct, the bug is in either QEMU or Linux.

> In L2 (i.e. in oz). For reproducing the bug, what is the L0 kernel and QEMU version?

L0 is rhel 8.4:

4.18.0-305.el8.x86_64
qemu-kvm-4.2.0-48.module+el8.4.0+10368+630e803b.x86_64

L1 is fedora 34:

5.12.7-300.fc34.x86_64
qemu-kvm-5.2.0-7.fc34.x86_64
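
For anyone reproducing on another host, the same data points can be collected per layer (a one-liner sketch):

    # kernel and QEMU versions on the current layer
    uname -r && rpm -q qemu-kvm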

What is the QEMU command line on the rhel8.4 machine?

It's a doozy. ;)

qemu       22344  117  3.9 21185040 15619380 ?   Sl   May30 4926:35 /usr/libexec/qemu-kvm -name guest=buildvm-x86-22.iad2.fedoraproject.org,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-37-buildvm-x86-22.iad2./master-key.aes -machine pc-q35-rhel8.2.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu Cascadelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off -m 15360 -overcommit mem-lock=off -smp 6,maxcpus=30,sockets=30,cores=1,threads=1 -uuid 9faf178e-e972-403b-b4e3-04725c4200b3 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=40,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device pcie-pci-bridge,id=pci.8,bus=pci.1,addr=0x0 -device pcie-root-port,port=0x17,chassis=9,id=pci.9,bus=pcie.0,addr=0x2.0x7 -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 -blockdev {"driver":"host_device","filename":"/dev/BuildGuests/buildvm-x86-22.iad2.fedoraproject.org","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":false,"no-flush":true},"driver":"raw","file":"libvirt-1-storage"} -device virtio-blk-pci,scsi=off,bus=pci.5,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on -netdev tap,fd=42,id=hostnet0,vhost=on,vhostfd=43 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c6:2a:cc,bus=pci.2,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=44,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5904,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device i6300esb,id=watchdog0,bus=pci.8,addr=0x1 -watchdog-action reset -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
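
For reference, a running guest's full command line can be pulled straight from the process table on the virthost (a sketch; qemu-kvm is the binary name on RHEL):

    # print the complete argument list of all running qemu-kvm processes
    ps -ww -o args= -C qemu-kvm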

Can we do anything to work around this to get a working compose in the meantime?

We can commit a quick revert to f34 QEMU (upstream commit ea39f9b643959d759b8643b4c11c4cbb3683d0ff if @crobinso wants to beat me to it).
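
A sketch of what that quick revert would look like in the Fedora package, assuming the usual fedpkg workflow (the commit hash is from the comment above; paths are placeholders):

    # create the revert commit in an upstream QEMU checkout, then export it as a patch
    git -C /path/to/qemu revert ea39f9b643959d759b8643b4c11c4cbb3683d0ff
    git -C /path/to/qemu format-patch -1 -o /tmp/
    # add the resulting patch to qemu.spec on the f34 branch and test-build:
    fedpkg clone qemu && cd qemu && fedpkg switch-branch f34
    fedpkg local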

I suppose getting ssh access to the L1 machine is out of the question? I understand it might be. If not, my SSH key is available via gpg --export-ssh-key Bonzini.
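
That GnuPG option (available since 2.1.13) prints an OpenSSH-format public key derived from an authentication-capable subkey; a usage sketch:

    # print the OpenSSH public key for the matching GPG key
    gpg --export-ssh-key Bonzini
    # an admin would append that output to ~/.ssh/authorized_keys on the target machine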

I'd prefer not to tweak production virthosts... let me see if I can get it to happen in staging.

The revert sounds like a good short term plan.

So, composes seem to be working again now. :(

The only change I can see on our side is that I updated all the builders to 5.12.8... could this have been fixed between 5.12.7 and 5.12.8?

@kevin I don't think they started working again. Yes, 05.n.0 did work (don't ask me how, I really have no idea :confused: ), but last night's compose hit the same issue, and it also failed on a KDE deps issue (which is what the pungi failure reports).

Cloud image failure from last night: https://koji.fedoraproject.org/koji/taskinfo?taskID=69483076

The working compose happened to run this cloud image create task on a buildhw (i.e., bare-metal f34).

> The working compose happened to run this cloud image create task on a buildhw (i.e., bare-metal f34).

Good catch, Kevin.

Sorry I was offline Friday until now. I submitted a build to copr crobinso/qemu-fix-compose with the revert @bonzini mentioned, x86_64 only: https://copr.fedorainfracloud.org/coprs/crobinso/qemu-fix-compose/build/2239139/
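
A sketch of how a builder could consume that copr build (assumes dnf-plugins-core for the copr subcommand):

    # enable the repo and pull in the patched qemu packages
    sudo dnf copr enable crobinso/qemu-fix-compose
    sudo dnf upgrade 'qemu*'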

If we need it in actual fedora repos I'll do it tomorrow

I'm rebuilding that in our f34-infra koji tags... will update builders later today whenever it's done. Thanks!

Sadly, my build failed on s390x. ;(
https://koji.fedoraproject.org/koji/taskinfo?taskID=69606432
Would another option be to downgrade the kernel on the host? Or on the guests?

@kevin I think you had a successful build after we talked on IRC, are composes working now?

Yes they are. I built your patched f34 qemu and updated all the builders to it. :)

That s390x failure is likely to be transient.

FYI I submitted an f34 update with the revert, qemu-5.2.0-8.fc34. I didn't add the patch to f35, hopefully the root cause is identified and fixed in the next 6 months :)

Metadata Update from @humaton:
- Issue status updated to: Closed (was: Open)

2 years ago
