pungi.global.log
[IMAGE_BUILD ] [ERROR ] [FAIL] Image build (variant Cloud, arch *, subvariant Cloud_Base) failed, but going on anyway.
[IMAGE_BUILD ] [ERROR ] ImageBuild task failed: 69025314. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/imagebuild-Cloud-Cloud_Base-tar-gz.x86_64.log for more details.
[IMAGE_BUILD ] [ERROR ] [FAIL] Image build (variant Cloud, arch *, subvariant Cloud_Base) failed, but going on anyway.
[IMAGE_BUILD ] [ERROR ] ImageBuild task failed: 69025330. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/imagebuild-Cloud-Cloud_Base-vagrant-libvirt-vagrant-virtualbox.x86_64.log for more details.
[IMAGE_BUILD ] [ERROR ] [FAIL] Image build (variant Labs, arch *, subvariant Scientific) failed, but going on anyway.
[IMAGE_BUILD ] [ERROR ] ImageBuild task failed: 69025376. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/imagebuild-Labs-Scientific-vagrant-libvirt-vagrant-virtualbox.x86_64.log for more details.
Variant: Labs, subvariant: Scientific task failed. Pinging maintainers: @scitech
[IMAGE_BUILD ] [ERROR ] [FAIL] Image build (variant Labs, arch *, subvariant Python_Classroom) failed, but going on anyway.
[IMAGE_BUILD ] [ERROR ] ImageBuild task failed: 69025373. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/imagebuild-Labs-Python_Classroom-vagrant-libvirt-vagrant-virtualbox.x86_64.log for more details.
Variant: Labs, subvariant: Python_Classroom task failed. Pinging maintainers: @python-sig @churchyard
[LIVE_MEDIA ] [ERROR ] [FAIL] Live media (variant Labs, arch *, subvariant Astronomy_KDE) failed, but going on anyway.
[LIVE_MEDIA ] [ERROR ] Live media task failed: 69025300. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/livemedia-Labs-Astronomy_KDE.x86_64.log for more details.
Variant: Labs, subvariant: Astronomy_KDE task failed. Pinging maintainers: @lupinix
[LIVE_MEDIA ] [ERROR ] [FAIL] Live media (variant Labs, arch *, subvariant Scientific_KDE) failed, but going on anyway.
[LIVE_MEDIA ] [ERROR ] Live media task failed: 69025306. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/livemedia-Labs-Scientific_KDE.x86_64.log for more details.
Variant: Labs, subvariant: Scientific_KDE task failed. Pinging maintainers: @scitech
[LIVE_MEDIA ] [ERROR ] [FAIL] Live media (variant Labs, arch *, subvariant Jam_KDE) failed, but going on anyway.
[LIVE_MEDIA ] [ERROR ] Live media task failed: 69025317. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/livemedia-Labs-Jam_KDE.x86_64.log for more details.
Variant: Labs, subvariant: Jam_KDE task failed. Pinging maintainers: @eeickmeyer
[LIVE_MEDIA ] [ERROR ] [FAIL] Live media (variant Labs, arch *, subvariant Robotics) failed, but going on anyway.
[LIVE_MEDIA ] [ERROR ] Live media task failed: 69025325. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/livemedia-Labs-Robotics.x86_64.log for more details.
Variant: Labs, subvariant: Robotics task failed. Pinging maintainers: @robotics-sig @rmattes
[LIVE_MEDIA ] [ERROR ] [FAIL] Live media (variant Spins, arch *, subvariant i3) failed, but going on anyway.
[LIVE_MEDIA ] [ERROR ] Live media task failed: 69025360. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/x86_64/livemedia-Spins-i3.x86_64.log for more details.
Variant: Spins, subvariant: i3 task failed. Pinging maintainers: @defolos @odilhao @x3mboy @jflory7 @nasirhm
[ERROR ] [FAIL] Image build (variant Container, arch aarch64, subvariant Container_Minimal_Base) failed, but going on anyway.
[IMAGE_BUILD ] [INFO ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Minimal_Base) (task id: 69025358)
[ERROR ] [FAIL] Image build (variant Container, arch x86_64, subvariant Container_Minimal_Base) failed, but going on anyway.
[IMAGE_BUILD ] [INFO ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Minimal_Base) (task id: 69025358)
[ERROR ] [FAIL] Image build (variant Container, arch ppc64le, subvariant Container_Minimal_Base) failed, but going on anyway.
[IMAGE_BUILD ] [INFO ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Minimal_Base) (task id: 69025358)
[ERROR ] [FAIL] Image build (variant Container, arch armhfp, subvariant Container_Minimal_Base) failed, but going on anyway.
[IMAGE_BUILD ] [INFO ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Minimal_Base) (task id: 69025358)
[ERROR ] [FAIL] Image build (variant Container, arch x86_64, subvariant Container_Base) failed, but going on anyway.
[IMAGE_BUILD ] [INFO ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Base) (task id: 69025337)
[ERROR ] [FAIL] Image build (variant Container, arch ppc64le, subvariant Container_Base) failed, but going on anyway.
[IMAGE_BUILD ] [INFO ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Base) (task id: 69025337)
[ERROR ] [FAIL] Image build (variant Container, arch aarch64, subvariant Container_Base) failed, but going on anyway.
[IMAGE_BUILD ] [INFO ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Base) (task id: 69025337)
[ERROR ] [FAIL] Image build (variant Container, arch armhfp, subvariant Container_Base) failed, but going on anyway.
[IMAGE_BUILD ] [INFO ] [DONE ] Creating image (formats: docker, arches: aarch64-armhfp-ppc64le-s390x-x86_64, variant: Container, subvariant: Container_Base) (task id: 69025337)
[IMAGE_BUILD ] [ERROR ] [FAIL] Image build (variant Labs, arch *, subvariant Python_Classroom) failed, but going on anyway.
[IMAGE_BUILD ] [ERROR ] ImageBuild task failed: 69025379. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/aarch64-armhfp/imagebuild-Labs-Python_Classroom-raw-xz.aarch64-armhfp.log for more details.
[IMAGE_BUILD ] [ERROR ] [FAIL] Image build (variant Spins, arch *, subvariant KDE) failed, but going on anyway.
[IMAGE_BUILD ] [ERROR ] ImageBuild task failed: 69025415. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/aarch64-armhfp/imagebuild-Spins-KDE-raw-xz.aarch64-armhfp.log for more details.
Variant: Spins, subvariant: KDE task failed. Pinging maintainers: @rdieter @svahl
[IMAGE_BUILD ] [ERROR ] [FAIL] Image build (variant Spins, arch *, subvariant SoaS) failed, but going on anyway.
[IMAGE_BUILD ] [ERROR ] ImageBuild task failed: 69025440. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/aarch64-armhfp/imagebuild-Spins-SoaS-raw-xz.aarch64-armhfp.log for more details.
Variant: Spins, subvariant: SoaS task failed. Pinging maintainers: @chimosky @aperezbios
[ERROR ] Compose run failed: ImageBuild task failed: 69025304. See /mnt/koji/compose/rawhide/Fedora-Rawhide-20210531.n.0/logs/aarch64-ppc64le-s390x-x86_64/imagebuild-Cloud-Cloud_Base-qcow2-raw-xz.aarch64-ppc64le-s390x-x86_64.log for more details.
Compose Total time: 6:44:32
Compose phase INIT time: 0:00:25
Compose phase PKGSET time: 0:25:29
Compose phase WEAVER: FAILED
Compose phase BUILDINSTALL time: 1:56:52
Compose phase GATHER time: 2:10:50
Compose phase OSTREE time: 0:35:18
Compose phase OSTREE_INSTALLER time: 1:00:06
Compose phase CREATEREPO time: 0:15:01
Compose phase CREATEISO time: 0:19:15
Compose phase REPOCLOSURE time: 0:02:08
Compose phase IMAGE_BUILD: FAILED
Compose phase LIVE_MEDIA time: 0:34:04
So, this is likely fallout from my upgrading builders to f34.
The error on the cloud base / x86_64 image is:
"internal error: process exited while connecting to monitor: 2021-05-31T08:12:57.129210Z qemu-system-x86_64: error: failed to set MSR 0x345 to 0x2000\nqemu-system-x86_64: ../target/i386/kvm.c:2701: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed."}}
No idea what that might mean though. ;(
@crobinso could you look, or pass along to someone who could?
Some info on setup:
RHEL 8.4 virthost (L0) with nested virt enabled, and a Fedora 34 VM (L1) on it running oz, which launches the install guest (L2), to build the cloud image.
https://github.com/kubevirt/kubevirt/issues/5068 suggests this could be qemu being too new for the kernel. https://wiki.qemu.org/Features/Migration/Troubleshooting#.28x86.29_failed_to_set_MSR_0xXXXX_to_0xYYYY also has some info on this type of error. Messing with the -cpu arg, or possibly the machine version, may help?
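To poke at this by hand inside the Fedora 34 VM (L1), a minimal sketch (memory size and kernel path are just placeholders) that should hit the same "failed to set MSR" assertion at startup if the CPU model is the trigger:

# run inside the L1 guest; needs /dev/kvm access
qemu-system-x86_64 -machine q35,accel=kvm -m 1024 -nographic \
    -cpu host \
    -kernel /boot/vmlinuz-$(uname -r) -append console=ttyS0

Swapping -cpu host for a named model like -cpu qemu64 would show whether the CPU model is really what matters.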
@bonzini may be able to help.
RHEL bug https://bugzilla.redhat.com/show_bug.cgi?id=1790308 sounds similar. @bonzini fixed that one, but the fixing commits are already in the f34 qemu. Maybe try changing the cpu model to 'qemu64'? I'm not very familiar with oz, though.
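For anyone who wants to try that without touching oz itself, the libvirt-level equivalent (a sketch; applied to the guest definition with virsh edit) would be pinning the CPU model in the domain XML:

<cpu mode='custom' match='exact'>
  <model fallback='forbid'>qemu64</model>
</cpu>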
Do you have a link to the full libvirt log file (/var/log/libvirt/qemu/VMNAME.log) or the qemu command line?
Here's the log file...
<img alt="factory-build-ed3bc288-1615-4b41-a62d-5a659a0ef8c4.log" src="/releng/failed-composes/issue/raw/files/dbe0349dc3a9326a020ba0a91e9bdea79215260b93c775795bdaba4ea0036de7-factory-build-ed3bc288-1615-4b41-a62d-5a659a0ef8c4.log" />
The workaround at the libvirt level would be to add <pmu state='off'/> inside <features>. I had never seen this issue; I'll look at it.
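That fits the symptom: MSR 0x345 is IA32_PERF_CAPABILITIES, a PMU capabilities MSR, so with the vPMU off QEMU should never try to write it. As a sketch, the fragment goes into the guest's domain XML like so:

<features>
  <!-- any existing feature elements stay as-is -->
  <pmu state='off'/>
</features>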
That log confused me, because it contains -cpu host, but I didn't see that anywhere in oz upstream. Fedora has a downstream oz patch that forces it, though (grep for 'host-passthrough').
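A quick way to confirm that from the package side (a sketch, using an anonymous dist-git clone):

fedpkg clone -a oz
grep -rn host-passthrough oz/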
@bonzini does that need to be set at L1 (fedora34) or L2 (the oz-launched VM)?
In L2 (i.e. in oz). For reproducing the bug, what is the L0 kernel and QEMU version?
We pulled in this PR: https://github.com/clalancette/oz/pull/283
The oz PR is correct, the bug is in either QEMU or Linux.
L0 is RHEL 8.4:
4.18.0-305.el8.x86_64
qemu-kvm-4.2.0-48.module+el8.4.0+10368+630e803b.x86_64
L1 is Fedora 34:
5.12.7-300.fc34.x86_64
qemu-kvm-5.2.0-7.fc34.x86_64
What is the QEMU command line on the rhel8.4 machine?
It's a doozy. ;)
qemu 22344 117 3.9 21185040 15619380 ? Sl May30 4926:35 /usr/libexec/qemu-kvm -name guest=buildvm-x86-22.iad2.fedoraproject.org,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-37-buildvm-x86-22.iad2./master-key.aes -machine pc-q35-rhel8.2.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu Cascadelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off -m 15360 -overcommit mem-lock=off -smp 6,maxcpus=30,sockets=30,cores=1,threads=1 -uuid 9faf178e-e972-403b-b4e3-04725c4200b3 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=40,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device pcie-pci-bridge,id=pci.8,bus=pci.1,addr=0x0 -device pcie-root-port,port=0x17,chassis=9,id=pci.9,bus=pcie.0,addr=0x2.0x7 -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 -blockdev {"driver":"host_device","filename":"/dev/BuildGuests/buildvm-x86-22.iad2.fedoraproject.org","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":false,"no-flush":true},"driver":"raw","file":"libvirt-1-storage"} -device virtio-blk-pci,scsi=off,bus=pci.5,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on -netdev tap,fd=42,id=hostnet0,vhost=on,vhostfd=43 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c6:2a:cc,bus=pci.2,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=44,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5904,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device i6300esb,id=watchdog0,bus=pci.8,addr=0x1 -watchdog-action reset -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
Can we do anything to work around this to get a working compose in the meantime?
We can commit a quick revert to f34 QEMU (upstream commit ea39f9b643959d759b8643b4c11c4cbb3683d0ff if @crobinso wants to beat me to it).
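For reference, a sketch of what producing that revert looks like in a qemu checkout (the dist-git packaging step is elided):

git clone https://gitlab.com/qemu-project/qemu.git
cd qemu
git revert ea39f9b643959d759b8643b4c11c4cbb3683d0ff
git format-patch -1 HEAD   # yields a patch to carry in the Fedora qemu package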
I suppose getting ssh access to the L1 machine is out of the question? I understand it might be. If not, my SSH public key is the one produced by gpg --export-ssh-key Bonzini.
I'd prefer not to tinker with production virthosts... let me see if I can get it to happen in staging.
The revert sounds like a good short term plan.
So, composes seem to be working again now. :(
The only change I can see on our side is that I updated all the builders to 5.12.8... could this have been fixed between 5.12.7 and 5.12.8?
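One quick way to check that theory (a sketch; run on one of the updated builders) is to grep the packaged kernel changelog for KVM or MSR entries:

rpm -q --changelog kernel-core | grep -iE 'kvm|msr' | head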
@kevin I don't think they started working again. Yes, 05.n.0 did work (don't ask me how, I really have no idea :confused:) but last night's compose has the same issue, and it also failed on a KDE deps issue (which is what the pungi failure shows).
Cloud image failure from last night: https://koji.fedoraproject.org/koji/taskinfo?taskID=69483076
The working compose happened to run this cloud image creation on a buildhw (i.e., bare-metal f34).
Good catch Kevin.
Sorry I was offline Friday until now. I submitted a build to copr crobinso/qemu-fix-compose with the revert @bonzini mentioned, x86_64 only: https://copr.fedorainfracloud.org/coprs/crobinso/qemu-fix-compose/build/2239139/
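For anyone who wants to test that build directly, a sketch (it needs dnf-plugins-core; infra ended up rebuilding it in its own tags instead, see below):

sudo dnf copr enable crobinso/qemu-fix-compose
sudo dnf upgrade 'qemu*'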
If we need it in the actual Fedora repos, I'll do it tomorrow.
I'm rebuilding that in our f34-infra koji tags... will update builders later today whenever it's done. Thanks!
Sadly, my build failed on s390x. ;( https://koji.fedoraproject.org/koji/taskinfo?taskID=69606432
Would another option be to downgrade the kernel on the host? Or on the guests?
@kevin I think you had a successful build after we talked on IRC, are composes working now?
Yes they are. I built your patched f34 qemu and updated all the builders to it. :)
That s390x failure is likely to be transient.
FYI I submitted an f34 update with the revert, qemu-5.2.0-8.fc34. I didn't add the patch to f35, hopefully the root cause is identified and fixed in the next 6 months :)
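Until that update goes stable, pulling it in early would look something like this (a sketch):

sudo dnf upgrade --enablerepo=updates-testing 'qemu*'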
Metadata Update from @humaton: - Issue status updated to: Closed (was: Open)