#331 Fedora-34 nightly images in AWS do not boot
Closed: fixed 2 years ago by ngompa. Opened 2 years ago by kevin.

Orig issue: https://pagure.io/fedora-infrastructure/issue/9952

Hi,

We try to follow the latest nightly in Fedora CI, but recent F34 nightlies do not boot. In the console we see:

https://ibb.co/bLqQSMk

We see this with last night's nightly (20210512) as well.

It hangs there...

CC: @mvadkert

Some more info:

  • F34 GA works fine.
  • All nightlies after GA fail.
  • All of them have been failing in openQA as well.

Something in https://pagure.io/pungi-fedora/blob/f34/f/fedora-cloud.conf may be misconfigured, but the images are created and there's no error.


+1

The cloud images at https://kojipkgs.fedoraproject.org/compose/cloud/latest-Fedora-Cloud-34/compose/Cloud/x86_64/images/ are not bootable; e.g. virt-manager just hangs at "Booting from Hard Disk...".
Is this a problem with the compose?

I would like to raise the priority of this issue. This is blocking linux-system-roles CI testing.

Metadata Update from @ngompa:
- Issue tagged with: AWS

2 years ago

Note that the issue isn't just with AWS - it is a general issue with the cloud images on all platforms (e.g. local libvirt/qemu/kvm has the same issue)

Metadata Update from @ngompa:
- Issue tagged with: meeting

2 years ago

In the 2021-05-25 community meeting, @davdunc agreed to investigate:

* Fedora-34 nightly images in AWS do not boot (Eighth_Doctor, 16:33:09)
* LINK: https://pagure.io/cloud-sig/issue/331 (Eighth_Doctor, 16:33:13)
* ACTION: davdunc will investigate the issue with Fedora 34 Cloud nightly images not booting in AWS (Eighth_Doctor, 16:37:35)

Metadata Update from @ngompa:
- Issue untagged with: meeting

2 years ago

Metadata Update from @ngompa:
- Issue assigned to davdunc

2 years ago

The latest nightly compose images are now working for me.

They do not boot in openQA, and have not since the first compose after the F34 release.

I looked into the nightlies, and the original ones did not boot. I didn't find a root cause for the images I investigated, but like you, @meggins, I am able to boot the subsequent nightlies. I think this can be closed.

Unfortunately, since it doesn't boot in openQA (per @adamwill), we still have a problem...

Pardon my lack of openQA knowledge, but can we use the EC2 console screenshots (https://docs.aws.amazon.com/cli/latest/reference/ec2/get-console-screenshot.html) to provide the details that testing typically scrapes from the console and verifies?
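A minimal sketch of what that could look like, assuming boto3 is used: boto3 exposes EC2's GetConsoleScreenshot call as `get_console_screenshot`, which returns the image as a base64-encoded JPG in the `ImageData` field. The helper name `save_console_screenshot` and the output path are hypothetical; the sketch only saves the image and does no openQA-style matching on it.

```python
import base64
from pathlib import Path


def save_console_screenshot(ec2_client, instance_id: str, out_path: str) -> Path:
    """Fetch an EC2 console screenshot and write it to out_path as a JPG.

    ec2_client is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    This helper is a hypothetical illustration, not part of any CI tooling.
    """
    # WakeUp=True acts like keystroke input, waking an instance whose
    # display is in standby/sleep before the screenshot is taken.
    response = ec2_client.get_console_screenshot(InstanceId=instance_id, WakeUp=True)
    # ImageData holds the screenshot as a base64-encoded JPG.
    image_bytes = base64.b64decode(response["ImageData"])
    path = Path(out_path)
    path.write_bytes(image_bytes)
    return path
```

Usage would be something like `save_console_screenshot(boto3.client("ec2"), "i-0123456789abcdef0", "console.jpg")`, where the instance ID is a placeholder.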

openQA doesn't boot the images in EC2 or any other cloud; it attaches them directly to a VM and boots that. Unfortunately, the F34 images fail so early that I have no useful information about the failure: they do not even reach GRUB, and qemu just gets stuck at "Booting from Hard Disk..."

Yep, still seeing the same thing as @adamwill on AWS with Fedora-Cloud-Base-34-20210604.0.x86_64-hvm-us-east-2-gp2-0 (just tested it). We have to stick with the Fedora-34 1.2 release, which is making our Fedora CI users a bit unhappy.

@davdunc it seems the issue is only with F34, and we still see it in AWS; all versions except F34 boot just fine.

@frantisekz this is where we've been discussing the F34 cloud images failing to boot at all in some circumstances (including in openQA). Frantisek says they do boot in testcloud, and posted the libvirt XML that testcloud generates in case it's useful:

<domain type="kvm">
  <name>ftest</name>
  <uuid>3bcdc434-e532-4e33-899e-742289d92728</uuid>
  <memory unit="KiB">786432</memory>
  <currentMemory unit="KiB">786432</currentMemory>
  <vcpu placement="static">1</vcpu>
  <os>
    <type arch="x86_64" machine="pc-i440fx-5.1">hvm</type>
    <boot dev="hd"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state="off"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="on"/>
  <clock offset="utc">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/var/lib/testcloud/instances/ftest/ftest-local.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x07" function="0x0"/>
    </disk>
    <disk type="file" device="disk">
      <driver name="qemu" type="raw"/>
      <source file="/var/lib/testcloud/instances/ftest/ftest-seed.img"/>
      <target dev="vdb" bus="virtio"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x08" function="0x0"/>
    </disk>
    <controller type="usb" index="0" model="piix3-uhci">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pci-root"/>
    <interface type="network">
      <mac address="52:54:00:2d:9d:79"/>
      <source network="default"/>
      <model type="virtio"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0"/>
    </interface>
    <serial type="pty">
      <target type="isa-serial" port="0">
        <model name="isa-serial"/>
      </target>
    </serial>
    <console type="pty">
      <target type="serial" port="0"/>
    </console>
    <input type="keyboard" bus="ps2"/>
    <input type="mouse" bus="ps2"/>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x09" function="0x0"/>
    </memballoon>
    <rng model="virtio">
      <backend model="random">/dev/urandom</backend>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0"/>
    </rng>
  </devices>
</domain>
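When comparing environments such as testcloud and openQA, it can help to check mechanically which firmware a libvirt guest definition requests. The sketch below assumes the usual libvirt conventions: UEFI is requested either with `firmware="efi"` on `<os>` or with an explicit `<loader>` element (on x86 typically pointing at an OVMF image); with neither, an x86 guest gets libvirt's default BIOS firmware (SeaBIOS). Treating any `<loader>` as UEFI is a simplification, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def requested_firmware(domain_xml: str) -> str:
    """Return "uefi" or "bios" for a libvirt <domain> definition.

    Simplification: any <loader> element is treated as UEFI (OVMF), which is
    its common use on x86. Without either hint, libvirt falls back to its
    default BIOS firmware (SeaBIOS on x86).
    """
    root = ET.fromstring(domain_xml)
    os_elem = root.find("os")
    if os_elem is None:
        return "bios"
    # Firmware autoselection: <os firmware="efi">.
    if os_elem.get("firmware") == "efi":
        return "uefi"
    # Explicit firmware image, e.g. <loader type="pflash">.../OVMF_CODE.fd</loader>.
    if os_elem.find("loader") is not None:
        return "uefi"
    return "bios"
```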

Fedora-Cloud-Base-34-20210620.0.x86_64.raw was created in a UEFI VM and has a GPT, an ESP, and UEFI GRUB installed on it. So it won't boot in a BIOS VM, which is what openQA tests with; hence the failure at SeaBIOS (and, I assume, on AWS).

From the log:
https://kojipkgs.fedoraproject.org//packages/Fedora-Cloud-Base/34/20210620.0/data/logs/image/oz-x86_64.log

08:19:44,710 NOTICE root:20microsoft: debug: Skipping legacy bootloaders on UEFI system

along with various other messages showing that the installer definitely thinks it is running in a UEFI VM, which it shouldn't be. So I guess pungi dictates whether the image is built in a UEFI or a BIOS VM?
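To check other build logs for the same symptom, a simple text search is enough. This sketch looks only for the one message quoted above; the other UEFI-indicating messages are not listed in this thread, so any further patterns would have to be added by hand. The helper name and the assumption that the log has been downloaded locally are illustrative.

```python
from typing import Iterable, List

# Only the message quoted in this thread; extend by hand if needed.
UEFI_MARKERS = (
    "Skipping legacy bootloaders on UEFI system",
)


def uefi_log_hits(lines: Iterable[str]) -> List[str]:
    """Return the log lines suggesting the installer believed it ran under UEFI."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in UEFI_MARKERS)
    ]
```

For a locally downloaded copy of `oz-x86_64.log`, usage would be `with open("oz-x86_64.log") as f: hits = uefi_log_hits(f)`; a non-empty result for a BIOS-targeted image is the red flag.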

The F33 and Rawhide x86_64 images are MBR/BIOS and are passing openQA tests.
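The GPT-versus-MBR difference can be checked on a raw image without booting it. A sketch, assuming a raw (not qcow2) image with 512-byte sectors: a GPT disk carries a header at LBA 1 starting with the signature `EFI PART`, behind a protective MBR whose partition type is 0xEE; a plain MBR disk has the 0x55AA boot signature but no GPT header. This only inspects the partition table. A GPT disk can still boot under BIOS if it has a BIOS boot partition and a BIOS GRUB, so "gpt" is a hint rather than proof that an image is UEFI-only. The function names are hypothetical.

```python
SECTOR = 512  # assumes 512-byte logical sectors


def partition_scheme(first_two_sectors: bytes) -> str:
    """Classify a disk's partition table as "gpt", "mbr" or "unknown"."""
    if len(first_two_sectors) < 2 * SECTOR:
        return "unknown"
    mbr = first_two_sectors[:SECTOR]
    # Bytes 510-511 of the MBR hold the boot signature 0x55 0xAA.
    if mbr[510:512] != b"\x55\xaa":
        return "unknown"
    # The GPT header lives at LBA 1 and starts with the 8-byte "EFI PART".
    if first_two_sectors[SECTOR:SECTOR + 8] == b"EFI PART":
        return "gpt"
    # MBR partition entries start at offset 446; the type byte is at offset 4
    # of each 16-byte entry. 0xEE marks a protective MBR for a GPT disk.
    if mbr[446 + 4] == 0xEE:
        return "gpt"
    return "mbr"


def image_partition_scheme(path: str) -> str:
    """Read the first two sectors of a raw image file and classify it."""
    with open(path, "rb") as f:
        return partition_scheme(f.read(2 * SECTOR))
```

Running `image_partition_scheme("Fedora-Cloud-Base-34-20210620.0.x86_64.raw")` against a downloaded image would be one quick way to tell the two layouts apart.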

Thanks for looking into it, @chrismurphy! That sure does sound like the problem, and as a bonus it explains why the images work for some people: they boot fine in an environment with UEFI firmware...
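To reproduce the BIOS/UEFI difference locally, the same raw image can be booted under qemu either with its default BIOS firmware (SeaBIOS) or with OVMF mapped as pflash. The helper below only builds the command line. The OVMF path is an assumption based on Fedora's edk2-ovmf package layout and may differ elsewhere, `-enable-kvm` assumes KVM is available, and this is a hypothetical illustration, not what openQA or testcloud actually run.

```python
from typing import List, Optional

# Assumed Fedora path from the edk2-ovmf package; adjust for your distribution.
OVMF_CODE = "/usr/share/edk2/ovmf/OVMF_CODE.fd"


def qemu_boot_cmd(image: str, uefi: bool, vars_copy: Optional[str] = None,
                  memory_mb: int = 1024) -> List[str]:
    """Build a qemu-system-x86_64 argv that boots a raw cloud image.

    With uefi=False, qemu uses its default BIOS firmware (SeaBIOS).
    With uefi=True, the OVMF code is mapped read-only as pflash unit 0 and a
    writable copy of the OVMF variable store (vars_copy) as pflash unit 1.
    """
    cmd = [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-m", str(memory_mb),
        "-drive", f"file={image},format=raw,if=virtio",
        "-nographic",
    ]
    if uefi:
        if vars_copy is None:
            raise ValueError("UEFI boot needs a writable copy of OVMF_VARS.fd")
        cmd += [
            "-drive", f"if=pflash,format=raw,unit=0,readonly=on,file={OVMF_CODE}",
            "-drive", f"if=pflash,format=raw,unit=1,file={vars_copy}",
        ]
    return cmd
```

Passing the result to `subprocess.run` with `uefi=False` and then `uefi=True` for a GPT/ESP-only image should, per the analysis above, show the BIOS variant stalling before GRUB while the UEFI variant proceeds.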

We discussed this at the community meeting today:

  * ACTION: Eighth_Doctor to submit PR to pungi-fedora to fix BIOS booting issues for f34 cloud nightlies (dustymabe, 16:39:26)

Metadata Update from @ngompa:
- Issue assigned to ngompa (was: davdunc)

2 years ago

@kevin, @mvadkert: The next Fedora 34 nightly should work properly. Can you test it and see if it works for you?

It's working in openQA now, at least. All tests have been green since Fedora-Cloud-34-20210623.0. Thanks.

Also working now for linux-system-roles CI testing.

Then let's call it gravy!

Metadata Update from @ngompa:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

2 years ago


Metadata
Related Pull Requests
  • #1035 Merged 2 years ago