#425 wrong layout for ppc64le qcow2 image
Opened 10 months ago by sharkcz. Modified 9 months ago

There is something wrong with the Cloud-Base-Generic qcow2 image on ppc64le in Rawhide (and F-40). It won't boot and fdisk reports a different partition layout compared to a F-39 qcow2 image after converting the qcow2 to a raw image. Without the PReP partition the image can't be booted.

[dan@talos tmp]$ fdisk -l cloud-f41.raw                                                                                                                                                       
GPT PMBR size mismatch (1310719 != 10485759) will be corrected by write.
Disk cloud-f41.raw: 5 GiB, 5368709120 bytes, 10485760 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xb41e4af5

Device         Boot Start      End  Sectors Size Id Type
cloud-f41.raw1          1 10485759 10485759   5G ee GPT
[dan@talos tmp]$ fdisk -l cloud-f39.raw                                                                                                                                                       
Disk cloud-f39.raw: 5 GiB, 5368709120 bytes, 10485760 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: B47DE384-32B5-4A47-A4D1-882C70A42AA5

Device           Start      End Sectors  Size Type
cloud-f39.raw1    2048    10239    8192    4M PowerPC PReP boot
cloud-f39.raw2   10240  2058239 2048000 1000M Linux filesystem
cloud-f39.raw3 2058240  2263039  204800  100M EFI System
cloud-f39.raw4 2263040  2265087    2048    1M BIOS boot
cloud-f39.raw5 2265088 10483711 8218624  3.9G Linux filesystem

I have checked that this issue also persists in Fedora 40. I tested "Fedora-Cloud-Base-Generic.ppc64le-40-1.14.qcow2".

Right, firmware="ofw" should be specified in the image description so that the PReP partition gets created.

Hmm, that's strange. The only trigger for the PReP partition is the firmware setting. kiwi has this:

if self.firmware.ofw_mode():
    log.info('--> creating PReP partition')
    partition_mbsize = self.firmware.get_prep_partition_size()
    disk.create_prep_partition(
        partition_mbsize
    )
    disksize_used_mbytes += partition_mbsize

Can I see the buildlog somewhere ?

Checked our integration test build. That one has:

 fdisk -l kiwi-test-image-disk.ppc64le-1.15.1-PhysicalBSZ_512-Build99.1.raw
Disk kiwi-test-image-disk.ppc64le-1.15.1-PhysicalBSZ_512-Build99.1.raw: 1.2 GiB, 1284505600 bytes, 2508800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: B0434928-8532-42EF-BDB6-F1B2F4457893

Device                                                             Start     End Sectors  Size Type
kiwi-test-image-disk.ppc64le-1.15.1-PhysicalBSZ_512-Build99.1.raw1  2048   18431   16384    8M PowerPC PReP boot
kiwi-test-image-disk.ppc64le-1.15.1-PhysicalBSZ_512-Build99.1.raw2 18432 2508766 2490335  1.2G unknown

Thanks, I'll take a look

The log says:

[ DEBUG ]: 08:10:28 | EXEC: [losetup --sector-size 4096 -f --show /builddir/result/image/Fedora-Cloud-Base-Generic.ppc64le-Rawhide.raw]
[ INFO ]: 08:10:30 | --> creating PReP partition
[ DEBUG ]: 08:10:30 | EXEC: [sgdisk -n 1:2048:+8M -c 1:p.prep /dev/loop0]

So the image is a 4k block size image. Is that intentional?

Thus, if you want to look up the partition table, you also need to set up the loop device with a 4k sector size:

losetup --sector-size 4096 -f --show Fedora-Cloud-Base-Generic.ppc64le-Rawhide.raw
gdisk -l /dev/loop0

which shows the PReP partition. But I doubt you can boot a 4k disk as virtual system, same issue as with the s390 DASD image.
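
A way to check which sector size a GPT was written with, without root or a loop device, is to look for the GPT header signature. The header sits at LBA 1, so its byte offset equals the logical sector size. The sketch below is only an illustration; the function name and the list of candidate sizes are assumptions made for this example.

```python
GPT_SIGNATURE = b"EFI PART"  # first 8 bytes of a GPT header


def detect_gpt_sector_size(image_path, candidates=(512, 4096)):
    """Return the sector size whose LBA 1 holds a GPT header, or None."""
    with open(image_path, "rb") as image:
        for sector_size in candidates:
            # LBA 1 starts exactly one sector into the disk
            image.seek(sector_size)
            if image.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE:
                return sector_size
    return None
```

For an image written through a 4k loop device this should report 4096, which is why a plain fdisk -l on the file only sees the protective MBR.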

yeah, 4k block size sounds wrong and it explains the GPT PMBR size mismatch (1310719 != 10485759), this should be a simple fix
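
The two numbers in the fdisk warning can be reproduced from the image size alone: dividing the 5368709120-byte image by each sector size and subtracting one gives the last sector number in each case. A quick check (the helper name is just for this example):

```python
IMAGE_BYTES = 5368709120  # size reported by fdisk for the 5 GiB image


def last_lba(image_bytes, sector_size):
    """Last addressable sector when the disk is split into sector_size blocks."""
    return image_bytes // sector_size - 1


print(last_lba(IMAGE_BYTES, 4096))  # matches the first number in the warning
print(last_lba(IMAGE_BYTES, 512))   # matches the second number in the warning
```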

https://pagure.io/fedora-kiwi-descriptions/pull-request/61 makes grub start, but then it fails to load the kernel to boot (hardcoded /var/tmp/ path?)

captured output of virt-install --name localcloud-41 --memory 4096 --nographics --noreboot --os-variant detect=on,name=fedora-unknown --cloud-init user-data="/home/dan/cloudinit-user-data.yaml" --disk=size=8,backing_store="/var/lib/libvirt/images/Fedora.ppc64le-Rawhide.qcow2"

...
  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php


Trying to load:  from: /pci@800000020000000/scsi@4 ...   Successfully loaded
Welcome to GRUB!

error: ../../grub-core/term/serial.c:217:serial port `com0' isn't found.
error: ../../grub-core/commands/terminal.c:138:terminal `serial' isn't found.
error: ../../grub-core/commands/terminal.c:138:terminal `serial' isn't found.

  Booting `Fedora Linux (6.10.0-0.rc1.20240531git4a4be1ad3a6e.21.fc41.ppc64le) 41 (Cloud Edition Prerelease)'

error: ../../grub-core/fs/fshelp.c:257:file `/var/tmp/work/build/image-root/boot/vmlinuz-6.10.0-0.rc1.20240531git4a4be1ad3a6e.21.fc41.ppc64le' not found.
error: ../../grub-core/loader/powerpc/ieee1275/linux.c:333:you need to load the kernel first.


Press any key to continue...

Seems the grub or BLS config files use the full host paths for the kernel files.

Looks like it, and we know that grub behaves really weirdly if the grub tools run on a system that is not the later target system. In kiwi there are some _fix methods that try to repair the broken files that tools like grub-mkconfig produce. There is for example:

_fix_grub_loader_entries_linux_and_initrd_paths

https://github.com/OSInside/kiwi/blob/main/kiwi/bootloader/config/grub2.py#L319

Maybe this code there doesn't do the right thing on ppc64le
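
To illustrate the kind of rewrite such a _fix method has to do, here is a minimal sketch. It is not kiwi's implementation; the function name and the boot_prefix parameter are assumptions made for this example. It replaces the host path on linux and initrd lines with the file name below a prefix as seen from the target's boot filesystem.

```python
import os
import re

# Matches "linux <path>" or "initrd <path>", keeping indentation and trailing options
_ENTRY_LINE = re.compile(r"^(\s*)(linux|initrd)(\s+)(\S+)(.*)$")


def fix_loader_paths(text, boot_prefix="/"):
    """Rewrite linux/initrd paths to <boot_prefix>/<basename> (hypothetical helper)."""
    prefix = boot_prefix.rstrip("/")
    fixed = []
    for line in text.splitlines():
        match = _ENTRY_LINE.match(line)
        if match:
            indent, key, gap, path, rest = match.groups()
            # Drop every directory component that came from the build host
            line = f"{indent}{key}{gap}{prefix}/{os.path.basename(path)}{rest}"
        fixed.append(line)
    return "\n".join(fixed)
```

With a separate boot partition the prefix would be "/", otherwise something like "/boot". All other lines pass through unchanged.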

Hmm, I see in the log that the _fix adaptations got applied:

[ DEBUG   ]: 08:14:18 | Existing loader entry: linux /root/var/lib/mock/f41-kiwi-build-51316245-6132131/root/builddir/result/image/build/image-root/boot/vmlinuz-6.10.0-0.rc1.20240531git4a4be1ad3a6e.21.fc41.ppc64le
[ DEBUG   ]: 08:14:18 | Updated loader entry: linux /vmlinuz-6.10.0-0.rc1.20240531git4a4be1ad3a6e.21.fc41.ppc64le
[ DEBUG   ]: 08:14:18 | Existing loader entry: initrd /root/var/lib/mock/f41-kiwi-build-51316245-6132131/root/builddir/result/image/build/image-root/boot/initramfs-6.10.0-0.rc1.20240531git4a4be1ad3a6e.21.fc41.ppc64le.img
[ DEBUG   ]: 08:14:18 | Updated loader entry: initrd /initramfs-6.10.0-0.rc1.20240531git4a4be1ad3a6e.21.fc41.ppc64le.img
[ DEBUG   ]: 08:14:18 | custom arguments for bootloader installation {'boot_device': '/dev/loop0p2', 'root_device': '/dev/loop0p3', 'write_device': '/dev/loop0p3', 'firmware': <kiwi.firmware.FirmWare object at 0x7fffaab49f70>, 'target_removable': None, 'install_options': [], 'shim_options': [], 'prep_device': '/dev/loop0p1', 'system_volumes': {'home': {'volume_options': 'subvol=home,compress=zstd:1', 'volume_device': '/dev/loop0p3'}, 'var': {'volume_options': 'subvol=var,compress=zstd:1', 'volume_device': '/dev/loop0p3'}}, 'system_root_volume': 'root'}
[ INFO    ]: 08:14:18 | Installing grub2 on disk /dev/loop0
[ DEBUG   ]: 08:14:18 | EXEC: [mountpoint -q /var/tmp/kiwi_mount_manager.lelv27wb]

But I also see that the mountpoint was not unmounted prior to grub2-install, which means the actual change could not have been written at the time grub2-install was called.

I need to check if the theory is correct

I think the "root" cause is in the BLS snippets, if those are wrong, then the generated grub2.cfg will be wrong too. I have also noticed the BLS snippet contains kernel parameters from the host, not the ones from kiwi.

yes and all that "crap" we correct with the _fix methods. I'm currently testing a fix in kiwi ...

I really think we are hitting a bug in kiwi and proposed the following patch:

I don't have access to ppc64le systems. Maybe you can locally patch your build system and check if that fixes it?

Thanks

no change with PR 2561 applied :-(

Hrm, thanks much for testing. I'm running out of ideas for this one and have no ppc64le system for debugging. Do you think you could arrange some sort of ssh access for me on your system ?

Thanks

I will figure out something tomorrow about a remote access to some of our systems.

Hey, I have found a working cloud image, which is Fedora 41 Rawhide only: "Fedora-Server-KVM-Rawhide-20240517.n.0.ppc64le.qcow2". Can we just compare and see what is different in the error-prone images? I'm new to this, but I can check.

I got:

Disk Fedora-Rawhide-KVM.raw: 7 GiB, 7516192768 bytes, 14680064 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: D1F17B85-7DD2-429A-AB2F-2138E4518A58

Device                    Start      End  Sectors Size Type
Fedora-Rawhide-KVM.raw1    2048    10239     8192   4M PowerPC PReP boot
Fedora-Rawhide-KVM.raw2   10240  2107391  2097152   1G Linux extended boot
Fedora-Rawhide-KVM.raw3 2107392 14678015 12570624   6G Linux LVM

Hi, thanks much for helping. I think we know what's wrong; the problem is that I could not find the reason in the kiwi code why we are hitting this. The background is this: the image is configured to use grub, grub on Fedora is BLS grub, and therefore grub-mkconfig produces data in boot/loader/entries/xxx.conf. The information in these files is unfortunately wrong (wrong paths, wrong cmdline options, ...) because grub always thinks it is running on the target and makes assumptions that are simply wrong when you build an image on a host which is not the target. We have code in kiwi, in the so-called _fix methods, which tries to correct the broken information such that the resulting image can boot.

Here is the problem with this particular one. It seems that the _fix method did not do the job even though in the log we can see that it fixes the information. The PR I did on kiwi with one issue I found did not work according to @sharkcz and so I'm a bit clueless without further debugging.

Debugging would work best on a Fedora ppc64le host, but I don't have one. We could checkout the git there, run poetry to setup the dev env and rebuild from the git and I'm sure we will find what's going on that way.

Thanks @osinside for the insights, can I see boot/loader/entries/xxx.conf ? I want this data because may be some alignment issue is occurring.

You can, just fetch the image and mount it similar to

kpartx -a image.raw
mount /dev/mapper/loop0p2 /mnt
cat /mnt/boot/loader/entries/*.conf
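
Once the partition is mounted, the entries can also be checked with a short script instead of by eye. The sketch below assumes the simple "key value" layout of the BLS snippets shown in this ticket. The helper names, the /mnt default and the list of allowed directories are assumptions for illustration, and the path check is only a heuristic.

```python
import glob
import os


def parse_bls_entry(text):
    """Split a BLS snippet into a dict of key -> value (first word is the key)."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        entry[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return entry


def suspicious_paths(entry, allowed_dirs=("/", "/boot")):
    """Return linux/initrd paths whose directory is not one of allowed_dirs."""
    return [
        entry[key]
        for key in ("linux", "initrd")
        if key in entry and os.path.dirname(entry[key]) not in allowed_dirs
    ]


def check_entries(mount_point="/mnt"):
    """Print a verdict for every loader entry found below mount_point."""
    pattern = os.path.join(mount_point, "**", "loader", "entries", "*.conf")
    for conf in glob.glob(pattern, recursive=True):
        with open(conf) as handle:
            bad = suspicious_paths(parse_bls_entry(handle.read()))
        print(conf, "suspicious:" if bad else "ok", *bad)
```

Calling check_entries() after the mount would flag any entry whose kernel or initrd still points into a build directory such as /var/tmp/work/build/image-root/boot.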

hi @osinside , I have sent you an email with some options, please check your spam folder as google/gmail doesn't like me sometimes :-)

But @osinside for corrupted raw disk img, kpartx is not creating any partition map.

But @osinside for corrupted iso, kpartx is not creating any partition map.

iso ? I thought we are talking about a raw disk image ?

@sharkcz thanks for providing me access to the Fedora test infrastructure. I set up a build environment on the ppc64le machine and rebuilt the Fedora ppc64le cloud image. The thing is, I can't reproduce the error reported here.

Can you double check on the test server:

  • ppc64le-test.fedorainfracloud.org
(unit_py3_11) [osinside@ppc64le-test kiwi][PROD]$ ll ~
total 8
drwxr-xr-x.  9 osinside osinside 4096 Jun  5 16:52 fedora-kiwi-descriptions
drwxr-xr-x. 13 osinside osinside 4096 Jun  5 16:06 kiwi

You find the build results in /tmp/mytest/

[root@ppc64le-test mytest][PROD]# fdisk -l /tmp/mytest/Fedora.ppc64le-Rawhide.raw 
Disk /tmp/mytest/Fedora.ppc64le-Rawhide.raw: 5 GiB, 5368709120 bytes, 10485760 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 4E1C342C-1A5C-4A71-B1AA-CAC7D982477B

Device                                    Start      End Sectors  Size Type
/tmp/mytest/Fedora.ppc64le-Rawhide.raw1    2048    18431   16384    8M PowerPC PReP boot
/tmp/mytest/Fedora.ppc64le-Rawhide.raw2   18432  2066431 2048000 1000M Linux extended boot
/tmp/mytest/Fedora.ppc64le-Rawhide.raw3 2066432 10485726 8419295    4G Linux root (PPC64LE)

I mounted the image and looked at the entries file which says:

[root@ppc64le-test mytest][PROD]# cat /mnt/loader/entries/40e9937cdd1f4dae8a824d7307728866-6.10.0-0.rc2.24.fc41.ppc64le.conf 
title Fedora Linux (6.10.0-0.rc2.24.fc41.ppc64le) 41 (Cloud Edition Prerelease)
version 6.10.0-0.rc2.24.fc41.ppc64le
linux /vmlinuz-6.10.0-0.rc2.24.fc41.ppc64le
initrd /initramfs-6.10.0-0.rc2.24.fc41.ppc64le.img
options no_timer_check console=tty1 console=ttyS0,115200n8 systemd.firstboot=off root=UUID=906dbbdf-d292-4630-a3c5-b0f12f165b22 rootflags=subvol=root
grub_users $grub_users
grub_arg --unrestricted
grub_class fedora

which looks correct to me, given the image has an extra boot partition

@osinside I have noticed that the raw disk images that are prone to the error are compatible with qemu version 1.1, while all the working ones are compatible with version 0.10.
Hey @osinside, the image you tested is the one that shows the error, right?

So the paths are still wrong for me when running kiwi 10.0.21 (downloaded from koji) on an F-40 system (with SELinux manually set to permissive mode) :-( I used sudo ./kiwi-build --output-dir=/var/tmp/work --image-type=oem --image-profile=Cloud-Base-Generic in the descriptions directory.

I'm not sure I can follow you. What I did was this:

  • @sharkcz provided me access to one of the Fedora ppc64 infrastructure machines
  • I logged in there via ssh, set up a kiwi dev env and rebuilt the Fedora ppc64 cloud image
  • I inspected the result file offline (kpartx, mount ...) and it looked good to me
  • Next I booted it as follows
[root@ppc64le-test mytest][PROD]# qemu-system-ppc64 -hda Fedora.ppc64le-Rawhide.qcow2
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-cfpc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-sbbc=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ibs=workaround
qemu-system-ppc64: warning: TCG doesn't support requested feature, cap-ccf-assist=on
gtk initialization failed
  Booting `Fedora Linux (6.10.0-0.rc2.24.fc41.ppc64le) 41 (Cloud Edition
Prerelease)'

OF stdout device is: /vdevice/vty@71000000
Preparing to boot Linux version 6.10.0-0.rc2.24.fc41.ppc64le (mockbuild@dccfaea498bd46159692b12a4b579a84) (gcc (GCC) 14.1.1 20240522 (Red Hat 14.1.1-4), GNU ld version 2.42.50.20240513) #1 SMP Mon Jun  3 14:01:47 UTC 2024
Detected machine type: 0000000000000101

... and so on until login

So for me it all worked. Thus I don't know what's wrong

I reproduced the kiwi-build step described by @sharkcz. I fetched the kiwi 10.0.21 build from koji and installed that package. Next I copied your command and built the image.

btw my test system is fc39 therefore I needed the following patch to make it succeed

[osinside@ppc64le-test fedora-kiwi-descriptions][PROD]$ git diff
diff --git a/repositories/core-rawhide.xml b/repositories/core-rawhide.xml
index c2ae124..b6977b3 100644
--- a/repositories/core-rawhide.xml
+++ b/repositories/core-rawhide.xml
@@ -1,7 +1,6 @@
 <image>
        <repository type="rpm-md" alias="rawhide" sourcetype="metalink">
                <source path="https://mirrors.fedoraproject.org/metalink?repo=rawhide&amp;arch=$basearch">
-                       <signing key="file:///usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-rawhide-primary"/>
                </source>
        </repository>
 </image>

I needed this because the rawhide gpg key is not present on the fc39 host. But this just disables the package gpg checking and should not harm the test

At the end I got these result files:

[osinside@ppc64le-test fedora-kiwi-descriptions][PROD]$ cd /var/tmp/work/
[osinside@ppc64le-test work][PROD]$ ls -l
total 1586480
drwxr-xr-x. 3 root root 46 Jun 7 09:50 build
-rw-r--r--. 1 root root 2406399 Jun 7 09:57 Fedora.ppc64le-Rawhide.changes
-rw-r--r--. 1 root root 42953 Jun 7 09:57 Fedora.ppc64le-Rawhide.packages
-rw-r--r--. 1 root root 373096448 Jun 7 10:00 Fedora.ppc64le-Rawhide.qcow2
-rw-r--r--. 1 root root 5368709120 Jun 7 09:57 Fedora.ppc64le-Rawhide.raw
-rw-r--r--. 1 root root 149 Jun 7 09:58 Fedora.ppc64le-Rawhide.verified
-rw-r--r--. 1 root root 49502 Jun 7 10:00 kiwi.result
-rw-r--r--. 1 root root 904 Jun 7 10:00 kiwi.result.json

This is also what you got right ?

Then

[osinside@ppc64le-test work][PROD]$ sudo kpartx -a Fedora.ppc64le-Rawhide.raw
[osinside@ppc64le-test work][PROD]$ sudo mount /dev/mapper/loop0p2 /mnt

[osinside@ppc64le-test work][PROD]$ sudo -i
cd /mnt/boot/loader/entries
cat 89beb977a02740b286d53f8d88072c75-6.10.0-0.rc2.20240605git32f88d65f01b.26.fc41.ppc64le.conf

And I can not see any wrong path in there and also this image just boots

Sorry guys I'm running out of ideas :)

I believe I have a plausible idea :-) Things work fine for a pseries class machine (like KVM or LPAR in PowerVM), but fail for powernv (aka bare metal). I was able to create a working image on a F-39 VM. But my main system is bare metal and it doesn't work well with BLS (too old petitboot bootloader in the firmware) and while the BLS snippet is correct, the grub.cfg in the image is not. It contains full paths to the workdir for the linux and initrd options (added by grub's 10_linux script) and also contains boot entries from the host itself (added by 30_os-prober).
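
To see quickly which /etc/grub.d script contributed which boot entries to a generated grub.cfg, the "### BEGIN ... ###" markers can be used to group the menuentry titles. This is a rough sketch; the function name is made up here and the regular expressions only cover the single-quoted menuentry form.

```python
import re

_BEGIN = re.compile(r"^### BEGIN (\S+) ###")
_END = re.compile(r"^### END (\S+) ###")
_MENUENTRY = re.compile(r"^\s*menuentry\s+'([^']*)'")


def menuentries_by_script(grub_cfg_text):
    """Map each /etc/grub.d script to the menuentry titles it generated."""
    result = {}
    current = None
    for line in grub_cfg_text.splitlines():
        begin = _BEGIN.match(line)
        if begin:
            current = begin.group(1)
            result.setdefault(current, [])
            continue
        if _END.match(line):
            current = None
            continue
        entry = _MENUENTRY.match(line)
        if entry and current is not None:
            result[current].append(entry.group(1))
    return result
```

Any titles listed under /etc/grub.d/30_os-prober were found by probing the build host and do not belong to the image.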

@osinside, I can see now images are able to mount, did you fix something?

Hi, yes, it makes sense, but our kiwi _fix methods also cover the main grub.cfg file, not only the BLS snippets. So when I mount the created Fedora.ppc64le-Rawhide.raw image on the infrastructure machine I can see the following:

[root@ppc64le-test work][PROD]# pwd
/var/tmp/work

kpartx -a Fedora.ppc64le-Rawhide.raw
mount /dev/mapper/loop0p2 /mnt

[root@ppc64le-test work][PROD]# find /mnt/ | grep grub.cfg
/mnt/grub2/grub.cfg

vi /mnt/grub2/grub.cfg

### BEGIN /etc/grub.d/30_os-prober ###
        menuentry 'Fedora Linux 39 (Server Edition) (on /dev/mapper/fedora_rh--power--vm14-root)' --class gnu-linux --class gnu --class os $menuentry_id_option 'osprober-gnulinux-/vmlinuz-6.8.10-200.fc39.ppc64le--4bdb04fc-5246-4711-9fa4-2cf87b0c1d60' {
                insmod part_msdos
                insmod xfs
                search --no-floppy --fs-uuid --set=root 0a7d9720-8834-4d2d-860d-920ea3bdf145
                linux /vmlinuz-6.8.10-200.fc39.ppc64le root=/dev/dm-0
                initrd /initramfs-6.8.10-200.fc39.ppc64le.img
        }
... and so on

So also this file looks correct to me.

@sharkcz is this the same file you looked at from your image build ?
