#299 Concerns related to enabling Atomic Host CloudImages compose in Rawhide for multi-arches
Closed: Fixed 2 years ago Opened 3 years ago by sinnykumari.

I am working on enabling Atomic Host CloudImages in Fedora 27 onward for multiple arches, including aarch64 and ppc64le. The required changes have been made in the Fedora config to enable qcow2 and raw images [Link - https://pagure.io/pungi-fedora/blob/master/f/fedora.conf#_361] .

It seems the above changes are not sufficient, because all related builds are failing for both arches. One of the failed koji builds: https://koji.fedoraproject.org/koji/taskinfo?taskID=20304853

After debugging the cause of the failure (starting with ppc64le), I see the following two issues:
1. The kickstart file used for building the Atomic CloudImage has arch-specific content at line https://pagure.io/fedora-kickstarts/blob/master/f/fedora-atomic.ks#_35 .

Possible solution:
We can replace the x86_64 mention with something generic like:
basearch=$(uname -i)
Using $basearch in place of x86_64 should fix this.

2. In the kickstart fedora-atomic.ks [ https://pagure.io/fedora-kickstarts/blob/master/f/fedora-atomic.ks#_27 ], we don't use autopart (used by all other variants' kickstarts) to create partitions. Instead, we create partitions manually using part. On ppc64le this makes the anaconda text-based installation fail with the error message "storage configuration failed: failed to find a suitable stage1 device." [ https://kojipkgs.fedoraproject.org//work/tasks/4883/20304883/screenshot.ppm ]
It seems the required PReP partition is not getting created for the ppc64le arch.
If I add "part prepboot --fstype "PPC PReP Boot" --size=10" to the kickstart file to create the PReP partition on ppc64le, I get a working AH CloudImage on ppc64le (tried locally). But this would again add arch-specific changes to the kickstart file.

I would like to know your thoughts on fixing these in a proper way.

Note: Debugging on aarch64 is still pending and might bring up some additional concerns.


  1. i don't think we'll be able to use uname -i because that part of the kickstart is actually not a script :( . This might benefit from some of the work colin has been talking about doing inside of anaconda to provide a default ostreesetup line for a kickstart. The code that did this could possibly also make arch dependent decisions as well. See https://pagure.io/atomic-wg/issue/226

  2. we need to experiment with reqpart and see if it does the right thing for atomic - are you able to run imagefactory locally and experiment?

thanks for bringing this up

Metadata Update from @dustymabe:
- Issue tagged with: host, multi-arch, rawhide

3 years ago

The code that did this could possibly also make arch dependent decisions as well. See https://pagure.io/atomic-wg/issue/226

thinking about this a bit more... I think maybe this won't work because for the cloud images we aren't installing from the atomic ISO that we create. We're actually just installing from a generic anaconda environment and we have to tell it where the ostree remote/ref is. We'll have to brainstorm on this

we need to experiment with reqpart and see if it does the right thing for atomic - are you able to run imagefactory locally and experiment?

Haven't checked how reqpart works, will look into it.
I didn't run Imagefactory locally; instead I used the virt-install command. Since the problem was during the anaconda install, it was sufficient to reproduce it using virt-install, something like:

$ sudo virt-install --name guest-atomic --ram=2048 --cpu host --vcpus=2 --os-type=linux --os-variant=fedora22 --initrd-inject /home/skumari/fedora-atomic.ks --extra-args="ks=file:/fedora-atomic.ks text console=tty0 utf8 console=ttyS0,115200" --disk path=/var/lib/libvirt/images/guest-atomic.qcow2,size=10,bus=virtio,format=qcow2 --force --noreboot --location=http://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20170703.n.0/compose/Cloud/ppc64le/os/

I think shortest path is to change pungi to know how to replace the ostreesetup kickstart variable.

We could also change anaconda to do substitution on the URL (or potentially ostree itself).

I think shortest path is to change pungi to know how to replace the ostreesetup kickstart variable.

this is probably a feature we should have - should we open a feature request for it?

We could also change anaconda to do substitution on the URL (or potentially ostree itself).

yeah would it be possible to support something like $basearch in the url OR the ref inside of the ostree remote definition?

There are some tricky details here...is $(uname -i) actually equivalent to the architecture name we use elsewhere? I think it is for ppc64 and aarch64, but offhand I think one corner case is i386 (RPM) vs kernel (i686).

I think we need to use the RPM architecture?
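To make the kernel-vs-RPM naming concern concrete, here is a minimal sketch of mapping a kernel machine name (what uname reports) to an RPM-style basearch. The table is a small illustrative subset written for this thread, and `basearch_for` is a hypothetical helper; the authoritative mapping lives in dnf/libdnf and is not reproduced here.

```python
# Illustrative subset only (an assumption for this sketch), not the
# dnf/libdnf table. It shows the i686 (kernel) vs i386 (RPM) corner case.
MACHINE_TO_BASEARCH = {
    "i386": "i386",
    "i486": "i386",
    "i586": "i386",
    "i686": "i386",
    "x86_64": "x86_64",
    "aarch64": "aarch64",
    "ppc64le": "ppc64le",
}


def basearch_for(machine):
    """Return the RPM-style basearch for a kernel machine name.

    Hypothetical helper: raises ValueError for names outside the subset
    instead of guessing.
    """
    try:
        return MACHINE_TO_BASEARCH[machine]
    except KeyError:
        raise ValueError("no basearch known for machine %r" % machine)


for name in ("i686", "aarch64", "ppc64le"):
    print(name, "->", basearch_for(name))
```

The point is only that the two names can differ on some arches, so whatever does the substitution should start from the RPM architecture rather than from raw uname output.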

Hm, Pungi has this:

TREE_ARCH_YUM_ARCH_MAP = {
    "i386": "athlon",
    "ppc64": "ppc64p7",
    "sparc": "sparc64v",
    "arm": "armv7l",
    "armhfp": "armv7hnl",
}

which um...I dunno what's going on there.

Anyways, I think we've already decided that the ostree refs use ${basearch} (which is ultimately from libdnf, which should match dnf/yum), since rpm-ostree learned to substitute that.

I think it's worth pointing a pungi developer at this thread and see what they think. But...it shouldn't be too hard to do this substitution in Anaconda either.

Anyways, I think we've already decided that the ostree refs use ${basearch} (which is ultimately from libdnf, which should match dnf/yum), since rpm-ostree learned to substitute that.

so I would be able to do something like: ostree remote add fedora http://dl.fp.o/atomic/${releasever}/${basearch} ? This is not something we can do today, right?

For ostreesetup we would need something like: ostreesetup --nogpg --osname=fedora-atomic --remote=fedora-atomic --url=https://kojipkgs.fedoraproject.org/compose/atomic/$releasever/ --ref=fedora/$releasever/$basearch/atomic-host - which would be a change we would need to make to anaconda I believe

I think it's worth pointing a pungi developer at this thread and see what they think. But...it shouldn't be too hard to do this substitution in Anaconda either.

yeah I think the pungi RFE is strictly about being able to inject an ostreesetup line in the kickstart files. I opened an RFE here: https://pagure.io/pungi/issue/673

I wrote https://github.com/projectatomic/rpm-ostree/pull/877 for this.

Note I don't think we can support substituting ${releasever} since that has deep problems around where we find the value.

I wrote https://github.com/projectatomic/rpm-ostree/pull/877 for this.

This means we wouldn't need to add support to anaconda? ${basearch} gets passed all the way down to rpm-ostree and it just works?

Note I don't think we can support substituting ${releasever} since that has deep problems around where we find the value.

ok - i don't think that is necessary for this issue.

Not quite - my plan is that anaconda would substitute ${basearch} for values passed to ostreesetup.ref, i.e. we'd use:

ostreesetup --nogpg --osname=fedora-atomic --remote=fedora-atomic --url=https://kojipkgs.fedoraproject.org/compose/atomic/$releasever/ --ref=fedora/26/$basearch/atomic-host

Having rpm-ostree itself do the substitution would be...well, it hurts my head to think about. A huge number of things would need to change. Basically the whole codebase assumes it's not variable based on the booted architecture. I don't know offhand what even the use of a dynamic substitution there would be.
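A rough sketch of the installer-side substitution described above, assuming nothing about how anaconda actually implements it. `expand_ref` is a hypothetical name; it uses Python's standard `string.Template`, which accepts both `$basearch` and `${basearch}`, and `safe_substitute` leaves any other placeholder (such as `$releasever`) untouched.

```python
from string import Template


def expand_ref(ref, basearch):
    """Expand only the basearch placeholder in an ostreesetup --ref value.

    Hypothetical helper for illustration. Unknown placeholders are left
    as-is, which matches the idea of not substituting releasever.
    """
    return Template(ref).safe_substitute(basearch=basearch)


print(expand_ref("fedora/26/$basearch/atomic-host", "ppc64le"))
print(expand_ref("fedora/26/${basearch}/atomic-host", "aarch64"))
```

With this shape the kickstart stays arch-neutral and the value is resolved once at install time, so nothing downstream has to treat the ref as variable.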

Not quite - my plan is that anaconda would substitute ${basearch} for values passed to ostreesetup.ref,

sounds good. We are on the same page now

we need to experiment with reqpart and see if it does the right thing for atomic - are you able to run imagefactory locally and experiment?

Haven't checked how reqpart works, will look into it.

Yes, we can use reqpart in the fedora-atomic.ks to auto create architecture specific partitions. Image created and ran successfully with this change on ppc64le and x86_64 locally.
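For comparison, a small sketch of the two approaches discussed in this ticket: hand-written arch-specific boot partition lines versus a single reqpart line. `boot_partition_lines` is a hypothetical helper, and the only kickstart lines it emits are the ones quoted in this thread; the rest of the partitioning in fedora-atomic.ks is not modeled.

```python
def boot_partition_lines(arch, use_reqpart=True):
    """Return the kickstart lines that cover platform boot partitions.

    Hypothetical helper for illustration only.
    """
    if use_reqpart:
        # One arch-neutral line; the installer decides what the platform needs.
        return ["reqpart"]
    if arch == "ppc64le":
        # The manual workaround from the original report.
        return ['part prepboot --fstype "PPC PReP Boot" --size=10']
    # Other arches had no extra manual line mentioned in this thread.
    return []


for arch in ("x86_64", "ppc64le"):
    print(arch, boot_partition_lines(arch), boot_partition_lines(arch, use_reqpart=False))
```

The reqpart variant produces the same output for every arch, which is what keeps the kickstart free of per-arch branches.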

I wonder what the reason is for not using autopart in FAH CloudImage creation?

we need to experiment with reqpart and see if it does the right thing for atomic - are you able to run imagefactory locally and experiment?
Haven't checked how reqpart works, will look into it.

Yes, we can use reqpart in the fedora-atomic.ks to auto create architecture specific partitions. Image created and ran successfully with this change on ppc64le and x86_64 locally.

Does adding reqpart in fedora-atomic kickstart seem legit? If yes, I'd like to create a pull request for the same.

Does adding reqpart in fedora-atomic kickstart seem legit? If yes, I'd like to create a pull request for the same.

Seems legit, although we may want to wait for the anaconda support for $basearch in ostreesetup line before we do this. @walters is there an open issue for that against anaconda?

Seems legit, although we may want to wait for the anaconda support for $basearch in ostreesetup line before we do this.

Sure.

Update:
I have succeeded in creating FAH CloudImage on aarch64 using imagefactory on local machine. Image gets generated successfully when we use reqpart in fedora-atomic kickstart file.

There are two issues while booting the generated CloudImage on aarch64:
1. "can't find command linuxefi, can't find command initrdefi". Changing the kernel boot commands from linuxefi to linux and from initrdefi to initrd fixes this problem. This has been fixed in https://github.com/ostreedev/ostree/pull/1021 .
2. After the changes described in the 1st issue, booting gets stuck with the following messages:
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...

It seems to be a problem with the console option passed in the fedora-atomic kickstart file at https://pagure.io/fedora-kickstarts/blob/master/f/fedora-atomic.ks#_20 . It seems ARM devices use console=ttyAMA0. Either adding console=ttyAMA0 to line #20 or removing console=tty1 console=ttyS0,115200n8 fixes the problem.

With the above two problems fixed, the aarch64 image boots successfully.
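One way to picture a per-arch console setting is the sketch below. `console_args` and both tables are hypothetical; the console values are the ones mentioned in this ticket, and appending ttyAMA0 after the existing consoles is an assumption, since the thread does not say where in the list it was added.

```python
# Console arguments taken from the kickstart line discussed above.
BASE_CONSOLES = ["console=tty1", "console=ttyS0,115200n8"]

# Extra consoles per arch. Only aarch64 is covered because it is the only
# arch for which this ticket reports a console problem.
EXTRA_CONSOLES = {
    "aarch64": ["console=ttyAMA0"],
}


def console_args(arch):
    """Return the console kernel arguments for an arch (hypothetical helper).

    Assumption: extra consoles are appended after the base ones.
    """
    return " ".join(BASE_CONSOLES + EXTRA_CONSOLES.get(arch, []))


for arch in ("x86_64", "aarch64"):
    print(arch, "->", console_args(arch))
```

Something along these lines would have to run wherever the kickstart is rendered per arch, since the bootloader line in the kickstart itself is static.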

Update:
I have succeeded in creating FAH CloudImage on aarch64 using imagefactory on local machine. Image gets generated successfully when we use reqpart in fedora-atomic kickstart file.
There are two issues while booting generated cloudImage on aarch64:
1. "can't find command linuxefi, can't find command initrdefi" . Modifying kernel boot parameter from linuxefi to linux and initrdefi to initrd fixes this problem. This problem has been fixed here https://github.com/ostreedev/ostree/pull/1021 .

sweet - are we testing this with rawhide or f26?

  2. After changes made as stated in 1st issue, booting gets stuck with following messages:
    EFI stub: Booting Linux Kernel...
    EFI stub: Using DTB from configuration table
    EFI stub: Exiting boot services and installing virtual address map...
    It seems to be a problem with console option passed in fedora-atomic kickstart file at https://pagure.io/fedora-kickstarts/blob/master/f/fedora-atomic.ks#_20 . It seems, arm devices uses console=ttyAMA0. Either adding console=ttyAMA0 in line#20 or removing console=tty1 console=ttyS0,115200n8 fixes the problem.

I wish i would have known you were debugging this :) - I hit this same problem a few months ago: https://twitter.com/dustymabe/status/867225301985755137. I guess we need to find a way to specify that to just the aarch64 images.

With above two problem fixed, aarch64 image boots successfully.

:tada: have we tried with any other architectures?

sweet - are we testing this with rawhide or f26?

In rawhide

I wish i would have known you were debugging this :) - I hit this same problem a few months ago: https://twitter.com/dustymabe/status/867225301985755137. I guess we need to find a way to specify that to just the aarch64 images.

Yeah! I started debugging around 2 weeks back on aarch64, just after getting ppc64le issues sorted out (mentioned in the main ticket)

🎉 have we tried with any other architectures?

In what context? If you mean building and booting the FAH CloudImage (qcow2) in rawhide, then x86_64 already exists in the nightly compose. Other than that, I have tried ppc64le locally, and it should work once the initial issues mentioned in this ticket are resolved.

I just mean, what all architectures work? You have overcome issues with aarch64 so we know that x86_64 and aarch64 work (with fixes)? What else works/doesn't work?

x86_64, aarch64 and ppc64le work with the issues from this ticket fixed. Haven't tried any other arches.

Atomic CloudImage built successfully in Fedora Rawhide nightly compose (Fedora-Rawhide-20170922.n.0) for x86_64, ppc64le and aarch64 arches. Link - https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20170922.n.0/compose/CloudImages/

this work is mostly done now. we have aarch64 and ppc64le images building in rawhide and F27.

There are few outstanding issues. I'm going to close this bug in favor of the two issues I just opened. We'll track those issues in their own ticket:

Metadata Update from @dustymabe:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago
