#8370 Fedora CoreOS + multi-arch
Closed: Fixed 3 months ago by zlopez. Opened 2 years ago by jlebon.

Hi,

For Fedora CoreOS, we're currently shipping x86_64 artifacts only, but our goal is to eventually expand to aarch64, ppc64le, and s390x. A big roadblock to that though is having access to actual multi-arch hardware to build on. The Fedora CoreOS pipeline today runs in the OpenShift instance in CentOS CI, which AFAIK only has x86_64 nodes.

If we were to move the pipeline to the Fedora Infra OpenShift instance, would it be possible to make use of the same multi-arch hardware available to Koji? Two ideas came up: having some Koji workers play double duty as Jenkins worker nodes (connected to the FCOS Jenkins instance), or teaching Koji a task similar to runroot that runs from a container instead (a poor man's OpenShift node, if you will). We could restrict this to running only containers from the Fedora registry (in fact, we already have a review request for coreos-assembler: https://bugzilla.redhat.com/show_bug.cgi?id=1733344).

How feasible would this be? Are there other options? An easier (but bureaucratically harder) option is to budget for new multi-arch hardware dedicated solely to building FCOS.

Upstream discussions in https://github.com/coreos/fedora-coreos-tracker/issues/262.


So, AFAIK, you cannot have a 'mixed' OpenShift cluster; it's only one arch per cluster.

If whatever you are doing could be done in a Koji task, then we could probably do that. I'd really prefer that any Koji changes be upstream before we start using them in prod.

I'd really prefer that over making some of our nodes Jenkins workers. I am pretty against any 'dual use', as the two workloads could interfere with each other.

If you can get the Koji option working, you wouldn't even need to move the main pipeline, right? You could just call out to Koji.

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

2 years ago

The Fedora CoreOS devs got together a few weeks back and discussed this problem. One of the easiest paths forward may be if we can get some multi-arch hardware to execute these builds on.

Some requirements:

  • bare metal OR a VM with nested virtualization support
  • I'm thinking 8 GB RAM / 2 CPUs is a minimum, 16 GB / 4 CPUs preferred
  • We could manage the VMs or have them be managed by Fedora Infra
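As a rough illustration of the sizing requirements above, the following Python sketch classifies a candidate builder as meeting the minimum or the preferred spec. It is a sketch under stated assumptions: "8G" is read as 8 GiB of RAM, and the presence of `/dev/kvm` inside the builder stands in for "bare metal or nested virtualization". That is only a proxy, since KVM can be present and still fail to start a guest.

```python
import os

# Thresholds from the requirements list above; "8G" is read as 8 GiB of RAM.
MIN_MEM_GIB, MIN_CPUS = 8, 2
PREF_MEM_GIB, PREF_CPUS = 16, 4


def classify(mem_gib: float, cpus: int, has_kvm: bool) -> str:
    """Return 'preferred', 'minimum', or 'insufficient' for a candidate builder."""
    # Bare metal and a VM with nested virtualization both show up as /dev/kvm
    # inside the builder, so KVM access is treated as a hard requirement.
    if not has_kvm:
        return "insufficient"
    if mem_gib >= PREF_MEM_GIB and cpus >= PREF_CPUS:
        return "preferred"
    if mem_gib >= MIN_MEM_GIB and cpus >= MIN_CPUS:
        return "minimum"
    return "insufficient"


def probe_local_host() -> tuple[float, int, bool]:
    """Read RAM (GiB), CPU count, and /dev/kvm presence on the current Linux host."""
    mem_kib = 0
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                mem_kib = int(line.split()[1])  # MemTotal is reported in kB
                break
    return mem_kib / (1024 * 1024), os.cpu_count() or 0, os.path.exists("/dev/kvm")
```

On a Linux builder, `classify(*probe_local_host())` gives the verdict. Because `MemTotal` reports slightly less than the nominal RAM size, a 16 GB VM may land in the "minimum" bucket; a real check would probably want some slack.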

CentOS CI might have some multi-arch hardware that can be used for this, but I'm not sure.

All of that being said, s390x might be the one that would be the hardest to get a VM for or to manage ourselves.

One problem will be nested virtualization support, which has been problematic; we have normally just had to dedicate a complete system to it. Most of our new aarch64 hardware is currently spoken for by builds or QA work. Our older aarch64 hardware will be turned off in June 2020, as it is no longer supported by its manufacturers. So I think this needs to be made into a project for:

a. Getting new hardware for this. [This will need your group's help with purchasing hardware.]
b. Can IBM Cloud PowerPC systems be used? Who can get a cloud account there? [That is way over my head.]
c. s390x, I think, is going to have to be handled similarly somehow; our current setup is fragile and barely suited for the composes we do with it.

If we are getting new hardware, it will need to land before March 2020 or after July 2020. During that period we will not be adding hardware in PHX2, as they want to lock down what they have there for the move to the new datacenter.

CentOS CI is similarly restricted in what it can offer, but it may be better suited to getting the hardware located if deals are made for dedicated equipment.

Some food for thought: we are facing the same problem for OSBS in order to build containers on all the supported architectures. I think we might be able to consolidate the two efforts here so that we need less hardware.

For OSBS to build containers on different architectures, it requires an OpenShift cluster running on each supported platform. The OpenShift cluster is just a normal cluster, and while it is currently dedicated to OSBS, I think we could share it with CoreOS; you should be able to run your current pipeline on this cluster without much trouble.

We are just about to start work to bring up an aarch64 cluster for Fedora IoT, so we could give this solution a try. For the remaining architectures the future is not really clear, and I think it would be wiser to wait for the data centre move to happen before considering adding other architectures.
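Because each architecture needs its own single-arch cluster, a shared pipeline would have to route every build to the right cluster and translate Fedora architecture names into the ones Kubernetes uses for the `kubernetes.io/arch` node label. A minimal Python sketch, where the cluster endpoints are hypothetical placeholders:

```python
# Fedora/RPM architecture names mapped to the names Kubernetes uses in the
# well-known node label "kubernetes.io/arch".
RPM_TO_K8S_ARCH = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "ppc64le": "ppc64le",
    "s390x": "s390x",
}

# Hypothetical API endpoints, one single-arch cluster per architecture.
CLUSTERS = {
    "x86_64": "https://api.x86-64-cluster.example:6443",
    "aarch64": "https://api.aarch64-cluster.example:6443",
}


def placement(rpm_arch: str) -> tuple[str, dict[str, str]]:
    """Pick the cluster for an architecture plus a matching nodeSelector."""
    k8s_arch = RPM_TO_K8S_ARCH[rpm_arch]
    cluster = CLUSTERS.get(rpm_arch)
    if cluster is None:
        raise LookupError(f"no cluster available yet for {rpm_arch}")
    # The nodeSelector is redundant on a single-arch cluster, but it keeps
    # pods off the wrong nodes if clusters are ever merged.
    return cluster, {"kubernetes.io/arch": k8s_arch}
```

The explicit `LookupError` makes an architecture without a cluster (here ppc64le and s390x) fail loudly instead of silently building on x86_64.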

@cverna thanks for commenting. Our ideal goal is definitely to run everything in OpenShift so maybe we could join efforts there.

The real fix, of course, is to make Koji generate Kubernetes jobs rather than be its own container system, and to move away from mock toward podman etc.; xref https://github.com/projectatomic/rpmdistro-gitoverlay/blob/master/doc/reworking-fedora-releng.md#blend-upstream-testing-and-downstream-testing
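To make the "Koji generates Kubernetes jobs" idea more concrete, here is a Python sketch that turns a single build task into a `batch/v1` Job manifest. The task fields, the `build-task-<id>` naming scheme, and the `["build"]` container arguments are hypothetical; a real coreos-assembler build would also need access to `/dev/kvm` and working storage, which this omits.

```python
import json


def build_job(task_id: int, k8s_arch: str, image: str, args: list[str]) -> dict:
    """Translate one (hypothetical) Koji-style build task into a batch/v1 Job."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"build-task-{task_id}",
            "labels": {"task-id": str(task_id)},  # label values must be strings
        },
        "spec": {
            "backoffLimit": 0,  # report a failed build instead of retrying it
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "nodeSelector": {"kubernetes.io/arch": k8s_arch},
                    "containers": [{"name": "build", "image": image, "args": args}],
                }
            },
        },
    }


if __name__ == "__main__":
    job = build_job(1234, "arm64", "quay.io/coreos-assembler/coreos-assembler:latest", ["build"])
    # The JSON can be piped to `oc apply -f -` or `kubectl apply -f -`.
    print(json.dumps(job, indent=2))
```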

We talked about this again recently.

We (infrastructure) are going to try and scrounge up a machine or two for this work, but... I want to wait until after the mass rebuild is over and the datacenter move/staging rebuild is done.
In the meantime, dusty is going to test some things out with an Amazon aarch64 bare metal box.

One thing we didn't discuss much was access needs. I assume any instance you want will need to download things, build/whatever on them, and push the results outside somewhere? Do we know what those destinations would be? HTTPS connections? Also, are all the inputs 'trusted'? I further assume some CoreOS folks would want/need SSH access to manage things. Would you want the machine configured by infra Ansible, or totally self-service?

> We talked about this again recently.
> We (infrastructure) are going to try and scrounge up a machine or two for this work, but... I want to wait until after the mass rebuild is over and the datacenter move/staging rebuild is done.
> In the meantime, dusty is going to test some things out with an Amazon aarch64 bare metal box.

Correct. After the mass rebuild should be fine. For the datacenter finalization, do you know the timeframe?

> One thing we didn't discuss much was access needs. I assume any instance you want will need to download things, build/whatever on them, and push the results outside somewhere? Do we know what those destinations would be? HTTPS connections? Also, are all the inputs 'trusted'? I further assume some CoreOS folks would want/need SSH access to manage things. Would you want the machine configured by infra Ansible, or totally self-service?

It would be similar to what we have today in CentOS CI, where we do have access to the broader internet. Briefly, we pull:

  • the coreos-assembler container from quay.io to do our builds
  • rpms from fedora
  • some git clones of the repositories we use for tooling
  • listen to fedmsg

and we typically push

  • to AWS s3 (all artifacts)
  • to GCP object storage
  • requests to AWS to create images
  • requests to GCP to create images
  • fedmsgs

I don't know if we'll need all of that for just this machine, as I assume some of it will be done in the pipeline (CentOS CI) and some on the machine, but that is what we typically do.

I imagine initially we'd probably be fine with managing the machine ourselves. If we could PXE and re-install it periodically that would be even better.
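The pulls and pushes listed above translate fairly directly into an egress allowlist. The Python sketch below derives (host, port) pairs from a list of endpoint URLs; the URLs themselves are illustrative assumptions, not the pipeline's actual configuration.

```python
from urllib.parse import urlsplit

# Illustrative endpoints for the pulls and pushes listed above. These are
# assumptions, not the pipeline's real configuration. fedmsg is left out
# because it uses ZeroMQ sockets rather than HTTPS.
ENDPOINTS = {
    "pull": [
        "https://quay.io/coreos-assembler/coreos-assembler",
        "https://dl.fedoraproject.org/pub/fedora/linux/",
        "https://github.com/coreos/fedora-coreos-config.git",
    ],
    "push": [
        "https://s3.amazonaws.com/",
        "https://storage.googleapis.com/",
        "https://ec2.us-east-1.amazonaws.com/",
        "https://compute.googleapis.com/",
    ],
}

DEFAULT_PORTS = {"https": 443, "http": 80}


def egress_allowlist(endpoints: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Unique (host, port) pairs a firewall or proxy would need to permit."""
    pairs = set()
    for urls in endpoints.values():
        for url in urls:
            parts = urlsplit(url)
            pairs.add((parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme]))
    return sorted(pairs)


if __name__ == "__main__":
    for host, port in egress_allowlist(ENDPOINTS):
        print(f"{host}:{port}")
```

Container registries commonly redirect layer downloads to separate CDN hosts, so an allowlist built only from top-level URLs is likely to be incomplete; observing the traffic of a real test build would be a more reliable source.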

We now have an aarch64 OSBS cluster, so can we revisit this?

Metadata Update from @smooge:
- Issue tagged with: OSBS, medium-gain, medium-trouble, ops

2 years ago

https://github.com/coreos/fedora-coreos-tracker/issues/828 would make this conceptually much more streamlined - in theory we could "rehydrate" the content to S3 afterwards and also do uploads to AWS/GCP/etc. as a separate phase.

The source-management hurdle seems easy to clear by just mirroring https://github.com/coreos/fedora-coreos-config/

Replicating and debugging the environment, though, is likely to be really painful.

It's also really likely that we continue to need a non-OSBS system for our CI for all of the same reasons every upstream uses something that isn't Koji for CI.

And related to that, to me an extremely important part of the way CoreOS works is that we roll together our build and CI in a single container and workflow. If we move production builds into OSBS we'd need to have a whole conversation about how to preserve that aspect.

In OpenShift upstream development we use a Prow instance, which does CI builds outside OSBS (just on x86_64); for prod builds, the results are fed back into the same tests that run in CI, all oriented around a GitOps-like flow.

So the end architecture would likely be similar to that, where we just call out to OSBS with inputs (RPMs/configs) that have already been tested in CI and the result is also fed back into CI - hence forming a CD pipeline.

We had a discussion with release engineering a few months back. We decided for now:

- Short term, the CoreOS team can use ARM AWS instances to produce
development ARM FCOS builds.
- Slightly less short term, Kevin is going to check the inventory in
RDU to see if there is ARM hardware available. Based on the outcome of
that, we may discuss whether we're comfortable building production
artifacts in AWS.
- Long-term, we'll move FCOS to the new Fedora OCP4 bare metal cluster
and work on some compat/integration layer between cosa and Koji.

Update:

  • We have implemented the multi-arch POC in Fedora CoreOS for aarch64 and it is working well.
    • We have tried a few times to get the aarch64 hardware in the RDU DC to behave; still no luck.
    • We are currently using an AWS a1.metal machine.
  • Adding ppc64le
    • We have an IBM Cloud ppc64le machine we can use for testing
    • Around November 2021 we'll reach out to Fedora Releng to get a beefy VM for our pipeline
      • This corresponds with moving our FCOS pipeline to the Fedora Infra (out of CentOS CI)
        • Allows the pipeline and the ppc64le machine to be co-located.
  • Adding s390x
    • public LPARs are hard
    • We'll try to re-use Fedora's existing s390x setup once we move the FCOS pipeline to the Fedora Infra. Circle back around the beginning of the year.
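One way the per-arch builders in this plan could be driven from a central pipeline is to run the build container on each remote machine over SSH using podman's remote API. This is a hypothetical Python sketch, not the actual FCOS pipeline implementation: the `builder` user and the aarch64 host name are placeholders, the ppc64le host name is the VM set up later in this ticket, and it assumes the remote user has enabled the rootless podman socket (for example with `systemctl --user enable --now podman.socket`).

```python
import shlex

IMAGE = "quay.io/coreos-assembler/coreos-assembler:latest"

# SSH targets per architecture. The "builder" user and the aarch64 host
# name are hypothetical placeholders.
BUILDERS = {
    "aarch64": "builder@aarch64-builder.example.com",
    "ppc64le": "builder@buildvm-ppc64le-fcos01.iad2.fedoraproject.org",
}


def remote_podman_cmd(arch: str, cosa_args: list[str], uid: int = 1000) -> list[str]:
    """Build a podman command that runs the build container on a remote host."""
    # Rootless podman's API socket lives under /run/user/<uid>/ by default.
    url = f"ssh://{BUILDERS[arch]}/run/user/{uid}/podman/podman.sock"
    return [
        "podman", "--remote", "--url", url,
        "run", "--rm",
        "--device", "/dev/kvm",  # the device is resolved on the remote host
        IMAGE, *cosa_args,
    ]


if __name__ == "__main__":
    print(shlex.join(remote_podman_cmd("ppc64le", ["build"])))
```

A real coreos-assembler invocation needs additional `podman run` options (a working-directory volume, relaxed security settings, and so on) that are left out here.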

cc @ravanelli @jlebon

The aarch64 setup in RDU needs new hardware spec'd and bought for it. The hardware there consists of unwarrantied systems more than 4 years old, with no support. I have tried to get various hardware working and found it is 'dead, Jim'. [I am taking the boxes I tried to resurrect back to the colo for the scrap pile this week.]

I think the aarch64 in Amazon is probably your best bet.

I do not know what capacity Fedora has on PPC or s390x, so I cannot answer those parts.

> • Adding s390x
>     • public LPARs are hard
>     • We'll try to re-use Fedora's existing s390x setup once we move the FCOS pipeline to the Fedora Infra. Circle back around the beginning of the year.

To confirm, when we said "re-use Fedora's existing s390x setup" here, were we saying we'd have SSH access to an OS running in an LPAR, or just a VM guest?

SSH access (from inside iad2 only) to an s390x KVM VM (which I guess could be running CoreOS if we can virt-install that).

Note that all Fedora images etc. are made on these via nested virt, so I think it's pretty solid virtualization-wise.

So, what's the status here? We couldn't get any hardware working in RDU-CC, but FCOS has moved (or is moving) its build pipeline to iad2 and our OCP4 cluster.

I guess we need that to finish, and then we can look at setting up $otherarch instances?

We just finished the pipeline migration. In fact, just did our first prod releases from there. :)
So I think we should be unblocked on this now.

Yep. If we could get SSH access to a ppc64le or s390x instance, that would be nice. The aarch64 instance in AWS we've been using has 16 vCPUs and 32 GiB RAM. I don't know if we'll be able to get that much in terms of resources, but we can start with whatever you give us and adjust based on experience.

Note here are some images that can be used to bootstrap (since we don't officially make FCOS disk images for these architectures yet): https://fedorapeople.org/groups/fcos-images/builds/latest/

You should be able to do a normal virt-install, but you'll want an Ignition config (similar to cloud-init) and to pass it to virt-install like so:

--disk path=$PWD/config.ign,format=raw,readonly=on,serial=ignition

I can help you with the Ignition config. Maybe we can schedule some time this week and work on this together.
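Putting those pieces together, the following Python sketch builds a minimal Ignition config that authorizes an SSH key for the `core` user and assembles a virt-install command that attaches it with the `serial=ignition` disk option quoted above. The Ignition spec version, the `fedora-coreos-stable` OS variant (which needs a libosinfo database that knows it), the image paths, the placeholder key, and the 16 vCPU / 32 GiB defaults (borrowed from the AWS aarch64 builder mentioned earlier) are all assumptions.

```python
import json
import shlex


def ignition_config(ssh_pubkey: str) -> dict:
    """Minimal Ignition config that authorizes one SSH key for the 'core' user."""
    # Spec version 3.2.0 is an assumption; use one the image's Ignition supports.
    return {
        "ignition": {"version": "3.2.0"},
        "passwd": {"users": [{"name": "core", "sshAuthorizedKeys": [ssh_pubkey]}]},
    }


def virt_install_args(name: str, image: str, ign_path: str,
                      vcpus: int = 16, memory_mib: int = 32768) -> list[str]:
    """virt-install arguments that attach the Ignition file as a serial-tagged disk."""
    return [
        "virt-install", "--name", name, "--import",
        "--vcpus", str(vcpus), "--memory", str(memory_mib),  # memory is in MiB
        "--os-variant", "fedora-coreos-stable",
        "--graphics", "none",
        "--disk", f"path={image},format=qcow2",
        # The disk option quoted above for passing the Ignition config
        "--disk", f"path={ign_path},format=raw,readonly=on,serial=ignition",
    ]


if __name__ == "__main__":
    # Placeholder key; save this JSON as config.ign before running virt-install.
    print(json.dumps(ignition_config("ssh-ed25519 AAAA... user@example"), indent=2))
    print(shlex.join(virt_install_args("fcos-ppc64le",
                                       "/var/lib/libvirt/images/fcos.qcow2",
                                       "/var/lib/libvirt/images/config.ign")))
```

In practice the Ignition file is usually generated from a Butane config, and on SELinux hosts it may need a label that lets qemu read it.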

We worked on this today for a few hours and made some progress!

  • We have a VM, buildvm-ppc64le-fcos01.iad2.fedoraproject.org, installed and running on a POWER9 host. There are some issues with running container builds on it, but the FCOS folks are going to work through those.

  • We talked about s390x. It's going to be somewhat of a challenge because the s390x network cannot reach the internet. It can reach fedoraproject.org services in iad2 and that's it. Perhaps we can work out a caching solution that will work.

  • We didn't really work on aarch64 much, as FCOS has an AWS builder that's been working well, so it's not high priority.
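For the s390x network, one possible caching approach for container images is a registry mirror that the builder can reach inside iad2. The Python sketch below renders a `containers-registries.conf` (v2 format) entry that makes podman try a mirror before the upstream registry; the mirror host name is hypothetical, and RPMs and git repositories would still need their own internal sources.

```python
def registries_conf_mirror(prefix: str, mirror: str) -> str:
    """Render a containers-registries.conf (v2) entry that tries `mirror`
    first for images under `prefix`, then the upstream location."""
    return (
        "[[registry]]\n"
        f'prefix = "{prefix}"\n'
        f'location = "{prefix}"\n'
        "\n"
        "[[registry.mirror]]\n"
        f'location = "{mirror}"\n'
    )


if __name__ == "__main__":
    # Hypothetical cache host inside iad2; the output would go into
    # /etc/containers/registries.conf or a drop-in under registries.conf.d/.
    print(registries_conf_mirror("quay.io", "registry-cache.iad2.example/quay.io"))
```

With no internet access the upstream fallback simply fails, so the mirror would have to be kept populated with every image the pipeline pulls.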

> We worked on this today for a few hours and made some progress!

Thank you @kevin as always for helping us out.

> • We have a VM, buildvm-ppc64le-fcos01.iad2.fedoraproject.org, installed and running on a POWER9 host. There are some issues with running container builds on it, but the FCOS folks are going to work through those.

Some more details/new investigation on this front. We were unable to get nested virtualization to work even after configuring our guest XML appropriately (see this upstream issue).

I investigated the existing Koji builders on this host that run image builds and found (with relatively high confidence) that they are not using KVM for their builds, but TCG emulation.

Here is a snippet of the libguestfs-test-tool output on the machine:

```
[fedora@buildvm-ppc64le-10 ~]$ export LIBGUESTFS_BACKEND=direct
[fedora@buildvm-ppc64le-10 ~]$ libguestfs-test-tool
...
/usr/bin/qemu-system-ppc64 \
    -global virtio-blk-pci.scsi=off \
    -no-user-config \
    -nodefaults \
    -display none \
    -machine pseries,accel=kvm:tcg \
    -m 1280 \
    -no-reboot \
    -rtc driftfix=slew \
    -kernel /var/tmp/.guestfs-1000/appliance.d/kernel \
    -initrd /var/tmp/.guestfs-1000/appliance.d/initrd \
    -object rng-random,filename=/dev/urandom,id=rng0 \
    -device virtio-rng-pci,rng=rng0 \
    -device virtio-scsi-pci,id=scsi \
    -drive file=/tmp/libguestfsb9ph23/scratch1.img,cache=unsafe,format=raw,id=hd0,if=none \
    -device scsi-hd,drive=hd0 \
    -drive file=/var/tmp/.guestfs-1000/appliance.d/root,snapshot=on,id=appliance,cache=unsafe,if=none \
    -device scsi-hd,drive=appliance \
    -device virtio-serial-pci \
    -serial stdio \
    -chardev socket,path=/run/user/1000/libguestfsy54445/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -append "panic=1 console=hvc0 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=UUID=dbccf6e9-b1ae-4de8-ba4b-00ab14ef8957 selinux=0 guestfs_verbose=1 TERM=screen"
ioctl(KVM_CREATE_VM) failed: 22 Invalid argument
PPC KVM module is not loaded. Try modprobe kvm_hv.
qemu-system-ppc64: failed to initialize kvm: Invalid argument
qemu-system-ppc64: falling back to tcg
...
...
===== TEST FINISHED OK =====
```

At this point, if we want KVM, I think we need to move off RHEL 8 to get a newer kernel that supports v2 of the API. See this issue comment.
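The TCG fallback in the log above can be detected mechanically. This Python sketch runs `libguestfs-test-tool` with the direct backend and classifies its output; it is a heuristic based only on the qemu messages shown above (`accel=kvm:tcg` and `falling back to tcg`), and the wording may differ between qemu versions.

```python
import os
import re
import subprocess


def accel_used(log: str) -> str:
    """Classify libguestfs-test-tool output as 'kvm', 'tcg', 'failed', or 'unknown'."""
    if "TEST FINISHED OK" not in log:
        return "failed"
    match = re.search(r"accel=([\w:]+)", log)
    requested = match.group(1).split(":") if match else []
    # qemu tries accelerators in order; the fallback message means KVM failed.
    if "falling back to tcg" in log or requested[:1] == ["tcg"]:
        return "tcg"
    if requested[:1] == ["kvm"]:
        return "kvm"
    return "unknown"


def run_test_tool() -> str:
    """Run libguestfs-test-tool with the direct backend and return all output."""
    env = dict(os.environ, LIBGUESTFS_BACKEND="direct")
    result = subprocess.run(["libguestfs-test-tool"], env=env,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    return result.stdout
```

On a builder, `print(accel_used(run_test_tool()))` would print `tcg` for output like the snippet above.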

Kevin is going to try to move one of the ppc64le hosts over to Fedora soonish so we can verify things work there. It will also give us hardware acceleration (KVM) for nested virt in the Koji builder VMs.

OK. I have moved bvmhost-p09-04 over to Fedora 35 and re-installed the buildvm-ppc64le-fcos01 VM on it with nested virt enabled.
Builders on that host can now complete a LIBGUESTFS_BACKEND=direct libguestfs-test-tool run fine.

Please retry on that machine and see if it meets your needs.
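The ticket does not spell out how nested virt was enabled for the guest. On a libvirt-managed pSeries guest, one plausible way is the `nested-hv` domain feature, which also requires the host's `kvm_hv` module to have nested support turned on. The Python sketch below checks that module parameter and adds the feature to a domain XML; the feature element should be verified against the libvirt version in use, and, per this thread, a RHEL 8 host kernel may still not support it.

```python
import xml.etree.ElementTree as ET


def enable_nested_hv(domain_xml: str) -> str:
    """Return domain XML with <features><nested-hv state='on'/></features> set."""
    root = ET.fromstring(domain_xml)
    features = root.find("features")
    if features is None:
        features = ET.SubElement(root, "features")
    nested = features.find("nested-hv")
    if nested is None:
        nested = ET.SubElement(features, "nested-hv")
    nested.set("state", "on")  # idempotent: an existing element is reused
    return ET.tostring(root, encoding="unicode")


def host_nested_hv_enabled(path: str = "/sys/module/kvm_hv/parameters/nested") -> bool:
    """True if the host's kvm_hv module is loaded with nested support on."""
    try:
        with open(path) as f:
            return f.read().strip() in ("Y", "1")
    except FileNotFoundError:
        return False  # kvm_hv not loaded (cf. "Try modprobe kvm_hv." above)
```

The XML could come from `virsh dumpxml <domain>` and be loaded back with `virsh define`.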

ok. I moved buildvm-ppc64le-fcos01 to the new bvmhost-p09-05. Please retest it.

So, after this only s390x is left? I'm really not sure how we are going to make that work. ;(

Do you have (or can you get) a list of external resources the machine needs to reach? Perhaps we could get something set up that routes all of those out through our iad2 datacenter...

@jlebon @dustymabe We did what we could on our side. If there is anything missing feel free to reopen this ticket.

Metadata Update from @zlopez:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 months ago

I think we are good for now. Thank you!

