#8370 Fedora CoreOS + multi-arch
Opened a year ago by jlebon. Modified 3 months ago

Hi,

For Fedora CoreOS, we're currently shipping x86_64 artifacts only, but our goal is to eventually expand to aarch64, ppc64le, and s390x. A big roadblock to that though is having access to actual multi-arch hardware to build on. The Fedora CoreOS pipeline today runs in the OpenShift instance in CentOS CI, which AFAIK only has x86_64 nodes.

If we were to move the pipeline to the Fedora Infra OpenShift instance, would it be possible to make use of the same multi-arch hardware available to Koji? Two ideas that came up: having some Koji workers play double-duty as Jenkins worker nodes (which would be connected to the FCOS Jenkins instance), or teaching Koji a task similar to runroot, but instead running from a container (a poor man's OpenShift node if you will). We could restrict this to only running containers from the Fedora registry (in fact, we already have a review request for coreos-assembler: https://bugzilla.redhat.com/show_bug.cgi?id=1733344).
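To make the second idea concrete, here is a hypothetical sketch of the wrapper such a "container runroot" task might run on a builder, restricting images to the Fedora registry. The function name, option set, and registry-check policy are invented for illustration; nothing like this exists in Koji today.

```shell
#!/bin/sh
# Only images published in the Fedora registry would be allowed.
ALLOWED_REGISTRY="registry.fedoraproject.org"

container_runroot_cmd() {
    image="$1"; shift
    # Refuse anything not registry-qualified with the Fedora registry.
    case "$image" in
        "$ALLOWED_REGISTRY"/*) ;;
        *) echo "refused: $image is not from $ALLOWED_REGISTRY" >&2; return 1 ;;
    esac
    # --privileged because coreos-assembler needs /dev/kvm and loop devices.
    # This sketch only prints the command it would run.
    echo podman run --rm --privileged "$image" "$@"
}

container_runroot_cmd "$ALLOWED_REGISTRY/coreos-assembler:latest" cosa build
```

The point of the registry check is that the build input is then limited to content Fedora already gates, which is what makes the idea plausible from a trust standpoint.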

How feasible would this be? Any other options? A technically easier (but bureaucratically harder) option is to budget for new multi-arch hardware that is solely dedicated to building FCOS.

Upstream discussions in https://github.com/coreos/fedora-coreos-tracker/issues/262.


So, AFAIK, you cannot have a 'mixed' OpenShift cluster; it's one arch per cluster.

If whatever you are doing could be done in a koji task, then we could probably do that. I'd really prefer that any koji changes land upstream before we start using them in prod.

I'd really prefer that over making some of our nodes jenkins workers. I am pretty against any 'dual use', as the two workloads could interfere with each other.

If you can get the koji option working, you wouldn't even need to move the main pipeline; you could just call out to koji?

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

a year ago

The Fedora CoreOS devs got together a few weeks back and discussed this problem. One of the easiest paths forward may be getting some multi-arch hardware to execute these builds on.

Some requirements:

  • bare metal OR a VM with nested virtualization support
  • I'm thinking 8G RAM / 2 CPUs as a minimum; 16G / 4 CPUs preferred
  • We could manage the VMs or have them be managed by Fedora Infra

CentOS CI might have some multi-arch hardware that can be used for this, but I'm not sure.

All of that being said, s390x might be the one that would be the hardest to get a VM for or to manage ourselves.

One problem will be nested virtualization support, which has been problematic; we have normally just had to dedicate a complete system to it. Most of our new aarch64 hardware is currently spoken for by builds or QA work. Our older aarch64 hardware will be turned off in June 2020 as it is no longer supported by its manufacturers. So I think this needs to be made into a project for:

a. Getting new hardware for this. [This will need your group's help with purchasing hardware.]
b. Can IBM Cloud Power PC systems be used? Who can get a cloud account there? [That is way over my head.]
c. s390x, I think, is going to have to be handled similarly somehow... our current setup is fragile and barely suited for the composes we do with it.

If we are getting new hardware, it will need to land before March 2020 or after July 2020. During that window we will not be adding hardware in PHX2, as they want to lock down what is there for the move to the new datacenter.

CentOS CI is similarly restricted in what they can offer... but it may be better suited for hosting the hardware if deals are made for equipment dedicated to this.

Some food for thought: we are facing the same problem for OSBS in order to build containers on all the supported architectures. I think we might be able to consolidate the two efforts here so that we need less hardware.

For OSBS to build containers on different architectures, it requires an OpenShift cluster on each supported platform. The OpenShift cluster is just a normal cluster, and while it is currently dedicated to OSBS, I think we could share it with CoreOS; you should be able to run your current pipeline on this cluster without much trouble.

We are just about to start work to bring up an aarch64 cluster for Fedora IoT, so we could give this solution a try. For the remaining architectures the future is not really clear, and I think it would be wiser to wait for the datacenter move to happen before considering adding other architectures.

@cverna thanks for commenting. Our ideal goal is definitely to run everything in OpenShift so maybe we could join efforts there.

The real fix, of course, is to make Koji generate Kubernetes jobs rather than being its own container system, move away from mock towards using podman, etc. - xref https://github.com/projectatomic/rpmdistro-gitoverlay/blob/master/doc/reworking-fedora-releng.md#blend-upstream-testing-and-downstream-testing
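As a concrete illustration of that direction, a build could be expressed as a plain Kubernetes Job scheduled onto a node of the target architecture. This is only a hypothetical sketch: the Job name, node selector value, image tag, and resource numbers are illustrative, not an existing Fedora configuration.

```yaml
# Hypothetical sketch: an FCOS build as a native Kubernetes Job.
apiVersion: batch/v1
kind: Job
metadata:
  name: fcos-build-aarch64
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64   # schedule onto an aarch64 node
      containers:
      - name: coreos-assembler
        image: quay.io/coreos-assembler/coreos-assembler:latest
        command: ["cosa", "build"]
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
        securityContext:
          privileged: true          # needed for /dev/kvm and loop devices
      restartPolicy: Never
```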

We talked about this again recently.

We (infrastructure) are going to try and scrounge up a machine or two for this work, but... I want to wait until after the mass rebuild is over and the datacenter move/staging rebuild is done.
In the meantime, dusty is going to test some things out with an Amazon aarch64 bare-metal box.

One thing we didn't discuss much was access needs. I assume any instance you want will need to download things, build/whatever on them, and push things outside somewhere? Do we know what those would be? HTTPS connections? Also, are all the inputs 'trusted'? I also further assume some coreos folks would want/need ssh access to manage things. Would you want the machine configured by infra ansible, or totally self-service?

> We talked about this again recently.
> We (infrastructure) are going to try and scrounge up a machine or two for this work, but... I want to wait until after the mass rebuild is over and the datacenter move/staging rebuild is done.
> In the meantime, dusty is going to test some things out with an Amazon aarch64 bare-metal box.

Correct. After the mass rebuild should be fine. For the datacenter finalization, do you know the timeframe on that?

> One thing we didn't discuss much was access needs. I assume any instance you want will need to download things, build/whatever on them, and push things outside somewhere? Do we know what those would be? HTTPS connections? Also, are all the inputs 'trusted'? I also further assume some coreos folks would want/need ssh access to manage things. Would you want the machine configured by infra ansible, or totally self-service?

It would be similar to what we have today in CentOS CI. We do have access to the broader internet. Briefly, we pull

  • the coreos-assembler container from quay.io to do our builds
  • rpms from fedora
  • some git clones of the repositories we use for tooling
  • listen to fedmsg

and we typically push

  • to AWS s3 (all artifacts)
  • to GCP object storage
  • requests to AWS to create images
  • requests to GCP to create images
  • fedmsgs

I don't know if we'll need all of that for just this machine, as I assume some of it will be done in the pipeline (CentOS CI) and some of it on the machine, but that is what we typically do.
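Distilled from the lists above, the machine's egress needs might be summarized in an allowlist along these lines. The hostnames here are guesses for illustration and would need checking against the actual service endpoints before being used in any firewall policy.

```yaml
# Hypothetical egress summary for the build machine (hostnames illustrative).
egress:
  pull:
    - quay.io                     # coreos-assembler container
    - mirrors.fedoraproject.org   # rpms from Fedora
    - github.com                  # git clones of tooling repositories
    - hub.fedoraproject.org       # fedmsg bus
  push:
    - s3.amazonaws.com            # all artifacts to AWS S3
    - storage.googleapis.com      # GCP object storage
    - ec2.amazonaws.com           # AWS image-creation requests
    - compute.googleapis.com      # GCP image-creation requests
```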

I imagine initially we'd probably be fine with managing the machine ourselves. If we could PXE boot and re-install it periodically, that would be even better.
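A periodic PXE reinstall like that is usually driven by a small kickstart served alongside the boot images. This is only a hypothetical sketch under that assumption; the partitioning, password handling, and package set are illustrative, not an agreed configuration.

```
# Hypothetical kickstart for unattended periodic reinstalls of the builder.
text
reboot
zerombr
clearpart --all --initlabel
autopart
rootpw --lock
network --bootproto=dhcp
%packages
podman
git
%end
```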
