#1346 Need Hyperscale Koji image build hosts for x86_64 and aarch64
Closed: Fixed 10 months ago by ngompa. Opened 11 months ago by ngompa.

Today, I tried to build the CentOS Hyperscale OpenStack image in CBS, and it failed on AArch64 because the builder runs an older c8s kernel with 64k pages that doesn't support subpage mode (4k sector size on a 64k page size).

Could we get dedicated x86_64 and aarch64 koji image builder nodes for CentOS Hyperscale that run c9s with the current hyperscale kernel?


FYI: current infra tags don't cover el9 (cbs infra currently runs on rhel8).
Another thing: since this worked in the past for hyperscale images, did something change inside the hyperscale images that now requires a completely different kernel and userspace env?

Metadata Update from @arrfab:
- Issue tagged with: blocked, cbs, feature-request, investigation

11 months ago

Yes. We rebased our kernel and btrfs-progs to v6.7, which includes a change to default to 4k sector sizes for new filesystems regardless of page size, to ensure cross-arch compatibility. The older kernel doesn't support this mode, so it chokes.
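
One way to check a builder ahead of time is to compare the 4k sector size against the sector sizes the running kernel reports for btrfs. This is a minimal sketch, not something from the ticket: it assumes the kernel exposes `/sys/fs/btrfs/features/supported_sectorsizes`, and it assumes that when the file is missing only the page size is usable. The helper name `btrfs_supports_sectorsize` is made up for illustration.

```shell
# Hypothetical helper: succeed if sector size $1 appears in the
# whitespace-separated list $2 of sector sizes the kernel supports.
btrfs_supports_sectorsize() {
    want=$1
    for size in $2; do
        [ "$size" = "$want" ] && return 0
    done
    return 1
}

# On a real builder the list comes from sysfs when the file exists.
# Assumption: without that file, only sectorsize == page size works.
supported_file=/sys/fs/btrfs/features/supported_sectorsizes
if [ -r "$supported_file" ]; then
    supported=$(cat "$supported_file")
else
    supported=$(getconf PAGESIZE)
fi

if btrfs_supports_sectorsize 4096 "$supported"; then
    echo "4k sector size: supported"
else
    echo "4k sector size: NOT supported on this kernel"
fi
```

On a 64k-page host without subpage support, the list would not contain 4096, which matches the failure described here.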

So does that mean such an aarch64 builder, if running that kernel, would impact all other aarch64 rpm builds for all other SIGs?

It should not have meaningful impact on rpm builds themselves, since Mock isolates things away in that regard. Even for doing kmod builds, the host environment kernel does not matter.

Metadata Update from @arrfab:
- Issue marked as depending on: #1350

10 months ago

It should not have meaningful impact on rpm builds themselves, since Mock isolates things away in that regard. Even for doing kmod builds, the host environment kernel does not matter.

Just adding a note here that it makes a difference in rare cases, e.g., when building a kernel, sha512hmac is usually called during the build process. For rhel8 targets, this fails depending on which kernel the build host is running.

kernel 4.18.0-425.19.2.el8_7:

+ sha512hmac vmlinuz-6.6.16-1.el8.aarch64
libkcapi - Error: Netlink error: sendmsg failed
libkcapi - Error: Netlink error: sendmsg failed
libkcapi - Error: NETLINK_CRYPTO: cannot obtain cipher information for hmac(sha512) (is required crypto_user.c patch missing? see documentation)
Allocation of hmac(sha512) cipher failed (ret=-111)

kernel 5.14.0-76.hs2.hsx.el8:
No error.

The hosts seem to be picked randomly?

Note: This does not depend on the architecture.

This seems to be related to this kernel commit 1.

Note: For rhel9 targets the used kernel version makes no difference, hence I assume some userspace component, probably libkcapi, has some influence as well.
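
To see which kernel a given build landed on, a small probe could be run at the start of the build. This is only a sketch: `probe_sha512hmac` is a hypothetical helper, and it does not explain the failure, it just records whether `sha512hmac` works on the running kernel.

```shell
# Hypothetical helper: run sha512hmac on a throwaway file and report the
# result together with the kernel release of the build host.
probe_sha512hmac() {
    if ! command -v sha512hmac >/dev/null 2>&1; then
        echo "skipped: sha512hmac not installed"
        return 0
    fi
    tmp=$(mktemp) || return 1
    echo probe > "$tmp"
    if sha512hmac "$tmp" >/dev/null 2>&1; then
        echo "ok on kernel $(uname -r)"
    else
        echo "FAILED on kernel $(uname -r)"
    fi
    rm -f "$tmp"
}

probe_sha512hmac
```

Collecting this line from several build logs would show whether failures line up with one host kernel.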

The hosts seem to be picked randomly?

The koji builders are assigned to channels, and if nothing is specified, rpm builds go to the default one: https://cbs.centos.org/koji/channelinfo?channelID=1
But some of these builders are also in other channels, like the image one (https://cbs.centos.org/koji/channelinfo?channelID=7), which is where the kiwi builds are processed. So if an rpm build also lands on one of these builders, it would run on the underlying kernel that was updated to understand btrfs.
Dedicating specific builders just to building images wouldn't be the best use of a build farm, so ideally these builders would still accept rpm builds and have zero impact on the build process.
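
The effect of channel membership can be sketched as follows. This is only an illustration: the host names and channel assignments are hypothetical (not the real CBS layout), and `hosts_in_channel` is a made-up helper, not a koji command.

```shell
# Hypothetical inventory: one "host channel channel..." entry per line.
# These names are illustrative only, not the real CBS builders.
inventory='builder-a default
builder-b default image
builder-c image'

# Print every host subscribed to channel $1.
hosts_in_channel() {
    printf '%s\n' "$inventory" | awk -v ch="$1" '{
        for (i = 2; i <= NF; i++) if ($i == ch) { print $1; break }
    }'
}

echo "rpm builds (default channel) may land on:"
hosts_in_channel default
echo "image builds (image channel) may land on:"
hosts_in_channel image
```

Here builder-b would take both kinds of task, which is the situation described above: its kernel matters for image builds, but it still receives rpm builds.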

@ngompa: I'm revisiting tickets and wondering about this one: maybe we can deploy new kojid builders in the image channel, but how would that impact other image builds (kiwi or otherwise), for example the AltImages ones?

It should be fine for everyone. Strictly speaking, the Hyperscale kernel is just a newer kernel with a superset of flags from the RHEL kernel configuration.

ack ... I'll slowly work (in parallel with other tasks) on ansible-role-kojid support for el9 (as we need some packages in infra tags, etc.), then try it somewhere and deploy in parallel.
I'll update the ticket (but it has lower priority for the moment)

Other solution: see if the recent kernel 6.6 from kmods SIG ( /cc @pjgeorg ) would support 4k pages for aarch64 ? See https://cbs.centos.org/koji/buildinfo?buildID=52647

That would work too. I believe the configs include Btrfs and it's new enough.

great, let's wait for @pjgeorg to confirm this and then I can quickly update kernel on existing builders, while also working on updating stack to el9 and be ready

Metadata Update from @arrfab:
- Issue assigned to arrfab

10 months ago

The configurations used for the kernels provided by the Kmods SIG are based on the Fedora configs, hence some configurations differ from the ones used by the Hyperscale SIG / RHEL.

I do not know which features are required in particular, but at least btrfs is enabled and for aarch64 page size is set to 4K. You can find the full configuration files here.

Please note that the kernels are currently untested (the only currently tested one is 6.6.18-2.el9.x86_64 running on my laptop right now), waiting for #1369 to be able to set up some basic testing.

From a package build and image build perspective, that should be fine.

FWIW, I deployed a new aarch64 VM dedicated to koji build tasks (not yet added to cbs.centos.org) and am testing the aarch64 kernel, but it doesn't boot, so we'll have to investigate that as a first step

@ngompa: can you verify if that works for you?
Thanks to @pjgeorg, we now have the same kernel running on the following dedicated hosts for image tasks (not building any rpms):

cbs list-hosts --channel image
Hostname                   Enb Rdy Load/Cap  Arches           Last Update                         
--------------------------------------------------------------------------------------------------
aarch64-04.rdu2.centos.org Y   Y    0.0/10.0 aarch64          Wed, 28 Feb 2024 14:47:06 UTC      
x86-5.cbs.centos.org       Y   Y    7.5/30.0 i386,x86_64      Wed, 28 Feb 2024 14:47:10 UTC    

Both are running the same kernel and btrfs-progs:

uname -a ; rpm -q btrfs-progs
Linux aarch64-04.rdu2.centos.org 6.6.18-3.el8.aarch64 #1 SMP PREEMPT_DYNAMIC Wed Feb 28 12:13:00 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
btrfs-progs-5.16.2-1.3.hsx.el8.aarch64

uname -a ; rpm -q btrfs-progs
Linux cody-n11.rdu2.centos.org 6.6.18-3.el8.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Feb 28 12:13:00 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
btrfs-progs-5.16.2-1.3.hsx.el8.x86_64
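
A quick way to confirm that both builders run the same kernel build is to compare the release field of `uname -a` with the architecture suffix stripped. This is a small sketch using the two lines above as input; `kernel_release_noarch` is a hypothetical helper name.

```shell
# Hypothetical helper: take one `uname -a` line and print the kernel
# release (third field) without its trailing ".<arch>" component.
kernel_release_noarch() {
    release=$(printf '%s\n' "$1" | awk '{ print $3 }')
    printf '%s\n' "${release%.*}"
}

aarch64_line='Linux aarch64-04.rdu2.centos.org 6.6.18-3.el8.aarch64 #1 SMP PREEMPT_DYNAMIC Wed Feb 28 12:13:00 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux'
x86_64_line='Linux cody-n11.rdu2.centos.org 6.6.18-3.el8.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Feb 28 12:13:00 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux'

if [ "$(kernel_release_noarch "$aarch64_line")" = "$(kernel_release_noarch "$x86_64_line")" ]; then
    echo "same kernel release on both builders"
else
    echo "kernel releases differ"
fi
```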

PS: for completeness, I also asked the Kmods SIG, through an RFE, to have btrfs-progs in the same repo (for people willing to use that kernel and also use btrfs)

Let me know if that works for you , especially on the aarch64 architecture (that was the initial request for this ticket)

Metadata Update from @arrfab:
- Issue untagged with: blocked
- Issue priority set to: Waiting on Reporter (was: Needs Review)
- Issue tagged with: high-gain, medium-trouble

10 months ago

Looks like it works now.

Metadata Update from @ngompa:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

10 months ago
