#13082 EPEL 10.1 ppc64le buildroot broken
Closed: Fixed with Explanation 22 days ago by carlwgeorge. Opened 24 days ago by decathorpe.

  • Describe the issue

It appears that the epel10.1 buildroot is broken for builds on ppc64le. Initializing the minimal buildroot seems to consistently cause this issue (and subsequent build failure):

DEBUG util.py:461:  error: failed to exec scriptlet interpreter /bin/sh: No such file or directory
DEBUG util.py:461:  error: %prein(crypto-policies-20250905-2.gitc7eb7b2.el10_1.noarch) scriptlet failed, exit status 127
DEBUG util.py:459:  Error in PREIN scriptlet in rpm package crypto-policies
DEBUG util.py:461:  error: crypto-policies-20250905-2.gitc7eb7b2.el10_1.noarch: install failed
DEBUG util.py:461:  Creating group 'tss' with GID 59.
DEBUG util.py:461:  Creating user 'tss' (Account used for TPM access) with UID 59 and GID 59.
DEBUG util.py:459:  Error: Transaction failed

This appears to happen because the crypto-policies install is attempted before bash (/bin/sh) is present in the chroot.

Example from a real build:
https://koji.fedoraproject.org/koji/taskinfo?taskID=138851226

There are more examples in koschei too.
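
To check other failed tasks for the same signature, the two error lines in the excerpt above can be matched mechanically. This is only a rough sketch: the regular expressions are modeled on the lines quoted in this ticket, and the function name `find_scriptlet_failures` is made up for illustration; it is not part of mock or koji.

```python
import re

# Patterns modeled on the log lines quoted in this ticket (an assumption:
# other rpm/mock versions may word these messages differently).
INTERPRETER_RE = re.compile(r"failed to exec scriptlet interpreter (\S+):")
SCRIPTLET_RE = re.compile(
    r"error: %(\w+)\((\S+)\) scriptlet failed, exit status (\d+)"
)


def find_scriptlet_failures(log_text):
    """Return one dict per failed scriptlet found in log_text."""
    failures = []
    missing_interpreter = None
    for line in log_text.splitlines():
        match = INTERPRETER_RE.search(line)
        if match:
            # Remember the interpreter; the failing package is named on a later line.
            missing_interpreter = match.group(1)
            continue
        match = SCRIPTLET_RE.search(line)
        if match:
            failures.append({
                "scriptlet": match.group(1),
                "package": match.group(2),
                "exit_status": int(match.group(3)),
                "missing_interpreter": missing_interpreter,
            })
            missing_interpreter = None
    return failures


SAMPLE_LOG = """\
DEBUG util.py:461:  error: failed to exec scriptlet interpreter /bin/sh: No such file or directory
DEBUG util.py:461:  error: %prein(crypto-policies-20250905-2.gitc7eb7b2.el10_1.noarch) scriptlet failed, exit status 127
DEBUG util.py:459:  Error in PREIN scriptlet in rpm package crypto-policies
"""

for failure in find_scriptlet_failures(SAMPLE_LOG):
    print(failure)
```

Running it over the root.log of several failed tasks would show whether they all fail on the same package and interpreter.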

  • When do you need this? (YYYY/MM/DD)

ASAP - it seems to prevent epel10.1 builds completely.

  • When is this no longer needed or useful? (YYYY/MM/DD)

N/A

  • If we cannot complete your request, what is the impact?

No more EPEL 10.1 builds? :)


Metadata Update from @phsmoura:
- Issue tagged with: low-gain, low-trouble, ops

23 days ago

I took a look at this today. My first thought was that there was an incomplete or corrupted sync of RHEL 10.1 content for the buildroot. I took these steps:

  • removed the rhel10.1-{baseos,appstream,crb} external repos
  • deleted rhel10/10.1/repos/ppc64le
  • ran rhel10-sync script
  • re-added the rhel10.1-{baseos,appstream,crb} external repos

This doesn't seem to have had any effect, as a scratch build still fails the same way. Will continue investigating tomorrow.

Metadata Update from @carlwgeorge:
- Issue untagged with: low-gain, low-trouble, ops

23 days ago

Metadata Update from @carlwgeorge:
- Issue assigned to carlwgeorge

23 days ago

Metadata Update from @carlwgeorge:
- Issue tagged with: low-gain, low-trouble, ops

23 days ago

I looked into this a bit further and tried skipping koji, doing a dnf install into an alternate root on a ppc64le builder, modeled after what mock does.

dnf-3 --config /home/fedora/carlwgeorge/test/dnf.conf --releasever 10 --repo baseos,appstream,crb --installroot /home/fedora/carlwgeorge/test/root --setopt install_weak_deps=0 --setopt tsflags=nocontexts install crypto-policies

This reproduces the error.

  Running scriptlet: bash-5.2.26-6.el10.ppc64le                                                                                                        55/75 
  Running scriptlet: crypto-policies-20250905-2.gitc7eb7b2.el10_1.noarch                                                                               56/75 
error: failed to exec scriptlet interpreter /bin/sh: No such file or directory
error: %prein(crypto-policies-20250905-2.gitc7eb7b2.el10_1.noarch) scriptlet failed, exit status 127

Error in PREIN scriptlet in rpm package crypto-policies
  Installing       : glibc-common-2.39-58.el10_1.2.ppc64le                                                                                             57/75 
error: crypto-policies-20250905-2.gitc7eb7b2.el10_1.noarch: install failed

  Installing       : glibc-minimal-langpack-2.39-58.el10_1.2.ppc64le

Perhaps this is a race condition in an initial chroot, where crypto-policies' scriptlet tries to use /bin/sh before it's actually available? That doesn't explain why it only happens on ppc64le, but that's my theory so far.

Yeah, it looks like an install ordering problem, where the crypto-policies prein scriptlet is run before /bin/sh is available. Not sure why it's ppc64le specific.

Last time this happened, it was a dependency cycle caused by a completely different package. See https://bugzilla.redhat.com/show_bug.cgi?id=2244744#c26, where postfix providing group(mail) caused a dependency from the filesystem package to postfix, which introduced a loop that caused crypto-policies' scriptlet to be run before /bin/sh was installed.
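
To make the loop idea concrete, here is a toy sketch of how one extra edge turns an orderable dependency graph into one with a cycle. Only the filesystem -> postfix edge (via group(mail)) comes from the bug report above; the other edges and the `find_cycle` helper are assumptions made up for illustration, not real package metadata or rpm's actual ordering code.

```python
def find_cycle(requires):
    """Return one dependency loop as a list of package names, or None.

    requires maps a package name to the packages it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    stack = []

    def visit(pkg):
        color[pkg] = GRAY
        stack.append(pkg)
        for dep in requires.get(pkg, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we closed a loop.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[pkg] = BLACK
        return None

    for pkg in requires:
        if color.get(pkg, WHITE) == WHITE:
            cycle = visit(pkg)
            if cycle:
                return cycle
    return None


# Hypothetical graph. Only "filesystem" -> "postfix" is taken from the
# bug report; the remaining edges are illustrative assumptions.
TOY_REQUIRES = {
    "crypto-policies": ["bash"],   # its scriptlet needs /bin/sh
    "bash": ["filesystem"],        # assumed
    "filesystem": ["postfix"],     # group(mail) provided by postfix
    "postfix": ["bash"],           # assumed
}

print(find_cycle(TOY_REQUIRES))
```

In this toy graph bash ends up inside the loop. Once the installer has to break a loop somewhere, the relative order of packages in and around it is no longer guaranteed, which is consistent with a scriptlet running before /bin/sh exists.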

Now, the task is finding the package that causes this. Panu had some guidance on how to do that at https://bugzilla.redhat.com/show_bug.cgi?id=2244744#c21:

Yes, someone/something has introduced a dependency loop which blew up the delicate early bootstrap package set.

The best method for dealing with these is to find out a cut-off date where the breakage started, diff up the package set and look at the new dependencies introduced. If that's not possible...

Rpm can output details with --deploops switch, but I don't know if there's any way to do that from current anaconda (it should be the default there too really). What I do is feed the package set into dnf download and then 'rpm -U --deploops --root /srv/test *.rpm'. It won't pinpoint the problem in any convenient way but it'll reduce the package set you need to look at a bit.
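
The "diff up the package set" step could look roughly like the sketch below. It assumes two lists of name-version-release.arch strings, one from a working buildroot and one from a broken one. The "after" entries for bash and crypto-policies are taken from the output in this ticket; the "before" crypto-policies version and the `example-new-dep` package are hypothetical placeholders, as are the helper names.

```python
def split_nvra(nvra):
    """Split 'name-version-release.arch' into (name, 'version-release.arch')."""
    # The name may itself contain dashes, so split from the right.
    name, version, release_arch = nvra.rsplit("-", 2)
    return name, f"{version}-{release_arch}"


def diff_package_sets(before, after):
    """Return (added, removed, changed) package names between two sets."""
    old = dict(split_nvra(pkg) for pkg in before)
    new = dict(split_nvra(pkg) for pkg in after)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in set(old) & set(new) if old[n] != new[n])
    return added, removed, changed


# Hypothetical "working" set: the crypto-policies version here is a placeholder.
BEFORE = [
    "bash-5.2.26-6.el10.ppc64le",
    "crypto-policies-1-1.old.noarch",
]
# "Broken" set: the first two entries are from this ticket, the third is made up.
AFTER = [
    "bash-5.2.26-6.el10.ppc64le",
    "crypto-policies-20250905-2.gitc7eb7b2.el10_1.noarch",
    "example-new-dep-1.0-1.el10.noarch",
]

added, removed, changed = diff_package_sets(BEFORE, AFTER)
print("added:  ", added)
print("removed:", removed)
print("changed:", changed)
```

The added and changed packages are the ones whose Requires and Provides are worth comparing first when hunting for the new loop.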

Until I can figure out a permanent fix, I've reverted EPEL 10.1 to use the snapshot of CentOS 10 content we were using before RHEL 10.1 was released.

❯ koji list-external-repos --quiet --tag epel10.1-base
10  c10-snapshot-baseos       bare       https://infrastructure.fedoraproject.org/repo/centos/centos-10-snapshot/BaseOS/$arch/os/
20  c10-snapshot-appstream    bare       https://infrastructure.fedoraproject.org/repo/centos/centos-10-snapshot/AppStream/$arch/os/
30  c10-snapshot-crb          bare       https://infrastructure.fedoraproject.org/repo/centos/centos-10-snapshot/CRB/$arch/os/

An EPEL 10.1 scratch build now works, so regular builds should be unblocked.

For tracking the underlying issue I've migrated this to epel/releng#67.

Metadata Update from @carlwgeorge:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

22 days ago

Metadata
Boards 1
Ops Status: Backlog