#2726 RFE: make SRPM files produced by "build" tasks reproducible and architecture-aware
Opened 3 years ago by decathorpe. Modified 8 months ago

As far as I can tell, this is how a koji "build" works right now:

  • buildSRPMfromSCM builds a SRPM file
  • that SRPM file is used by mock on all target architectures
  • built RPM files are collected
  • SRPM file from buildSRPMfromSCM task is collected

This introduces a problem: SRPM files can carry architecture-dependent RPM metadata. For example, they can contain BuildRequires that depend on %ifarch / %ifnarch conditionals in .spec files. So, the SRPM files produced by koji always depend on the architecture of the builder that the buildSRPMfromSCM task ran on, but the SRPM files are subsequently copied to the "-source" repositories for all architectures regardless, resulting in "wrong" SRPM metadata (e.g. invalid BuildRequires) on affected architectures.
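To make the effect concrete, here is a minimal Python sketch of how the same .spec can yield different BuildRequires depending on the architecture it is evaluated on. It is not how rpm parses spec files: the function name `build_requires`, the sample spec, and the architecture list are assumptions for illustration, and only non-nested conditionals with literal architecture names are handled.

```python
def build_requires(spec_text, arch):
    """Return the BuildRequires visible when the spec is evaluated on `arch`.

    Simplified on purpose: handles only non-nested %ifarch / %ifnarch /
    %else / %endif blocks with literal arch names, and expands no macros.
    """
    requires = []
    active = True  # whether the current line applies to `arch`
    for raw in spec_text.splitlines():
        line = raw.strip()
        if line.startswith("%ifarch"):
            active = arch in line.split()[1:]
        elif line.startswith("%ifnarch"):
            active = arch not in line.split()[1:]
        elif line.startswith("%else"):
            active = not active
        elif line.startswith("%endif"):
            active = True
        elif active and line.startswith("BuildRequires:"):
            requires.append(line.split(":", 1)[1].strip())
    return requires


# Hypothetical spec fragment; the arch list is made up for the example.
SPEC = """\
BuildRequires: gcc
%ifarch x86_64 aarch64
BuildRequires: valgrind
%endif
"""

for arch in ("x86_64", "s390x"):
    print(arch, build_requires(SPEC, arch))
```

In this sketch, an SRPM generated on an x86_64 builder would record valgrind in its header, while one generated on s390x would not, even though the source is identical.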

RFE: Collect architecture-dependent SRPM files produced on all individual builders (side product of all mock builds anyway), instead of using the one initial SRPM from the buildSRPMfromSCM task, which might have metadata that is not correct on all target architectures.


That requires separate source repos per architecture, right?

If that is not already the case, yes.

I would also settle for an alternative way to get "correct" SRPM metadata / repository data, so long as it does not involve locally rebuilding all Fedora SRPMS ...

Isn't an arch-specific SRPM a design flaw? I already don't like ExclusiveArch noarch packages. An SRPM which builds differently on different archs is a bit of a warning sign. What I could imagine is building the SRPM on all archs and comparing the results (the same way we do with noarch packages), and failing the build if they differ. But I wouldn't store all of them (just one). Anyway, it would still mean a jump in needed build resources for maybe a few problematic packages.

What are the real-world use cases for arch-dependent SRPMs? Are they reasonable, or should the distribution hunt them down and remove/fix them?

Isn't an arch-specific SRPM a design flaw?

It is a necessary design flaw. E.g. if you need to BuildRequire a library to support additional features, but that is only possible on some architectures, you guard it. E.g.:

# Support for the Valgrind debugger/profiler
%ifarch %{valgrind_arches}
BuildRequires: valgrind
%endif

SRPM which builds differently on different archs is very common.

I wonder if this could be solved by using rich deps, e.g. (pseudocode):

for <arch> in %{valgrind_arches}:
    BuildRequires: (valgrind if rpm(<arch>))

Obviously, this would need to be macronized

@churchyard That is a different problem (and it has been working for years). The SRPM will be the same for all archs here. BuildRequires will be interpreted in the buildArch step. We're talking about the git->SRPM step. The SRPM itself has no BuildRequires. So, I'm not sure what I'm missing.

Untrue. SRPM files do have RPM headers, which include the BuildRequires. Those end up in repository metadata via createrepo.

Right now, it is impossible to query SRPM requires (BuildRequires) from repository metadata, because those SRPM files claim to be "noarch" but actually carry architecture-dependent RPM headers.

Interesting - will look into it more deeply.

For my use cases, something like an rpmdiff output between the SRPM files from different architectures would be enough, since I only care about the package metadata and not their actual contents.

But koji is the only place in the Fedora pipeline that has the complete information, since SRPM files from the actual builders are discarded.

In fact - they don't even exist. We build only one SRPM, on a random arch from the buildroot architectures. So that is the only SRPM created in koji, and it is used for everything.

koji might not produce it, but mock builds (look at the mock result directory) will produce a rebuilt SRPM file. koji just ignores that file.

Sure, the srpm headers can vary across arches, but the source they contain does not.
If you go to reproduce an rpm build, then as you point out mock will remake the srpm for that arch. That reproduces the arch-specific srpm, and the build itself should proceed as before.

Saving what is essentially the same srpm 5-8x over for every rhel build sounds incredibly wasteful to me.

For my use cases, something like an rpmdiff output between the SRPM files from different architectures would be enough, since I only care about the package metadata and not their actual contents.

What exactly is your use case? I'm not getting a clear picture from this issue so far. What is the specific problem caused by this behavior (which predates Koji itself)?

The use case generally is: querying build dependencies for all architectures, not just a random one.
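As a rough illustration of that kind of query, the following Python sketch builds an index from each build dependency to the architectures that need it. The input mapping and the function name `arches_by_dependency` are hypothetical; in practice the per-architecture data would have to come from repository metadata that does not exist today.

```python
from collections import defaultdict


def arches_by_dependency(per_arch_requires):
    """Map each BuildRequires entry to the sorted list of arches needing it.

    `per_arch_requires` is a hypothetical mapping: arch -> iterable of
    BuildRequires strings as they would appear in that arch's SRPM header.
    """
    index = defaultdict(set)
    for arch, requires in per_arch_requires.items():
        for dep in requires:
            index[dep].add(arch)
    return {dep: sorted(arches) for dep, arches in sorted(index.items())}


# Made-up sample data for two architectures.
sample = {
    "x86_64": ["gcc", "valgrind"],
    "s390x": ["gcc"],
}
print(arches_by_dependency(sample))
```

With only one SRPM header available, built on whichever architecture the builder happened to have, such an index cannot be computed reliably.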

Why is this issue coming up now after 15 years of the same behavior in Koji (and going back to the rpm build systems that Koji replaced)?

To clarify: It has always been an issue.

I agree, this is not a new issue. It's probably just that there are now more services relying on "correct" dependency graph information. For example, the repochecker service provides the backend for the FTI/FTBFS/broken dependencies data shown in the Packager Dashboard.

The only data source it can rely on is yum repository metadata, and that is still frequently wrong since architecture-dependent BuildRequires are garbled and randomized by which koji builder built things.

I can see that there is some room for improvement here, but I don't like the idea of keeping multiple copies of the srpm.

We might want to do something like we do for noarch rpms, where we actually compare the differently produced files and make sure there are no unexpected changes. In the srpm case, the payload contents should be identical and only the headers should vary. We could assert the former and log the differences in the latter.
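A minimal Python sketch of that check, under stated assumptions: each per-architecture SRPM is represented by a plain dict with hypothetical "payload" (filename to digest) and "buildrequires" keys, rather than being read with real RPM tooling. This is not koji code, only an outline of "assert the payload, report the header differences".

```python
def compare_srpms(srpms):
    """Compare per-architecture SRPM summaries.

    `srpms` maps an arch name to a dict with two hypothetical keys:
      "payload":       {filename: digest} for the files inside the SRPM
      "buildrequires": iterable of BuildRequires strings from the header
    Raises ValueError if any payload differs; otherwise returns the
    arch-specific BuildRequires as {arch: [extra deps]}.
    """
    if not srpms:
        return {}
    arches = sorted(srpms)
    reference = arches[0]
    # Assert: the payload must be identical on every architecture.
    for arch in arches[1:]:
        if srpms[arch]["payload"] != srpms[reference]["payload"]:
            raise ValueError(
                f"SRPM payload differs between {reference} and {arch}"
            )
    # Report: BuildRequires that are not shared by all architectures.
    common = set.intersection(*(set(srpms[a]["buildrequires"]) for a in arches))
    diffs = {}
    for arch in arches:
        extra = set(srpms[arch]["buildrequires"]) - common
        if extra:
            diffs[arch] = sorted(extra)
    return diffs


# Made-up sample data; digests are placeholders.
payload = {"foo.spec": "abc", "foo-1.0.tar.gz": "def"}
example = {
    "x86_64": {"payload": payload, "buildrequires": {"gcc", "valgrind"}},
    "s390x": {"payload": dict(payload), "buildrequires": {"gcc"}},
}
print(compare_srpms(example))
```

Only one SRPM would still need to be stored; the returned differences are what would be written to a log.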

Yeah, that would be a good start. For everything else, it's probably hard to work around due to limitations in how RPM works ...

Ideally, there would be specialized repository data for each arch while only one copy of the SRPM is kept. That's probably hard to achieve, though, given how the responsibilities are split between koji / pungi / bodhi and the build / compose tasks?

Saving what is essentially the same srpm 5-8x over for every rhel build sounds incredibly wasteful to me.

+1. This would be a very costly solution to a very niche problem.

for <arch> in %{valgrind_arches}:
    BuildRequires: (valgrind if rpm(<arch>))

I think this would be very cool. It would also allow very easy local introspection of dependencies for other architectures from a single SRPM file.
