#11888 Fedora 40 Mass Rebuild Tracker
Closed: Fixed 5 days ago by jnsamyak. Opened a month ago by jnsamyak.

The Fedora 40 schedule[1] has a mass rebuild scheduled for Jan 17. We need to plan and coordinate all tasks in preparation for it. For the driving changes, please refer to [2].

[1] https://fedorapeople.org/groups/schedule/f-40/f-40-key-tasks.html
[2] https://fedoraproject.org/wiki/Fedora_40_Mass_Rebuild#Driving_Features


Metadata Update from @jnsamyak:
- Issue tagged with: f40, high-gain, high-trouble, mass rebuild

a month ago

This issue will be used for all discussion and information regarding the F40 Mass Rebuild.

Looks like the GCC/Toolchain update has not landed because it's not through the FESCo pipeline yet - that should probably be addressed before starting the mass rebuild?

And Boost upgrade hasn't landed yet either: https://fedoraproject.org/wiki/Changes/F40Boost183 ; https://src.fedoraproject.org/rpms/boost/pull-request/23 (I guess that'd be great for mass rebuild too?) @jwakely (for some reason, Contingency deadline is not present at the change wiki).

Hum, there was an announcement supposed to go out yesterday that we are moving the mass rebuild by 1 day to 2024-01-18, maybe it got lost in moderation...

We were going to start the Boost and TBB rebuilds today, having failed to notice the mass rebuild date. Oops. I think I'll just push the boost and tbb updates to dist-git and let the mass rebuild build them.

just FYI, the mass rebuild just does builds a-z, there's no dep ordering there, so you may want to do actual boost/tbb builds before the mass rebuild so all the packages rebuilt will use the new ones.

Also AFAIK, if you just push the updates to dist-git they will be rebuilt, but nothing will be rebuilt "against" them. The builds from the mass rebuild are not available in the mass rebuild buildroot, everything goes in the repo after the mass rebuild is done.
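To illustrate the ordering point above, here is a minimal sketch comparing an a-z build order with a dependency-aware one. The dependency graph is made up for illustration (`example-tbb-user` is a hypothetical package), and it simplifies the real situation: in the actual mass rebuild nothing built during the run is visible in the buildroot at all, so even a "later" package would not pick up the new boost/tbb.

```python
from graphlib import TopologicalSorter

# Hypothetical build-time dependencies: package -> packages it builds against.
build_requires = {
    "boost": set(),
    "tbb": set(),
    "cryfs": {"boost"},
    "libreoffice": {"boost"},
    "example-tbb-user": {"tbb"},  # made-up package name
}


def alphabetical_order(deps):
    """Order used by an a-z rebuild: plain name sorting, no dep ordering."""
    return sorted(deps)


def dependency_order(deps):
    """Order in which every package comes after the packages it needs."""
    return list(TopologicalSorter(deps).static_order())


def built_before_dependency(order, deps):
    """Packages that would be built before at least one of their deps."""
    position = {name: index for index, name in enumerate(order)}
    return sorted(
        pkg
        for pkg, reqs in deps.items()
        if any(position[pkg] < position[req] for req in reqs)
    )


if __name__ == "__main__":
    print(built_before_dependency(alphabetical_order(build_requires), build_requires))
    print(built_before_dependency(dependency_order(build_requires), build_requires))
```

This is why building boost/tbb (and their dependents) in a side tag and merging it before the mass rebuild is the safer route.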

Ah OK. Is it worth using a side tag now, given that there probably won't be time to merge anything back to rawhide before the mass rebuild, or should we just build in rawhide?

It should be doable, the rebuild will start tomorrow as per sent email, there should be enough time to merge the side tag (if you need proven packager help with rebuilding dependencies, I'll be able to handle it around midnight CET).

OK, I'll create the tag and start building. I'm a proven packager so should be able to do it all, thanks for the offer though.

Nearly all of the Accepted changes now have bugzillas in the wiki pages too, in case the rebuild pulls this information from those pages and/or the change set, but there is one System Wide accepted change without a tracker bug, https://fedoraproject.org/wiki/Changes/SPDX_Licenses_Phase_3; I am working through how to resolve the issue today.

Aoife Moloney

Fedora Operations Architect

Fedora Project

Matrix: @amoloney:fedora.im

IRC: amoloney

Also, it seems like there might be an issue with dnf5 for building which we should sort out before starting the mass rebuild.

gcc doesn't build anymore: https://koji.fedoraproject.org/koji/taskinfo?taskID=111907674

DEBUG util.py:461:  Failed to resolve the transaction:
DEBUG util.py:461:  No match for argument: /lib/libc.so.6
DEBUG util.py:461:  Package "glibc-2.38.9000-33.fc40.x86_64" is already installed.
DEBUG util.py:461:  No match for argument: /usr/lib/libc.so
DEBUG util.py:461:  No match for argument: /usr/lib64/libc.so

so we may need to revert to dnf or figure out what's going on. We don't want everything using file deps to fail. ;(

The odd thing is that not everything using file deps is failing in rawhide. Reverting ELN to dnf fixed the gcc build there, so dnf5 does appear to be the culprit, although I can't explain why.

But here's another possible dnf5-related build error:

https://koji.fedoraproject.org/koji/taskinfo?taskID=111911784
https://koji.fedoraproject.org/koji/taskinfo?taskID=111911929

DEBUG util.py:461: Unknown argument "--allowerasing" for command "install". Add "--help" for more information about the arguments.

We don't want everything using file deps to fail. ;(

Nah, this is the corresponding bug https://bugzilla.redhat.com/show_bug.cgi?id=2180842
At least judging by the bug, it seems all the packages in Rawhide are ready for dropping the file lists.

I'm not sure how many packages are affected, but Evan claimed that we could set a DNF option (repository or main), optional_metadata_types=filelists, to work around this.

@egoode fyi.

The gcc case, depending both on 32 and 64-bit devel packages is a bit tricky.
We may IMO do BuildRequires: glibc-devel(%__isa_name-32).

Even though this might seem to produce broken SRPMs, it does not. The SRPM file would depend on glibc-devel(x86-32) if generated on x86_64, but Mock still automatically rebuilds the SRPM against the target architecture first, and then builds RPMs with correctly generated deps (e.g. glibc-devel(ppc-64) if anyone still builds fedora-rawhide-ppc64).
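As a rough illustration of the expansion described above, the sketch below substitutes an architecture's ISA name into the dependency string. It is not how RPM itself evaluates macros; the `ISA_NAME` table and the `expand_isa_name` helper are assumptions made for this example, covering only the two architectures mentioned in the comment.

```python
# Assumed values of the %__isa_name macro for the two architectures
# mentioned above; the real values come from RPM's platform macros.
ISA_NAME = {
    "x86_64": "x86",
    "ppc64": "ppc",
}


def expand_isa_name(template, arch):
    """Replace %__isa_name in a dependency string (illustrative helper)."""
    return template.replace("%__isa_name", ISA_NAME[arch])


if __name__ == "__main__":
    # The dependency differs depending on where the SRPM is (re)generated.
    print(expand_isa_name("glibc-devel(%__isa_name-32)", "x86_64"))
    print(expand_isa_name("glibc-devel(%__isa_name-64)", "ppc64"))
```

The point is only that the literal string in the SRPM depends on the architecture that generated it, which is why the rebuild of the SRPM on the target architecture matters.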

Note the GCC PR. I'm not sure whether we need to rebuild GCC first here or not (I thought we already have the built GCC from the side-tag, but not sure). Does anyone know how to proceed?

Beware, rawhide Koji is partially (some builders only?) broken by dnf5, see https://pagure.io/releng/issue/11737#comment-892006 and further.

I strongly suggest not starting the mass rebuild before this is solved.

Also: https://pagure.io/fedora-infrastructure/issue/11725

The dnf5 issue is resolved by @humaton here: https://pagure.io/releng/issue/11737

Not yet.
Would you stop the mass rebuild for now? It seems the mass rebuild began, but this issue broke builds again, e.g.:

https://koji.fedoraproject.org/koji/taskinfo?taskID=111924132

Boost side-tag wasn't merged yet either, should happen before the mass rebuild.

I've created an update to merge the boost and tbb side tag: https://bodhi.fedoraproject.org/updates/FEDORA-2024-7c21b7afa2

There are lots of packages that I couldn't rebuild, but they just keep failing due to the dnf issues so there's no point trying them again and again.

@jwakely the dnf issues should be over now. Per https://pagure.io/releng/issue/11737 - if you need to rebuild something, it should be doable now.

Great, thanks. I'm waiting for the gating tests to finish, so I can waive them and the side tag can be merged. As soon as the new boost (and all the other pkgs in the side tag) land in f40-boost I'll resume the rebuilds.

ok, so we need libreoffice and cryfs in the boost sidetag for them to pass gating and also for rawhide composes to not just fall over after that.

@jwakely is firing those builds off now. Hopefully in 4-5 hours we can land the sidetag and then @jnsamyak will be able to start the mass rebuild when he gets in in his morning.

The sidetag is in, so I think we are all ready now...

!info the mass rebuild for F40 has started 🤞

A major bug has been discovered in binutils on x86_64 used during the mass rebuild, see
https://bugzilla.redhat.com/2259333
https://bugzilla.redhat.com/2259285
In the better case (like when people used the ld.gold linker) it just resulted in build failures, but as the other bug shows, it can actually result in mislinking of any shared library or binary.
The bug was in how the assembler chose between a newly added relocation meant for instructions using APX (i.e. using registers %r16..%r31 in future CPUs) and the previous relocation: the choice was based on an uninitialized bitfield, so it randomly emitted the relocations meant for the new APX instructions. Unfortunately, when the linker performs relaxation of the new relocations, it assumes they appear in APX instructions rather than, say, REX-prefixed or non-prefixed ones, and overwrites the instruction with something that can crash or misbehave.

So, I'm afraid we need to throw away any packages built with the broken binutils-2.41-27.fc40 or
binutils-2.41-28.fc40, make sure those packages aren't tagged into anything and rebuild them again. Fixed binutils-2.41-29.fc40 is in rawhide starting with Saturday evening CET - https://bodhi.fedoraproject.org/updates/FEDORA-2024-8feaa25503
I think binutils-2.41-27.fc40 hasn't been tagged into rawhide and binutils-2.41-28.fc40 has been tagged into rawhide in https://bodhi.fedoraproject.org/updates/FEDORA-2024-084da6321c on Friday evening CET, so it is about a whole day of builds.
Obviously, binutils-2.41-29.fc40 itself shouldn't be thrown away even when it has been built with the incorrect binutils (but it probably should be rebuilt just to be sure).
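The selection rule described above can be sketched roughly as follows. This is only an illustration: the input mapping (build NVR to the binutils NVR that was in its buildroot) and the package names in the example are hypothetical, and how that mapping would be obtained from Koji is left out.

```python
# binutils builds named in the comment above.
BROKEN_BINUTILS = {"binutils-2.41-27.fc40", "binutils-2.41-28.fc40"}
FIXED_BINUTILS = "binutils-2.41-29.fc40"


def builds_to_discard(buildroot_binutils):
    """Return builds made with a broken binutils, keeping the fixed binutils.

    buildroot_binutils maps a build NVR to the binutils NVR that was
    installed in its buildroot (hypothetical input for this sketch).
    """
    return sorted(
        nvr
        for nvr, binutils in buildroot_binutils.items()
        if binutils in BROKEN_BINUTILS and nvr != FIXED_BINUTILS
    )


if __name__ == "__main__":
    example = {
        "foo-1.0-2.fc40": "binutils-2.41-28.fc40",         # made-up package
        "bar-2.1-5.fc40": "binutils-2.41-29.fc40",         # made-up package
        "binutils-2.41-29.fc40": "binutils-2.41-28.fc40",  # keep, rebuild later
    }
    print(builds_to_discard(example))
```

The fixed binutils is excluded from the discard list on purpose, matching the note that it should be kept (and ideally rebuilt) even though it was built with the broken one.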

Thanks for the heads up @jakub. The mass rebuild script is stopped.
There is https://bodhi.fedoraproject.org/updates/FEDORA-2024-413fdaf414 (binutils-2.41-30.fc40); once it lands in rawhide we will most probably just rebuild everything (d'oh) once again.

This is going to take some time, I think. There is an IPA issue going on which is being fixed, and https://bodhi.fedoraproject.org/updates/FEDORA-2024-413fdaf414 has not landed in Rawhide yet. Hopefully it gets better by my morning, so I can refire the mass-rebuild script after cleaning out the massbuild dir created during the previous mass rebuilds.

There was a bug within glibc: initially it was diagnosed as a regression in glibc, but it's back with the GCC folks for debugging. Anyway, this means we have to start the mass rebuild again for the third time :(

Since IPA/FAS is down, we can't stop the script that's in progress. Once the auth and the bug[1] mentioned below get sorted we will fire again. Thanks, everyone, for your patience on this.

Another notice: FESCo will extend the release cycle schedule by a week.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2259845

@jakub, any idea when the bug will be resolved? Is there a fix in progress?

The first rebuild round also exposed some ICEs on aarch64, report at https://bugzilla.redhat.com/show_bug.cgi?id=2259937

@jakub, any idea when the bug will be resolved? Is there a fix in progress?

It's a missed optimization in glibc that exposed a long-standing pre-existing bug in Ruby. I will fix the glibc missed optimization today, which will hide the Ruby testsuite failure once more. I don't think this issue matters to the mass rebuild.

@jakub could you find a minute to take a look at "[Mass Rebuild] Possible aarch64 codegen issues since GCC 14" ( https://bugzilla.redhat.com/show_bug.cgi?id=2260867 )?

Thanks!

The mass-tag script was run just now after the mass rebuild completed, which resulted in tagging 21971 builds.

What can releng do to avoid that next time?

cc'ing @jjames who I think might have some ideas.

We have had this happen before. What happens is that changes in gcc result in a hash change to the Provides in the base ocaml package and suddenly the entire OCaml world is broken. Is there some way that we could opt out of building OCaml packages during mass rebuilds? Then Richard Jones could run his magic build-the-ocaml-world script which does the builds in order.

Can sidetags be merged with sidetags? I was thinking that there was at least one other language like ocaml where ordering and structure is needed to make sure that hashes work properly after a rebuild. Having those done in a specific side-tag with a proper script, tested and then merged either into the rawhide-rebuild or just into rawhide might be useful.

@jnsamyak I am receiving bugzillas for packages that have been fixed in the meantime. Is the epoch thing working correctly?

Hello @churchyard

It should. Ideally the script takes the info from the massrebuildsinfo file, where the epoch is set to '2024-01-22 20:45:00.000000', the time we ran the mass rebuild again!
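For readers wondering what the epoch check amounts to, here is a minimal sketch. It assumes (this is an assumption, not taken from the releng scripts) that a package only gets a failure bug when it has no successful build completed after the epoch; the `needs_rebuild_bug` helper and its input are hypothetical.

```python
from datetime import datetime

# Epoch string as quoted in the comment above.
EPOCH = datetime.strptime("2024-01-22 20:45:00.000000", "%Y-%m-%d %H:%M:%S.%f")


def needs_rebuild_bug(successful_build_times, epoch=EPOCH):
    """True if no successful build completed after the epoch.

    successful_build_times is a list of datetime objects for a package's
    successful builds (hypothetical input for this sketch).
    """
    return not any(built > epoch for built in successful_build_times)


if __name__ == "__main__":
    # Last good build predates the epoch: still counts as not rebuilt.
    print(needs_rebuild_bug([datetime(2024, 1, 20, 10, 0)]))
    # A maintainer fixed and rebuilt the package after the epoch.
    print(needs_rebuild_bug([datetime(2024, 1, 20, 10, 0), datetime(2024, 1, 23, 9, 30)]))
```

Under that assumption, a package fixed after the epoch should not receive a bug, which is the behaviour being asked about.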

Thanks. I was mistaken, python3.6 actually failed but for a new reason. Ignore my comment.

Is there some way that we could opt out of building OCaml packages during mass rebuilds?

Sorry, I forgot to mention this earlier: we generally have a way to opt packages out of the mass rebuild, which can be helpful for future rebuilds. In massrebuildsinfo we have PKG_SKIP_LIST, which skips rebuilding the listed packages. If we need to use this functionality, one can ping releng, or better, open an issue before the mass rebuild stating that these packages should not participate in the rebuild. This will help all of us know what to look out for and save ourselves from breaking things :)

As per community request, this procedure has been successfully implemented before, by an example from Fedora 38. For further reference, please see this link: Fedora 38 Example.

CC: @jjames

It's also possible to add a noautobuild file to the root of a distgit repository and the mass rebuild script will skip it (https://pagure.io/releng/blob/main/f/scripts/mass-rebuild.py#_143-147).
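The two opt-out mechanisms mentioned above (the skip list and the noautobuild file) could be combined roughly like this. It is a sketch under assumptions, not the actual mass-rebuild.py logic: the `should_skip` helper, its parameters, and the example skip-list content are made up for illustration.

```python
from pathlib import Path

# Hypothetical skip-list content; the real list lives in massrebuildsinfo.
PKG_SKIP_LIST = ["ocaml"]


def should_skip(package, checkout_dir, skip_list=PKG_SKIP_LIST):
    """Decide whether a package is left out of the rebuild.

    A package is skipped if releng listed it in the skip list, or if its
    dist-git checkout has a 'noautobuild' file at the top level.
    """
    if package in skip_list:
        return True
    return (Path(checkout_dir) / "noautobuild").exists()


if __name__ == "__main__":
    # With no checkout present, only the skip list decides.
    print(should_skip("ocaml", "/nonexistent"))
    print(should_skip("boost", "/nonexistent"))
```

The skip list is under releng's control, while the noautobuild file lets a maintainer opt out from the package's own repository.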

FYI we discovered a few packages which failed at the buildSRPMFromSCM step, which means that no koji build (NVR) was ever registered, and therefore isn't showing up as a failed build in the usual tracker. These ones are from the ELN package set, but there may be more across Fedora as a whole:

Metadata Update from @jnsamyak:
- Issue assigned to jnsamyak

7 days ago

Metadata Update from @jnsamyak:
- Issue tagged with: meeting

7 days ago

Since the mass rebuild is over and the above-mentioned issue is already assigned or in progress, I'll close this now. If folks have any other queries, feel free to open a new ticket!

Thanks all for collaborating \0/

Metadata Update from @jnsamyak:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 days ago
