#2454 Proposal: Require compiler / annobin updates to use rawhide gating
Closed: Accepted 2 years ago by churchyard. Opened 2 years ago by decathorpe.

We again find ourselves in the position that builds for rawhide are broken due to very late compiler and toolchain changes - right before the scheduled mass rebuild for f33.

While I recognise that deadlines like the scheduled mass rebuild, combined with procrastination, will lead to this kind of late change submission, I fail to see why we can't do better, particularly since we now have better tools available to us.

Proposal: All future compiler / annobin updates in rawhide MUST make use of a koji side tag and rawhide gating, with at least basic gating tests to ensure that the toolchain can still compile code / packages successfully after the update. This policy MUST apply to both GCC and LLVM/clang updates in rawhide.

(Edited by @ngompa to indicate that this must apply to both GCC and LLVM/Clang)


IMO any such bump (SONAME bump, packages with tight dependencies) must happen in a side tag....

Perhaps we should enforce this for any of the packages that are in the base buildroot?

I did write a test for CentOS Stream earlier in the year to help them avoid this problem.

I am happy to help with making a similar test for gcc, llvm, annobin packages in Fedora, I just don't know how to do it.

Annobin does have a clang plugin now, and it was enabled in Rawhide already, so I say we have to apply this to LLVM now too.

I've edited @decathorpe's proposal post to indicate that both GCC and Clang fall under this policy change.

@ngompa

I did write a test for CentOS Stream earlier in the year to help them from having this problem.
I am happy to help with making a similar test for gcc, llvm, annobin packages in Fedora, I just don't know how to do it.

Given that there is agreement from the maintainer, to add a gating test using those scripts one needs to follow https://docs.fedoraproject.org/en-US/ci/how-to-add-dist-git-test/#_solution and add a tests.yml file in the gcc repo.
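For reference, such a tests.yml is an Ansible playbook in the Standard Test Interface format. A minimal hypothetical example invoking a wrapper script (the test name and script path here are placeholders, not the contents of the actual PR) looks roughly like:

```yaml
# tests/tests.yml -- hypothetical minimal STI playbook running one smoke test
- hosts: localhost
  roles:
    - role: standard-test-basic
      tags:
        - classic
      tests:
        - smoke:
            dir: .
            run: ./run_tests.sh
```

A gating.yaml in the dist-git root then marks the test as required for rawhide gating.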

I've created a PR to the gcc repo with an example: https://src.fedoraproject.org/rpms/gcc/pull-request/11#request_diff with a run_tests.sh script, which does a hello world build.

To clarify, the dist-git test defined above is going to be used to gate the sidetag too. But when run for the sidetag, it will use the entire sidetag as an update of the test environment.

Thus, the workflow will look as follows:

  1. you build new gcc(without sidetag)
  2. it fails the test, because it needs annobin fixes
  3. then you build new bumped gcc into sidetag
  4. you build also update of the annobin into the same sidetag
  5. you trigger the test
  6. CI goes through all packages in sidetag and runs dist-git tests for each of them, but in the environment populated from the entire sidetag
  7. test passes
  8. gcc and annobin get tagged into buildroot together

Obviously, it would then be easier to always start with a side tag build, even when only a single build is expected. It saves at least one version bump and the time required to do a full rebuild of gcc. So we achieve the initially proposed "always build in a side tag" goal.

At point 3, you should be able to tag the gcc into the side tag instead of bumping it, no?

@churchyard Actually yes, you are right, it should be possible.

I am not sure if bodhi handles overlapping updates properly (as you will have the same build in two different bodhi updates), but if it does, then rebuild won't be needed.

I am not sure either; it just seems like a waste of resources to rebuild gcc when there is no need to rebuild it (however, please don't let this stop the effort here; I am sure this would actually encourage the use of side tags from the start the next time).

While adding tests is reasonable, it won't help with the frequent aarch64 breakage because to my knowledge, Fedora does not run any gating tests on aarch64.

There is of course a very easy solution that avoids the plugin ABI compatibility issue: always build annobin from the gcc package, like we already do for other gcc-related components (most of them bundled in the upstream tarball). In the past, we considered this unworkable because it would delay annobin updates too much. gcc build times have become worse since then, unfortunately.

While adding tests is reasonable, it won't help with the frequent aarch64 breakage because to my knowledge, Fedora does not run any gating tests on aarch64.

What's causing the frequent aarch64 breakage? Is it in annobin, gcc or elsewhere?

We again find ourselves in the position that builds for rawhide are broken due to very late compiler and toolchain changes - right before the scheduled mass rebuild for f33.
While I recognise that deadlines like the scheduled mass rebuild and procrastination will lead to this kind of late change submissions, I fail to see why we can't do better, particularly since we now have better tools available to us.
Proposal: All future compiler / annobin updates in rawhide MUST make use of a koji side tag and rawhide gating, with at least basic gating tests to ensure that the toolchain can still compile code / packages successfully after the update. This policy MUST apply to both GCC and LLVM/clang updates in rawhide.
(Edited by @ngompa to indicate that this must apply to both GCC and LLVM/Clang)

Why must the builds be done in a side-tag? Isn't having CI gating enough to prevent this problem?

No. Because they need to be blocked from landing in the buildroot if you don't complete the required rebuild cycle, and you can't really do the rebuild cycle properly pushing directly to rawhide if it is constantly rejected and automatically untagged. Side tags are for handling stuff like this, so making it explicit ensures you know what you need to do here.

@tstellar Because rawhide gating tests only apply to individual koji builds as they're processed by bodhi, or to updates submitted from a side tag (which is what is necessary to check multi-package updates like gcc/clang + annobin). How else would you submit a fixed combination of gcc/annobin or clang/annobin?


EDIT: Sorry Neal for basically repeating what you said, pagure didn't show your comment in real time :)

@ngompa @decathorpe Ok, thanks for explaining. We already do all major version updates (e.g. LLVM 10->11) in a side-tag. Other updates don't change the LLVM API, can we have the side-tag requirement only apply to major updates?

@tstellar Not sure. At least I hope that clang/annobin aren't as tightly coupled as gcc/annobin where even minor changes in GCC break the world.

I am personally not comfortable relaxing this for LLVM unless its safety can be proven to me. Too many cycles of this with GCC+annobin have made me very wary...

There is of course a very easy solution that avoids the plugin ABI compatibility issue: always build annobin from the gcc package, like we already do for other gcc-related components (most of them bundled in the upstream tarball). In the past, we considered this unworkable because it would delay annobin updates too much. gcc build times have become worse since then, unfortunately.

I think this might actually make things more difficult, because annobin has an llvm dependency, so this would make gcc depend on LLVM and also mean that when we build a new major version of LLVM, we would also need to rebuild gcc.

Related question: Is anything actually using the annobin data yet?

I am personally not comfortable relaxing this for LLVM unless its safety can be proven to me. Too many cycles of this with GCC+annobin have made me very wary...

LLVM/Clang has a policy to keep its API/ABI stable between minor releases (point 4): https://llvm.org/docs/HowToReleaseLLVM.html#release-patch-rules

LLVM/Clang has a lot of other library users in Fedora that also depend on this being true, and we don't typically do rebuilds for those.

I will double check annobin to make sure there are no dependencies outside of the public API/ABI of LLVM.

I've spent some time looking at the annobin source code now, and I am in favor of the general idea of this proposal; it is not that much of a change from what we currently do for clang/llvm updates.

However, I don't think side-tags and annobin rebuilds should be required for all clang/llvm updates. It's very unlikely that a clang/llvm update that does not change the package version number would break annobin, so this would create a lot of unnecessary work for us and also extra rebuilds of annobin. Also, given that we would have a CI test that checks the annobin plugin, in the rare case that there was a breakage, it would be caught either in the pull request or in rawhide gating.

We already use side-tags for version updates, so requiring side-tags for these builds would be mostly a no-op anyway.

Regardless of the final wording of the proposal, I can see the following action items for us (llvm/clang maintainers) to start working on:

  • Add CI tests checking annobin compatibility to tests/llvm and tests/clang in dist-git.
  • Enable these as gating tests for the clang, llvm, and annobin packages.
  • Add annobin to the list of our side-tag/rebuild packages for LLVM major version updates.
  • Same as above for minor version updates OR fix annobin install path to not depend on the LLVM version.

@tstellar if you're sure that coordinated rebuilds of annobin won't be necessary for minor LLVM updates (because you've verified that by looking at its API usage), then we can relax this rule for LLVM. Given the history with GCC, I don't think the same is true there.

  • Add CI tests checking annobin compatibility to tests/llvm and tests/clang in dist-git.
  • Enable these as gating tests for the clang, llvm, and annobin packages.
  • Add annobin to the list of our side-tag/rebuild packages for LLVM major version updates.
  • Same as above for minor version updates OR fix annobin install path to not depend on the LLVM version.

I think that sounds like a good plan (though I'm not an expert on rawhide gating).

There is of course a very easy solution that avoids the plugin ABI compatibility issue: always build annobin from the gcc package, like we already do for other gcc-related components (most of them bundled in the upstream tarball). In the past, we considered this unworkable because it would delay annobin updates too much. gcc build times have become worse since then, unfortunately.

I think this might actually make things more difficult, because annobin has an llvm dependency, so this would make gcc depend on LLVM and also mean that when we build a new major version of LLVM, we would also need to rebuild gcc.

The LLVM-dependent parts of annobin would of course be bundled with LLVM, also settling the ABI compatibility issue there.

So, turns out a faulty GCC / annobin combination made it to f32-stable as well, which is now breaking all aarch64 builds ... :sparkles:

Not sure, but it looks like I ran into this issue for f32 on aarch64.
https://koji.fedoraproject.org/koji/taskinfo?taskID=48261177

checking whether the C compiler works... no
configure: error: in `/builddir/build/BUILD/marco-1.24.0':
configure: error: C compiler cannot create executables

Building for all other arches, and for f31 and master, is working fine.

I'm ok with this plan, but it also feels like a related issue may be to build up CI for non-x86_64. At least in an opt-in capacity. I'm not sure what the reality is there, but it's probably something worth getting details on.

How did the f32 update get promoted to stable? With testing updates, I'd hope that we would catch things there and a down vote on the update would prevent the push. But that does depend on people using the testing updates and responding on bodhi.

As an alternative, could we require updates like gcc, annobin, llvm, etc. to carry more positive votes in bodhi and/or a longer testing time?

I'm ok with this plan, but it also feels like a related issue may be to build up CI for non-x86_64. At least in an opt-in capacity. I'm not sure what the reality is there, but it's probably something worth getting details on.

That would certainly help. But the first step is to actually set up tests for the compiler toolchain. Without that, non-x86_64 CI infra doesn't help either.

How did the f32 update get promoted to stable? With testing updates, I'd hope that we would catch things there and a down vote on the update would prevent the push. But that does depend on people using the testing updates and responding on bodhi.
As an alternative could we require updates like gcc, annobin, llvm....etc to carry more positive votes in bodhi and/or a longer testing time?

I assume that nobody uses gcc from updates-testing for building RPM packages, since those repos are disabled for mock builds. And without using the CFLAGS / LDFLAGS that are enforced for package builds, the annobin issue doesn't come up (since annobin isn't used).

There has been no response by gcc maintainers in https://src.fedoraproject.org/rpms/gcc/pull-request/11 -- what do we do here, do we merge the PR?

CC @jakub @aoliva @law @mpolacek

I'm ok with this plan, but it also feels like a related issue may be to build up CI for non-x86_64. At least in an opt-in capacity. I'm not sure what the reality is there, but it's probably something worth getting details on.

That would certainly help. But the first step is to actually set up tests for the compiler toolchain. Without that, non-x86_64 CI infra doesn't help either.

Agreed.

How did the f32 update get promoted to stable? With testing updates, I'd hope that we would catch things there and a down vote on the update would prevent the push. But that does depend on people using the testing updates and responding on bodhi.
As an alternative could we require updates like gcc, annobin, llvm....etc to carry more positive votes in bodhi and/or a longer testing time?

I assume that nobody uses gcc from updates-testing for building RPM packages, since those repos are disabled for mock builds. And without using the CFLAGS / LDFLAGS that are enforced for package builds, the annobin issue doesn't come up (since annobin isn't used).

For toolchain changes, would it be useful in Fedora to have a toolchain test day coordination of some sort? Have a group of people just grab the testing packages and try to build some large high profile packages across a variety of systems?

There has been no response by gcc maintainers in https://src.fedoraproject.org/rpms/gcc/pull-request/11 -- what do we do here, do we merge the PR?

CC @jakub @aoliva @law @mpolacek

I think this PR is at least a good starting point.

I'm not sure that this is better.
Will the build fail if the check fails?

This check doesn't help anything, it does not fail the build, it just puts a message in the already massive build log that annobin needs to be rebuilt. It fixes absolutely nothing about this problem.

To be absolutely, completely, totally clear, I want all compiler toolchain builds that use annobin to never land in Fedora without annobin being rebuilt with it first.

Any and all solutions that do not accomplish this goal do not meet the standard of what I am asking for.

Failing the build is undesirable: the ===TESTING to ===TESTING END=== part of build.log is something my scripts extract from every build; I compare it against earlier builds and, based on that, decide if the build is usable or not. If the test failed the build, then no gcc could ever be updated while annobin was incompatible with it. If I do rawhide builds with --skip-tag, which is desirable to verify the %check anyway (and for branches one needs to file a manual bodhi update, so things can be checked first), I know whether I should ask @nickc for a rebuild or not. As for the pull request, which would only cover one arch rather than all the arches that this check covers:
would the gating test be redone after annobin is rebuilt, or would the build simply be refused to be tagged?

@jakub Gating checks can prevent them from being merged into the main Fedora tag, they don't block you doing build cycles in a side-tag.

That's why my proposal is worded the way it is :smile:

I think the problem isn't really clear to the GCC maintainers, but landing a toolchain update in a broken state into rawhide (or stable repos, for that matter) is just massively painful for all packagers that get blocked by it.

This is why such updates should always go through side tags. It avoids having to compile gcc twice, and it avoids broken state in the buildroot entirely, for any period of time.

  • create on-demand side tag
  • build gcc within it
  • wait for newRepo
  • build annobin in side tag
  • submit to bodhi

This way, gcc+annobin updates always get into the buildroot together, without periods of time in between where things are potentially broken.
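Concretely, that sequence corresponds to roughly these commands (a sketch only: the f34 tag names and NVRs are placeholders, and the exact bodhi invocation may differ):

```
# create an on-demand side tag based on the rawhide buildroot
fedpkg request-side-tag --base-tag f34-build
# -> prints the new tag name, e.g. f34-build-side-NNNNN

# build gcc into the side tag
fedpkg build --target f34-build-side-NNNNN

# wait until the side tag's buildroot contains the new gcc
koji wait-repo f34-build-side-NNNNN --build=gcc-<nvr>

# from annobin dist-git: build annobin in the same side tag
fedpkg build --target f34-build-side-NNNNN

# submit the whole side tag as one update
bodhi updates new --from-tag f34-build-side-NNNNN --notes "gcc + annobin update"
```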

@decathorpe How would you enforce the policy? IMO having a policy does not mean people will follow it.

@cverna Sure, that's always the problem with policies. But in this case, adding the necessary files for gating+tests into gcc dist-git would be a good start. I assume that doing things in side tags first would follow, since it's less painful than having to compile gcc twice (once to get it kicked out by bodhi, once in the side tag).

You miss the point: annobin doesn't need to be rebuilt most of the time, and annobin should really be fixed so that it doesn't need to be rebuilt at all except for major compiler updates (once a year). I've said what needs to be done for that, but unfortunately it hasn't been implemented.

Side tags are very slow and an annoyance I don't want to go through for every gcc build; it is bad enough that builders for some arches are so underconfigured that the build takes days rather than hours.

@jakub In which way are side tags slow?

And I think you're missing the point of my proposal as well. annobin hasn't been fixed, so we have to deal with it somehow. It might require a small change to your workflow, but it would help 1000s of other people.

They need to create a new repo and that sometimes takes a long time when koji is busy.

Even if annobin has not been 100% fixed, it has been mostly fixed; I guess > 90% of the time it won't need to be rebuilt.

/me throws up hands in frustration

@jakub You miss the point. Once gcc is built (not in a side tag), it is too late to realize annobin needs to be rebuilt. At that point, dozens (hundreds?) of other packagers are affected and blocked. You are arguing that side tags are an annoyance for you, and hence you would rather continue doing things in a way that annoys others several times a year. I am sure that is not your intention, but please also consider this when making decisions.

As for the "side tags are annoyance" part of your argument, is this based on any recent experience with them? I've asked in https://src.fedoraproject.org/rpms/gcc/pull-request/11 "[side tags are] Expensive in what way?"

Is this about a koji wait-repo command that takes 15 minutes tops? Does that really justify breaking the buildroot for everybody else for hours?

https://bugzilla.redhat.com/show_bug.cgi?id=1860549#c6 is my proposal for annobin, i.e. make annobin option handling cope with new options being added or old options removed on the gcc side at runtime.

Update from the clang side: I have the test case for clang written, I just need this redhat-rpm-config fix merged so I can enable it: https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/102

cc @sergesanspaille

@jakub Could you please answer my questions?

Explicitly:

  1. How are side tags expensive?
  2. Do you have recent annoying experience with side tags?
  3. Does that annoying experience justify the annoying experience of others when the buildroot is broken?

@jlaw could you have a look here?

Well, Jakub owns the package, but I think a gating test is the best solution since it blocks builds from landing if there's a mismatch. Building in side tags and such works, but it means that anyone doing an update would have to know about the special procedures, and that doesn't scale or work if the primary maintainers are unavailable, say due to PTO. Magic markers in the log file are the worst of the 3 options I've seen discussed above.

So my vote would be to bite the bullet with gating tests.

Note that adding gating tests should be independent of improvements to annobin and/or gcc proper to avoid these mismatches. I'd like to see Jakub and Nick work more closely on that as well.

Building on side tags and such works, but it means that anyone doing an update would have to know about the special procedures...

Note that the gating test means that if the build is not done in a side tag and the test fails, another (now in a side tag) rebuild would be needed. If that is not a problem, I don't actually think it is important whether or not the default action is to build gcc in a side tag, as long as the gating test stops the problematic build. That said, building a multi-package update in a side tag is the standard procedure, nothing special.

I don't actually think it is important whether or not the default action is to build gcc in a side tag, as long as the gating test stops the problematic build.

My understanding is that it's not possible to block a normal build on Fedora CI, only merging a side-tag.

I don't actually think it is important whether or not the default action is to build gcc in a side tag, as long as the gating test stops the problematic build.

My understanding is that it's not possible to block a normal build on Fedora CI, only merging a side-tag.

It's possible this is temporarily not working due to the recent branching of f33, but builds I did before the branch were gated. e.g. https://bodhi.fedoraproject.org/updates/FEDORA-2020-3e3b2ec8a5#comment-1553213

Hmm, okay, good to know that works now.

My understanding is that it's not possible to block a normal build on Fedora CI, only merging a side-tag.

All (well, at least rawhide + branched pre-freeze) bodhi updates are blocked on gating, regardless of whether they are single-package updates from the regular buildroot or multi-package updates from a side tag (where "multi" means >= 1).

Metadata Update from @dcantrell:
- Issue tagged with: meeting

2 years ago

@law @jakub @tstellar @fweimer

See email I sent regarding this ticket. Thanks.

So Jakub's been on PTO but is supposed to be discussing this with mgt shortly. I've relayed to management my position that we should be using gating tests rather than relying on an individual either scanning the logs or using a script to scan the logs then manually tagging the build into the release.

The gating test might want to scan the logs or mirror the test that is done during the build. That is fine. I just think that requiring a human to interpret this data and take manual action is horribly wrong in this scenario.

Ok, so I've performed some testing, and my plan for any further gcc builds is:

  1. Rawhide builds (which would otherwise be tagged immediately) I'll perform with
         fedpkg build --skip-tag
     and older distro builds normally with
         fedpkg build
  2. The annobin checks that are in gcc.spec %check seem to work, and my scripts (which grab the build log so that I can compare it against earlier builds and decide if the build is ok) will tell me whether the current annobin seems to be usable or not.
  3. If the testing results look good and annobin seems to be usable, for rawhide I'll run
         koji tag-pkg f34-updates-candidate gcc-10.2....fc34
     etc., and for older branches create a bodhi request just for gcc.
  4. If the testing results look good but annobin seems to be unusable, I'll run
         fedpkg request-side-tag --base-tag f34-build  # or f33-build etc.
         koji tag-pkg f34-build-side-NNNNN gcc-10.2...fc34
         koji wait-repo f34-build-side-NNNNN --build=gcc-10.2....fc34
     and then either build annobin into the side tag myself or ask @nickc to do it, and file a bodhi request from that.

If the proposed gating test will not stand in the way of these steps, and will just act as a safety net if I forget to do them or if somebody else builds the package, then the gating test is ok for me. Though perhaps it would be better to download the log files and do the
    sed -n -e "/^==TESTING/,/^==TESTING END/{/Annobin test/p}" build.log
checks for all architectures, because the proposed gating test will handle just one architecture, while the build.log files cover all architectures.
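For illustration, that sed extraction can be tried against a synthetic log (the marker and message lines below are invented; the real build.log contents differ):

```shell
# Create a synthetic build.log and extract the annobin test results that
# appear between the TESTING markers, mirroring the check quoted above.
cat > build.log <<'EOF'
random build output
==TESTING
Annobin test: hardening flags recorded ... PASS
unrelated line inside the test section
Annobin test: plugin version matches gcc ... PASS
==TESTING END
more build output
EOF

sed -n -e "/^==TESTING/,/^==TESTING END/{/Annobin test/p}" build.log
```

Only the "Annobin test" lines inside the marker range are printed; lines outside the range are ignored.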

While this sounds better, it deviates from the normal multi-build rawhide update process.

I would suggest to do the following, slightly modified steps:

  • request side tag for rawhide first
  • build gcc in side tag with "fedpkg build --target f34-side-12345"
  • do build.log analysis
  • if necessary, build annobin in side tag ("fedpkg build --target f34-side-12345")
  • submit side tag to rawhide (either with only gcc if no annobin rebuild was necessary, or with annobin if it was necessary)

At the last step, a gating test would serve as a safety check that everything went as planned.

These steps mirror the "normal" multi-build update process for rawhide, slightly modified for your workflow. Your steps include manually tagging things in koji which is brittle and easy to forget or mess up.

But gcc is not a normal multi-build; usually it is a single package build, and only occasionally could it be a multi-build. And the above is more work on my side and more time wasted on the koji side.

Anyway, I've merged the pull request with the gating test; if it doesn't work, we can always revert it.

I agree with @decathorpe that unconditionally using the side-tag would be just simpler. But the most important part was having gating checks, and those have been merged. In addition, the manual testing procedure outlined in https://pagure.io/fesco/issue/2454#comment-674320 should catch issues too. So the overall goal of avoiding having a broken package set tagged into rawhide should now be satisfied. I'd be inclined to close this as "resolved".

Metadata Update from @churchyard:
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)

2 years ago
