#1136 [llvm] LLVM 16 pull Into Fedora 38 (FreezeException) | rhbz#2184091
Closed 4 months ago by blockerbot. Opened 4 months ago by blockerbot.

Bug details: https://bugzilla.redhat.com/show_bug.cgi?id=2184091
Information from BlockerBugs App:
2184091

Current vote summary

Commented but haven't voted yet: coremodule, kparal

The votes have been last counted at 2023-04-07 15:58 UTC and the last processed comment was #comment-850576

To learn how to vote, see:
https://pagure.io/fedora-qa/blocker-review
A quick example: BetaBlocker +1 (where the tracker name is one of BetaBlocker/FinalBlocker/BetaFE/FinalFE/0Day/PreviousRelease and the vote is one of +1/0/-1)
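As a rough illustration of the vote syntax described above, the sketch below parses a single vote line and tallies a list of votes. It is not BlockerBugs code: the function names, the strict case-sensitive matching, and the one-vote-per-line format are all assumptions made for the example.

```python
import re

# Tracker names and vote values exactly as listed in the voting instructions.
TRACKERS = ("BetaBlocker", "FinalBlocker", "BetaFE", "FinalFE", "0Day", "PreviousRelease")

# Hypothetical pattern: a tracker name, whitespace, then +1, 0 or -1.
VOTE_RE = re.compile(r"^\s*(%s)\s+([+-]1|0)\s*$" % "|".join(TRACKERS))


def parse_vote(line):
    """Return (tracker, vote) for a well-formed vote line, else None."""
    match = VOTE_RE.match(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def tally(votes):
    """Return (number of +1 votes, number of -1 votes, net total)."""
    plus = sum(1 for v in votes if v > 0)
    minus = sum(1 for v in votes if v < 0)
    return plus, minus, plus - minus
```

Under these assumptions a line such as "FinalFE +1" parses cleanly, while a line using a name outside the listed trackers is ignored, and a split of two +1 votes against three -1 votes nets out to -1.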


Discussed during the 2023-04-03 blocker review meeting: [0]

The decision to delay the classification of this as a blocker bug was made as we do not have a clear vote on this (we are at +2 / -3 for a total of -1), so we will punt it. Note: this is effectively close to a rejection, as we are very unlikely to accept this later if we don't accept it now.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2023-04-03/f38-blocker-review.2023-04-03-16.01.txt

In case I miss next week's meeting, I want to go on record as

FinalFE -1

FinalFE +1

To kind of sum up my thoughts here:

  • The accepted change was late by one day, we are at the absolute beginning of the freeze cycle, I think we made a good point for the future llvm rebases to have them earlier (and we/FESCO/.. can, and I think we already did, communicate that these changes need to go in earlier)
  • I don't see much difference in danger to the quality of the release between a hypothetical scenario where this had landed yesterday (and we wouldn't have been able to vote on this) and if this lands e.g. tomorrow
  • Bunch of people worked on this to make it part of Fedora 38, other people/teams could've been planning around/counting on this
  • The risks involved seem low, close to zero if we leave out mesa rebuild (which is possible, previous mesa build will keep using llvm15-libs, and in this "more-cautious" scenario, I simply don't see any potential risks), while keeping most of the benefits of new llvm and marketing benefits
  • Previous llvm rebases have a good track record, I don't remember llvm bump ever causing serious issues/fallout (and it's been a bunch of Fedora releases I've been involved with)
  • Revert of the only risky part of it (mesa rebuild) is trivial one-line change in mesa.spec
  • The obvious "Fedora will stay relevant for developers using LLVM toolchain to build their projects...."
  • This contains bunch of fixes/improvements of HW support

I am personally for "take this in early, with mesa rebuild", but compromise without mesa rebuild works just fine too.

Frantisek, I love your new way of providing justification, please keep it up! ❤️ 😉

FinalFE +1

Heads up: I'm the author of the Bodhi update.
I agree with all points that @frantisekz raised, but adding more:

  • LLVM 16.0.0 was released on 2023-03-17, after the Fedora 38 Beta Freeze.
  • But the team has been working for a few months stabilizing both upstream and downstream.

With my FESCo hat on, I'm going to request that going forward, the LLVM stack maintainers ship betas and RCs into Rawhide and deal with the fallout early and often instead of doing this to us. The responsibility of resolving bugs triggered by the LLVM update would still fall on that team, but they need to do it in Fedora Rawhide because doing this at the very edge like this is painful for everyone.

FinalFE +1

I read through the meeting log where this was discussed and I wanted to address some of the concerns:

16:43:14 <bcotton> also, the llvm16 build was finished on 30 March and there's still no update in bodhi, which is a little concerning about the attention being paid to it

For major LLVM updates, we do all the building and testing in COPR, and then do the builds in koji once the final release has been made upstream and we've resolved all known issues.

We started building and testing pre-releases in COPR two months ago and have been continuously building and testing new upstream builds since then. Our testing involves manually running the Fedora CI tests we have against the COPR builds as well as building dependent packages to see if there are any issues.

16:57:55 <adamw> we have several months of people using f38 with the current mesa on the current llvm
16:58:01 <adamw> we have tests of basic graphics mode, all that stuff
16:58:16 <adamw> we do not have any history of people using rebuilds of mesa (and all the other stuff that builds on llvm) against a new major release of llvm

This update on its own will not change the version of LLVM used by mesa or any other LLVM library users. There are llvm15 compat packages in this update, so any llvm15 users will continue to use llvm15 after this update goes in.

(edit: I just noticed someone rebuilt mesa against llvm 16 and added it to the update, I guess the mesa maintainers prefer this? But this does not have to be part of the update if people think it's risky).

16:35:56 <adamw> changes are supposed to be done well before we're worrying about final freeze

I have to say I agree with this. This is the 3rd Fedora release we've used this release process where we wait to do koji builds until the final upstream release and try to get them in before the final freeze. So far we are 1 for 3 on hitting that deadline. I don't like having such a late deadline, but it's still a much better process than trying to package release candidates and rebuilding all of the LLVM users for each new release candidate.

We're going to do a retrospective after this release and try to come up with an improved process for Fedora 39. We'll document these process changes in the LLVM-17 change request that we submit. If anyone has any ideas for us on how to improve feel free to reach out to me or comment on our change proposal.

With my FESCo hat on, I'm going to request that going forward, the LLVM stack maintainers ship betas and RCs into Rawhide and deal with the fallout early and often instead of doing this to us. The responsibility of resolving bugs triggered by the LLVM update would still fall on that team, but they need to do it in Fedora Rawhide because doing this at the very edge like this is painful for everyone.

I don't want to hijack this thread to discuss feature plans, so maybe we can continue this conversation elsewhere, but I think for this to work, we would need to be given some leeway to push changes into rawhide even when there will be known breakages. The reason why we weren't building in rawhide right away is we were trying to get everything fixed in COPR, so that the update wouldn't be disruptive to rawhide.

Let's have a side-along conversation about this, but I think there are ways to mitigate this problem. The KDE SIG has to do stuff like this quite regularly, so I think we can leverage that experience to come up with a solution here.

For the record, the additional votes from the meeting are a -1 from me and a -1 from lruzicka. So the vote currently really stands at +5 / -3. Additionally I'll note that at least two of the +1s are from folks directly involved in driving this Change, which is obviously something of a conflict. We don't have a really formal voting policy (by design), but I do tend to weigh votes from the person/entity requesting the FE in the first place a bit lower.

We started building and testing pre-releases in COPR two months ago and have been continuously building and testing new upstream builds since then. Our testing involves manually running the Fedora CI tests we have against the COPR builds as well as building dependent packages to see if there are any issues.

This is much appreciated, but it isn't really the same as actually using the resulting stuff and seeing if anything breaks.

Previous llvm rebases have a good track record, I don't remember llvm bump ever causing serious issues/fallout (and it's been a bunch of Fedora releases I've been involved with)

There is one case I remember: https://bugzilla.redhat.com/show_bug.cgi?id=1623626 . There's another specific concern I have, which is the possibility that with an incomplete bump to 16, some media wind up with both llvm 15 and llvm 16 libs, which would considerably increase their size. I think last time we worried about that, we determined that at least for Workstation, mesa is the only llvm dep on the image, so this isn't a problem, but I'm not sure if that's still the case or if it might affect some other deliverable.

I don't see much difference in danger to the quality of the release between a hypothetical scenario where this had landed yesterday (and we wouldn't have been able to vote on this) and if this lands e.g. tomorrow

This argument could equally be applied to any request to land anything the day after the freeze. And then, if we applied something on freeze day + 1, where's the harm in doing something else on freeze day + 2? And so on. There has to be a cutoff, and yes, if you zoom right in and focus on the minute each side of the cutoff, it seems "silly". But there is no other really practical way to do it.

The risks involved seem low, close to zero if we leave out mesa rebuild (which is possible, previous mesa build will keep using llvm15-libs, and in this "more-cautious" scenario, I simply don't see any potential risks), while keeping most of the benefits of new llvm and marketing benefits

What benefits? The only actual concrete benefit to Fedora that has been cited so far (AFAICS) is a claim that mesa built against llvm 16 is faster. Which would be great, although it'd be nice to quantify it in some way, and we are still in the position of not being very confident about any potential risks in mesa built against llvm 16. If we don't build mesa against llvm 16, what benefit does llvm 16 bring to Fedora 38 on release day that is worth breaking the freeze? That is the key question here that I still don't see any answer to.

The obvious "Fedora will stay relevant for developers using LLVM toolchain to build their projects...."

LLVM 16 will still be in Rawhide. Stable releases always go stale eventually. I don't really see how we're not "relevant" if F38 has LLVM 15 instead of LLVM 16. Does all code suddenly stop compiling on LLVM 15 the moment LLVM 16 is shipped?

Yes, there will always be some sad case where a new version didn't quite make it before a cutoff. That's not a reason in itself to obviate the cutoff, though.

This contains bunch of fixes/improvements of HW support

Well that sounds like a benefit, but, uh, it sounds a bit odd. What is "this"? How does a new version of a compiler "fix/improve HW support"? Support for what?

For the record, the additional votes from the meeting are a -1 from me and a -1 from lruzicka. So the vote currently really stands at +5 / -3. Additionally I'll note that at least two of the +1s are from folks directly involved in driving this Change, which is obviously something of a conflict. We don't have a really formal voting policy (by design), but I do tend to weigh votes from the person/entity requesting the FE in the first place a bit lower.

Three of the +1s are from people involved with the change, myself, @tuliom , and @kkleine.

Previous llvm rebases have a good track record, I don't remember llvm bump ever causing serious issues/fallout (and it's been a bunch of Fedora releases I've been involved with)

There is one case I remember: https://bugzilla.redhat.com/show_bug.cgi?id=1623626 .

This was caused by a bug in the find-debuginfo.sh script, and it's actually happened twice, so we added a CI test to ensure that it doesn't happen again.

I am quite unhappy about the current situation. I think we are having quite a stable situation at the moment, with many issues caught early and things might go really well with the early release. Now, we stand before a decision that could make a real twist, introduce new issues and we will start from scratch. On the other hand, I understand that people have worked hard to reach the new version and that it will create a better PR for Fedora. I wonder why this cannot happen as a part of regular upgrades?

Also, I do not like being pushed to a positive vote by statements like "... bunch of people worked on this to make it part of Fedora 38, other people/teams could've been planning around/counting on this ..." because others have also been working to make the best release of Fedora 38 and this will significantly change the game and reset an important part of the testing results.

And yes, as @adamwill already mentioned, I stood with -1 on Monday, and I still tend to be -1 until I will have seen a fail-proof plan "how the problem can be mitigated" as @ngompa suggests.

FinalFreezeException -1

I am quite unhappy about the current situation. I think we are having quite a stable situation at the moment, with many issues caught early and things might go really well with the early release. Now, we stand before a decision that could make a real twist, introduce new issues and we will start from scratch.

What issues? If you'd noticed that something is broken by llvm 16, then please go ahead and report it.

If you're talking about hypothetical issues - then we can leave out mesa rebuild and I simply don't see how it could cause any issues (nothing in our blocking deliverables depends on that).

On the other hand, I understand that people have worked hard to reach the new version and that it will create a better PR for Fedora. I wonder why this cannot happen as a part of regular upgrades?

It, of course, can. But, if we pull this in as an upgrade:
- It cannot be used as a Fedora 38 marketing item
- It'll cause havoc on stuff depending on this (see the update and mesa-va-drivers from rpmfusion) - it's far better to solve this before the release
- It's against the guidelines

I simply don't see a reason why this would be better pulled in as an upgrade compared to FE. For the sake of the process? That seems like a pretty weak reason.

Also, I do not like being pushed to a positive vote by statements like "... bunch of people worked on this to make it part of Fedora 38, other people/teams could've been planning around/counting on this ..." because others have also been working to make the best release of Fedora 38 and

this will significantly change the game and reset an important part of the testing results.

How? If we keep mesa in, it'll invalidate Basic Video Driver tests. If we keep it out, it doesn't invalidate anything.

and I still tend to be -1 until I will have seen a fail-proof plan "how the problem can be mitigated" as @ngompa suggests.

That plan is to be resolved throughout the F39 cycle, there is no reason to base deciding on this item on that fact.

What issues? If you'd noticed that something is broken by llvm 16, then please go ahead and report it.

I will report them, when I see them. I have not seen any problems so far on my machine. There might be other people affected.

  • It's against the guidelines

Right, so I believe that it's the guidelines, too, causing this discrepancy. When is it ok to cross the guidelines and when is it not? At least, I understand now that it can't be done as an upgrade.

How? If we keep mesa in, it'll invalidate Basic Video Driver tests. If we keep it out, it doesn't invalidate anything.

Yeah, I am not that knowledgeable as far as packaging, building, rebuilding, and depending is concerned and I do not want to question yours or anyone's expertise on this matter. But, I have always been the guy wearing both "the belt and the suspenders" and my vote is mostly based on this.

How lucky we are, that the situation does not merely depend on my opinion and that the outcome will be based on the majority's decision. :D

Does this bug have an impact on the decision here?
It seems at least Firefox is affected on some hardware.

There might be other people affected.

Yeah, but that's a different sentence than the original:

introduce new issues and we will start from scratch.

Which could be understood as implying that it will cause new issues.

  • It's against the guidelines

When is it ok to cross the guidelines and when is it not? At least, I understand now that it can't be done as an upgrade.

It's not that it can't be done, it's more that it shouldn't be done as an upgrade (take mesa-dri-drivers from rpmfusion as an example again: pulling this through the freeze would allow rpmfusion maintainers to rebuild the package before the release, whereas pulling it in as an upgrade would cause breakage for some time).

outcome will be based on the majority's decision. :D

I am not sure about that, majority seems pretty clear right now.

Note that RPM Fusion can't fix anything until after it lands in Fedora.

Does this bug have an impact on the decision here?
It seems at least Firefox is affected on some hardware.

This is exactly why it needs to be handled either by:
- pushing this in as FE
- never pushing it to F38 (and reverting the dist-git changes in all of the 23 packages)

These are exactly the issues that upgrading during the release/after GA will cause. To allow packages/extensions like this to rebuild/sync up against the new llvm, this either has to go in now or never.

This argument could equally be applied to any request to land anything the day after the freeze. And then, if we applied something on freeze day + 1, where's the harm in doing something else on freeze day + 2? And so on. There has to be a cutoff, and yes, if you zoom right in and focus on the minute each side of the cutoff, it seems "silly". But there is no other really practical way to do it.

You're overloading my argument here. I am not implying anything for +2/+3/+n days. This technically met the deadline they were given - on final freeze, the change was complete and pushed into the updates queue, exactly as it had to. This is something different than allowing things to go in late throughout the freeze.

What benefits? The only actual concrete benefit to Fedora that has been cited so far (AFAICS) is a claim that mesa built against llvm 16 is faster. Which would be great, although it'd be nice to quantify it in some way, and we are still in the position of not being very confident about any potential risks in mesa built against llvm 16. If we don't build mesa against llvm 16, what benefit does llvm 16 bring to Fedora 38 on release day that is worth breaking the freeze? That is the key question here that I still don't see any answer to.

I'll start with a counter-question - what are the risks? Having a few composes for some deliverable with two different llvm-libs causing them to be 30 MB bigger? I kinda feel like this isn't the end of the world... (Also, I am yet to find any compose where we include anything that pulls llvm-libs in apart from mesa).

As for the benefits, without having mesa build included, just some of them could be the following two items, discussed separately as in-line replies.

The obvious "Fedora will stay relevant for developers using LLVM toolchain to build their projects...."

LLVM 16 will still be in Rawhide. Stable releases always go stale eventually. I don't really see how we're not "relevant" if F38 has LLVM 15 instead of LLVM 16. Does all code suddenly stop compiling on LLVM 15 the moment LLVM 16 is shipped?

They don't need to be stale on GA day. Developers targeting the latest LLVM wouldn't need to use rawhide for the next 6 months. The llvm releases align with Fedora release quite nicely.

This contains bunch of fixes/improvements of HW support

Well that sounds like a benefit, but, uh, it sounds a bit odd. What is "this"? How does a new version of a compiler "fix/improve HW support"? Support for what?

There are plenty of items at https://releases.llvm.org/16.0.0/docs/ReleaseNotes.html to pick from and see under this sentence.

And, if we agree with my comment above (that this shouldn't go in as an upgrade), pulling this in as an FE would allow us to take in 16.0.1 with its zen4 scheduler model, bringing various efficiency and performance improvements not only for client chips, but also (or "especially" here) the enterprise SKUs.

The llvm releases align with Fedora release quite nicely.

I think this statement is basically false. If that was true, the releases would be out earlier rather than now.

What benefits? The only actual concrete benefit to Fedora that has been cited so far (AFAICS) is a claim that mesa built against llvm 16 is faster. Which would be great, although it'd be nice to quantify it in some way, and we are still in the position of not being very confident about any potential risks in mesa built against llvm 16. If we don't build mesa against llvm 16, what benefit does llvm 16 bring to Fedora 38 on release day that is worth breaking the freeze? That is the key question here that I still don't see any answer to.

I'll start with a counter-question - what are the risks? Having a few composes for some deliverable with two different llvm-libs causing them to be 30 MB bigger? I kinda feel like this isn't the end of the world... (Also, I am yet to find any compose where we include anything that pulls llvm-libs in apart from mesa).

That is not how this process works. It's a freeze. The way a freeze works is: by default things do not go in. They need a reason to go in. The reason cannot be just "well, what's the harm?" as that effectively reverses the burden of proof. The default response to an FE request is "no". For the answer to be "yes", there needs to be a demonstration of sufficient benefit from the change both to justify it and to outweigh any potential risks it comes with.

Look, to be clear: I'm willing to be convinced here. Sell me a reason! That's what I keep asking for. But people keep arguing about process. The process is clear: provide a sufficiently strong reason to pull this in and we can pull it in. I'm waiting for the reason.

For instance, if somebody had spent yesterday demonstrating that mesa is 15% faster with the new llvm and they tested it on NVIDIA, AMD and Intel hardware and common VMs, and tested basic graphics mode on BIOS and UEFI and it all worked, I would be a lot closer to being convinced.

Look, to be clear: I'm willing to be convinced here. Sell me a reason! That's what I keep asking for. But people keep arguing about process. The process is clear: provide a sufficiently strong reason to pull this in and we can pull it in. I'm waiting for the reason.

@adamwill This is not accurate. The reasons have already been summarized and questions have even been answered about them. I can summarize the ones that have already been mentioned here:

  • The obvious "Fedora will stay relevant for developers using LLVM toolchain to build their projects...."
  • This contains bunch of fixes/improvements of HW support

The second one is large and affects multiple architectures.
More details are available at: https://releases.llvm.org/16.0.0/docs/ReleaseNotes.html

If you have any questions, we're happy to answer/clarify.

FinalFE +1

I'm a weak +1 to pulling this in, mostly because I think it's going to cause a bunch of confusion if it gets reverted at this point of already having been in updates-testing.

I do not think it is suitable to pull this in after the freeze as it is difficult to release rebuilds in lock step with 3rd party repositories (see the rpmfusion mesa-dri-drivers issue other people pointed out above where both components need to be built against the same llvm major version). Including it now would still give rpmfusion time to do the necessary rebuilds before F38 is released, but after the final release is too late for that. So in my mind it's a choice of either completely reverting the llvm 16 update in F38, or pulling it in through the freeze as a FE.

Also, big +1 to ngompa's request that llvm release candidates land early in rawhide to get early testing, so that it's only a tiny step from llvm rc4 to final when it's close to release time.

@tuliom Those reasons were discussed above, but to be clear, I don't find them very compelling. The first reason is too generic: you can equally make the same argument to justify pulling any major release of anything through any freeze. Any argument like that is not one I like to accept because it sets a terrible precedent that we'll just wave anything through on the basis that people like newer versions of stuff.

The second is potentially more compelling, but please do a bit more work than pointing me to a gigantic and fairly technical upstream release notes list. What new hardware support is particularly compelling for Fedora to the point that we should make an exception to the Fedora freeze to pull in a major new version?

I go with the reasoning @kalev pointed out in the last comment.
And yes, me too, I think release candidates must land early in rawhide.

FinalFE +1

FinalFE +1
It looks like my comment went in twice, so forget this last one here.

That is not how this process works. It's a freeze. The way a freeze works is: by default things do not go in. They need a reason to go in. The reason cannot be just "well, what's the harm?" as that effectively reverses the burden of proof. The default response to an FE request is "no". For the answer to be "yes", there needs to be a demonstration of sufficient benefit from the change both to justify it and to outweigh any potential risks it comes with.

Uff, is this set in stone somewhere? To me, it was always about weighing potential gains against risks (which seems far more fitting for Fedora, especially for two of the four Fedora Foundations: Features and First).

And e.g. if you compare it to the similar request (LLVM 14 FE for Fedora 36, https://bugzilla.redhat.com/show_bug.cgi?id=2072077 ), it was accepted, with basically the same reasoning, by everyone who voted: https://pagure.io/fedora-qa/blocker-review/issue/698 . Have the guidelines around FEs changed between Fedora 36 and now?

they tested it on NVIDIA, AMD and Intel hardware and common VMs, and tested basic graphics mode on BIOS and UEFI and it all worked, I would be a lot closer to being convinced.

Tried this today on:

  • AMD Van Gogh (RDNA 2)
  • Intel Gen 9 iGPU (Kaby Lake)
  • Intel Gen 12 iGPU (Alder Lake-P)
  • nVidia GeForce 3050 Ti
  • Basic Video - BIOS, UEFI
  • rPi4
  • virt-manager VMs ({BIOS, UEFI} and {virgl, llvmpipe} - all these combinations)

Mesa built with LLVM 16 worked just fine, on all of the configurations mentioned in the list above.

@frantisekz there's always going to be an element of comparing the benefit and the risk, of course. That's why I suggested testing the new mesa across various setups: doing so (assuming everything works) helps reduce the perceived risk of the change. But it's not just a pure if (benefit - risk >= 0) then fe_granted :D benefit has to be, you know, > 0.005 or something. We can't just take everything through the freeze if we think risk is 0 or very low.
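The point about a minimum benefit can be sketched in Python. This is only an illustration of the reasoning in the comment above: the function name, the idea of scoring benefit and risk as numbers, and the 0.005 floor (borrowed from the "or something" figure) are assumptions, not any Fedora policy.

```python
# Hypothetical floor taken from the "> 0.005 or something" remark; not an official number.
MIN_BENEFIT = 0.005


def fe_granted(benefit, risk, min_benefit=MIN_BENEFIT):
    """Grant a freeze exception only if the benefit both clears a
    minimum floor and is at least as large as the risk."""
    return benefit > min_benefit and benefit - risk >= 0
```

With this rule a change with zero risk and zero demonstrated benefit is still rejected, which is the difference from a pure net-benefit comparison.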

For me the difference with F36 (as mentioned earlier, not sure if in this ticket or bz) is that at least for F36, the update was already in updates-testing before the freeze and had received some testing. This happened later; at the time we were asked to consider the FE the update wasn't even on its way to updates-testing.

With the additional testing I'm OK changing my vote to:

FinalFE 0

which gives us a clear majority for accepting, even weighing the votes from change owners lower, so this is:

AGREED AcceptedFinalFE

Personally I'd still like to have a clearer story for exactly what the benefit is, but hey.

The following votes have been closed:

Metadata Update from @blockerbot:
- Issue status updated to: Closed (was: Open)

4 months ago

Release F38 is no longer tracked by BlockerBugs, closing this ticket.
