#2102 F31 System-Wide Change: Gating Rawhide Packages
Closed: Accepted 4 months ago by sgallagh. Opened 5 months ago by bcotton.

We want to gate packages on test results before they can land in rawhide. This will reduce the number of broken dependencies, uninstallable packages, and broken composes, leading to a more stable rawhide as well as less work for the infrastructure and rel-eng teams to keep composes working.

This project will be split into two phases: at first, only single-package updates will be supported; in a second stage, we will add support for multi-package updates.

This proposal covers phase 1 of this project.


I'm concerned that this change intentionally leaves out an important thing; until I see a reasonable plan to solve the Fedora CI problems, I don't think engineering time should be invested in this. This also moves the bar another inch higher for packagers to contribute to Fedora while providing only theoretical benefits.

Don't get me wrong, I agree that rawhide gating is important and I want it very much. I just want to have a solid CI to use it with and I'm afraid that if we don't make it a condition for this, we will end up with a non-functional solution.

Consider me -1 until a sustainable plan for our CI exists. @pingou suggested that a CI initiative at the council level is about to happen. I'll gladly change my vote when I see it and it gets approved.

Technically the proposal is very well designed and I like it; I'm sorry to be this guy again.

Since I am heavily involved in this effort, I will abstain from voting on this one.

I would like to ask FESCo to wait at least a week before voting on this issue. I feel some of the questions raised in the discussion on the devel list did not get satisfying answers, and I would like to try to get some, potentially reworking our proposal before FESCo reviews it.

The discussion on fedora-devel has petered out. @churchyard raised a few important points on the mailing list, and they haven't been answered so far:

Does this change retiring packages at all? E.g. do we "push" removals through bodhi, or not?

And then there are the issues with CI:
https://pagure.io/fedora-ci/general/issue/16

RFE: Test on multiple architectures

This was assigned "low priority". I think that's reasonable.

https://pagure.io/fedora-ci/general/issue/2

During running tests, it's very hard to see what's happening

a.k.a. "live logs from the tests would be very useful". People working on CI are not sure if this is even possible. Systems like Travis/semaphore/etc. do this, pretty much every system I have used does, and it makes the whole system much easier to use.

https://pagure.io/fedora-ci/general/issue/4

RFE: Provide a tool for local testing

This one is absolutely crucial. If gating and CI are supposed to be used by all packagers in Fedora, then we need a way to allow them to test and debug the tests locally. Can we get some kind of plan from the CI team for a solution?

Metadata Update from @zbyszek:
- Issue tagged with: meeting

5 months ago

This was discussed during today's FESCo meeting. We're still waiting for updates on the proposal.

Metadata Update from @zbyszek:
- Issue untagged with: meeting

5 months ago

The proposal has been updated and is now at: https://fedoraproject.org/wiki/Changes/GatingRawhidePackages

I've asked @bcotton if we should send it again to devel-announce for more feedback, but I don't think this should prevent FESCo members from starting to look at it again :)

Thank you all!

This seems like a sufficiently different change that we should treat it as new (with links to previous discussion). I'll announce it later today.

@bcotton Do you want this on today's FESCo agenda or shall we postpone it a week for feedback on the mailing list (treating it as a new Change)?

@sgallagh I'd like to postpone it awaiting feedback

The questions in https://pagure.io/fesco/issue/2102#comment-561835 are still outstanding.

Some notes about the new draft:

  • The docs have moved:
    CI/Quick_Start_Guide → https://docs.fedoraproject.org/en-US/ci/quick-start-guide/
    CI → https://docs.fedoraproject.org/en-US/ci/

  • I'm confused as to how opt-in is performed. The section "Without opting in into gating" talks about gating.yaml, but the CI/Quick start guide talks about tests.yaml. So which file is used to determine if gating is opted into?

    Also, if a package has no tests, we could still gate it, e.g. by testing if it installs at all. What is the plan for this? Are any such tests built-in, or do we require all packages to have test config to check if plain installation works?

  • https://fedoraproject.org/wiki/File:Single_package_GatingRawhide_bodhi.png implies that greenwave success causes a bodhi update to be re-created. I guess the arrows should go somewhere else.

  • should there be some arrow from "report failure" back to "push bodhi update" (after manual waive) for both drawings?

  • the docs talk about "rawhide-gated" and "koji tag "to-sign"". I assume those are the same thing... Is "rawhide-gated" the actual name?

The questions in https://pagure.io/fesco/issue/2102#comment-561835 are still outstanding.

They are, but they are outside of the scope of this proposal per se. I can only defer to @dperpeet or @bookwar to answer them.

Some notes about the new draft:
The docs have moved:
CI/Quick_Start_Guide → https://docs.fedoraproject.org/en-US/ci/quick-start-guide/
CI → https://docs.fedoraproject.org/en-US/ci/

Will update.

I'm confused as to how opt-in is performed. The section "Without opting in into gating" talks about gating.yaml, but the CI/Quick start guide talks about tests.yaml. So which file is used to determine if gating is opted into?

The tests.yaml file defines the tests; it is what the CI system uses to execute them.
The gating.yaml file tells greenwave which tests a package should be gated on.
So you can have a tests.yaml and no gating.yaml: the tests will be triggered and will run, but their outcome won't be taken into consideration for gating.
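
To make the split between the two files concrete, here is a minimal Python sketch of the behaviour described above. The function name and the returned strings are hypothetical and exist only for illustration; this is not how the CI system or greenwave are implemented. The case of a gating.yaml without a tests.yaml is not addressed in this thread, so the sketch says so rather than guessing.

```python
def describe_gating(has_tests_yaml: bool, has_gating_yaml: bool) -> str:
    """Hypothetical helper: summarise what happens to a package's update.

    tests.yaml  -> read by the CI system to know which tests to execute.
    gating.yaml -> read by greenwave to know which results gate the update.
    """
    if has_tests_yaml and has_gating_yaml:
        return "tests run; results gate the update"
    if has_tests_yaml:
        # The case spelled out in the comment above: tests are triggered
        # and run, but their outcome is not considered for gating.
        return "tests run; results are ignored for gating"
    if has_gating_yaml:
        # Assumption: left open on purpose, the thread does not cover it.
        return "not covered in this discussion"
    return "no package tests defined; nothing is gated"


if __name__ == "__main__":
    for tests in (True, False):
        for gating in (True, False):
            print(tests, gating, "->", describe_gating(tests, gating))
```

The point of the sketch is only that opting in to gating is a separate step from adding tests.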

Also, if a package has no tests, we could still gate it, e.g. by testing if it installs at all. What is the plan for this? Are any such tests built-in, or do we require all packages to have test config to check if plain installation works?

The proposal is really about the infrastructure to allow gating. Which tests gate, whether any tests gate by default, and whether there are exceptions to these are all good questions, but they are outside of the scope of this proposal.
Once the infrastructure is in place and we have improved it to a point where we're happy about its state, it will be up to someone (QA?) to propose to FESCo to change our policies to make some tests required.

https://fedoraproject.org/wiki/File:Single_package_GatingRawhide_bodhi.png implies that greenwave success causes a bodhi update to be re-created. I guess the arrows should go somewhere else.

I see what you mean, it is meant to say it notifies the bodhi update that was created earlier, I can adjust this to be similar to greenwave saying "no".

should there be some arrow from "report failure" back to "push bodhi update" (after manual waive) for both drawings?

Can be added, sure.
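
The two diagram fixes discussed above (greenwave notifies the update that already exists rather than re-creating it, and a manual waive leads from "report failure" back to pushing the update) can be sketched roughly as follows. This is a simplified model under stated assumptions: the `Update` class, its status strings, and both function names are hypothetical and are not bodhi's or greenwave's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Update:
    """Hypothetical stand-in for a bodhi update created when the build lands."""
    build: str
    status: str = "pending"
    log: List[str] = field(default_factory=list)


def notify_decision(update: Update, tests_passed: bool) -> Update:
    """Record the gating decision on the EXISTING update (no new update)."""
    if tests_passed:
        update.status = "pushed"
        update.log.append("decision: pass, update pushed")
    else:
        update.status = "failed"
        update.log.append("decision: fail, failure reported")
    return update


def waive(update: Update) -> Update:
    """Manual waive: only meaningful after a reported failure."""
    if update.status != "failed":
        raise ValueError("only a failed update can be waived")
    update.status = "pushed"
    update.log.append("failure waived manually, update pushed")
    return update


if __name__ == "__main__":
    u = Update(build="example-1.0-1")  # made-up build name
    notify_decision(u, tests_passed=False)
    waive(u)
    print(u.status, u.log)
```

Both paths end in the same "pushed" state on the same object, which is the arrow the diagrams were missing.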

the docs talk about "rawhide-gated" and "koji tag "to-sign"". I assume those are the same thing... Is "rawhide-gated" the actual name?

They are not the same thing: rawhide-gated is the tag where single builds are tested, while "to-sign" is the tag where builds wait to be signed before going to rawhide.
Neither of these names is the actual one ("to-sign" would in fact be f29-signing-pending for F29, and rawhide-gated does not exist yet).

So you can have a tests.yaml and no gating.yaml, the tests will be triggered, they will run but their outcome won't be taken into consideration for gating.

OK. So I think https://fedoraproject.org/wiki/Changes/GatingRawhidePackages#With_opting_in_into_gating should mention gating.yaml, because right now it's not clear that without that file nothing happens. I expect the change page to be the documentation for this that a lot of packagers will use, so it's important for it to be as clear and self-explanatory as possible.

The questions in https://pagure.io/fesco/issue/2102#comment-561835 are still outstanding.

They are, but they are outside of the scope of this proposal per se. I can only defer to @dperpeet or @bookwar to answer them.

Hmm, the part about removals is within the scope of this change. If package removal is not gated, then after the removal, many other things could be broken. Do we accept that, and simply require other packagers to adapt their packages for the removal? I think that's acceptable, but it should be clarified.

As for the other stuff, I understand that it's outside of the scope of this change, but it seems that effective gating depends on the CI working sufficiently smoothly. If CI is understaffed and resource-constrained, then we should not build an additional system on top which will increase the load tenfold. Since gating only makes sense if the tests are running, we shouldn't enable it until we have the dependencies.

Since gating only makes sense if the tests are running, we shouldn't enable it until we have the dependencies.

+1 to what @zbyszek says in there.

I've opened 2 more issues that I think must be solved before the gating is considered:

CI errors are undecipherable:
https://pagure.io/fedora-ci/general/issue/43

CI errors happen far too often:
https://pagure.io/fedora-ci/general/issue/44

I stand by my vote: Consider me -1 until a sustainable plan for our CI exists.

Does this change retiring packages at all? E.g. do we "push" removals through bodhi, or not?

Hmm, the part about removals is within the scope of this change. If package removal is not gated, then after the removal, many other things could be broken. Do we accept that, and simply require other packagers to adapt their packages for the removal? I think that's acceptable, but it should be clarified.

To be honest, I do not quite understand how this proposal impacts our current retirement workflow. fedpkg retire removes all the files in the git repo and adds a single file explaining why the package was retired (based on the packager's input). So I'm not sure how gating gets involved in this.
In other words, I do not think the proposal changes the current situation, it doesn't make it better nor worse, does it?

So you can have a tests.yaml and no gating.yaml, the tests will be triggered, they will run but their outcome won't be taken into consideration for gating.

OK. So I think https://fedoraproject.org/wiki/Changes/GatingRawhidePackages#With_opting_in_into_gating should mention gating.yaml, because right now it's not clear that without that file nothing happens. I expect the change page to be the documentation for this that a lot of packagers will use, so it's important for it to be as clear and self-explanatory as possible.

Adjusted. Is the new wording better?
Thanks for your feedback :)

I believe CI can't be sustainable if there isn't a proper benefit to having it. Without gating, CI simply doesn't have the same effect.

The concerns about the CI pipeline are valid, but I don't have much motivation to work on that while there is no gating. To resolve this chicken and egg problem, I believe we have to enable some centrally curated, static tests that provide meaningful feedback.

Since gating only makes sense if the tests are running, we shouldn't enable it until we have the dependencies.

+1 to what @zbyszek says in there.
I've opened 2 more issues that I think must be solved before the gating is considered:
CI errors are undecipherable:
https://pagure.io/fedora-ci/general/issue/43
CI errors happen far too often:
https://pagure.io/fedora-ci/general/issue/44
I stand by my vote: Consider me -1 until a sustainable plan for our CI exists.

The gating at this phase would be opt-in. I don't see how that differs from using Pull Requests with CI tests when it comes to meaningful feedback, except that it is currently possible to have PR tests, while gating requires additional engineering effort.

Hmm, so this suggests yet another pre-requisite for gating: to have some standard suite and to enable it for more packages, so that the CI gets more exercise. That is actually independent of the issues with CI reliability, because even if the CI was fully reliable, and the gating was always enabled, without any tests, there are no checks and no benefit.

Metadata Update from @sgallagh:
- Issue tagged with: meeting

4 months ago

Since the vote in today's FESCo meeting was (+6, 1, -2), I will proceed with processing this change as approved.

Metadata Update from @bcotton:
- Issue untagged with: meeting
- Issue tagged with: pending announcement

4 months ago

Metadata Update from @sgallagh:
- Issue untagged with: pending announcement
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)

4 months ago
