#3011 Proposal: enable gating for Rawhide critical path updates
Closed: Accepted a year ago by adamwill. Opened a year ago by adamwill.

Hi folks! After discussing it with @kevin and @mattdm, I'd like to propose that we turn on gating for Rawhide critical path updates.

We already gate critpath updates for all other release streams (stable releases, and Branched when it exists). Technically, the change is simple: we just add fedora-rawhide to the relevant rules in the Greenwave policy template, https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/greenwave/templates/fedora.yaml
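To illustrate what "adding fedora-rawhide to the relevant rules" amounts to, here is a minimal Python sketch. It assumes, hypothetically, that a Greenwave policy can be modeled as a dict whose `product_versions` list controls which release streams it gates; the policy ID, decision context, version list, and test name are invented placeholders, not the actual contents of fedora.yaml.

```python
# Hypothetical model of a Greenwave-style gating policy. The field names
# loosely mirror Greenwave's policy format, but every value here is a
# placeholder, not copied from Fedora's real fedora.yaml.
policy = {
    "id": "example_critpath_policy",                      # hypothetical ID
    "product_versions": ["fedora-38", "fedora-37"],       # placeholder streams already gated
    "decision_context": "example_critpath_push_stable",   # hypothetical context
    "rules": [{"type": "PassingTestCaseRule", "test_case_name": "example.test"}],
}


def enable_for(policy: dict, product_version: str) -> dict:
    """Return a copy of the policy that also applies to product_version."""
    versions = list(policy["product_versions"])
    if product_version not in versions:  # avoid duplicate entries
        versions.append(product_version)
    return {**policy, "product_versions": versions}


def applies_to(policy: dict, product_version: str) -> bool:
    """In this model, a policy only gates the product versions it lists."""
    return product_version in policy["product_versions"]


updated = enable_for(policy, "fedora-rawhide")
print(applies_to(policy, "fedora-rawhide"))   # False: Rawhide not gated yet
print(applies_to(updated, "fedora-rawhide"))  # True: Rawhide now gated
```

Returning a copy rather than mutating the original keeps the "before" and "after" policies side by side, which mirrors how the real change is a one-line diff that is easy to revert.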

The effect of the change would be that all Rawhide updates containing critical path packages would have to wait for the relevant openQA tests to run and pass before being 'pushed stable'. For Rawhide, that really means they get the appropriate tag, are added to the buildroot the next time it is regenerated, and appear in the next nightly compose. Right now, most Rawhide updates are tagged stable immediately; the only exceptions are updates created from side tags and updates for packages that have a package-specific gating policy in place.

Under normal conditions, the gating tests take about two hours at most to complete. For some packages they will finish sooner, because these days we only run and gate on 'relevant' tests, as determined by the different critical path groups. If many updates arrive at once, the tests could take somewhat longer to complete for all of them.

I do not expect many failures. We have been running the tests for over a year now (we started in April 2022), and failures are uncommon these days, so most updates will not be held up. I personally investigate and work to resolve all update test failures; this is a service provided by the QA team, so maintainers are not left on their own to resolve failures. I typically check for failures several times each weekday and at least once each weekend day.

Bodhi should now provide fairly accurate information on testing progress, gating status, and failures; I have been working on this over the last year or so. Failed tests can be re-run from Bodhi or, if there is no better option, waived there (though waiving a failure without addressing it risks the same failure affecting every future test).
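The gating behaviour described above (an update is held until every required test has passed, and a failure can be re-run or waived) can be summarized with a small Python sketch. This is a simplified model assumed for illustration, not Greenwave's or Bodhi's actual code; the outcome names and test names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    PENDING = "pending"  # test queued or still running


def gating_decision(required: list[str], results: dict[str, Outcome], waived: set[str]) -> str:
    """Return 'pass', 'wait', or 'blocked' for an update (simplified model).

    required: names of the tests the policy requires for this update.
    results:  latest outcome per test name; a missing entry means no result yet.
    waived:   test names whose results have been waived.
    """
    state = "pass"
    for test in required:
        outcome = results.get(test, Outcome.PENDING)
        if outcome is Outcome.PASSED or test in waived:
            continue  # this requirement is satisfied
        if outcome is Outcome.FAILED:
            return "blocked"  # an unwaived failure holds the update
        state = "wait"  # still waiting on at least one result
    return state
```

A re-run fits this model as replacing a FAILED entry in `results` with a fresh PENDING or PASSED one, while a waiver adds the test to `waived` without changing its result, which is why an unaddressed, waived failure can keep recurring.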

@kevin and I have been "shadow gating" Rawhide for the last several months: when an openQA test failure is investigated and found to indicate a genuine bug in the package, and we cannot immediately fix it, we untag the package before it reaches a compose. This has led to an appreciable improvement in Rawhide's reliability and, AFAIR, no negative comment or consequences.

There are always areas to work on that could make the experience smoother, but I think we're at a point where it makes sense to try this. It's easy to turn off again if it goes badly.


+1 from me. Thank you for working on this!

I support this, but could you please make this a self-contained Change proposal, or at least a proposal on the devel list? I don't like it when FESCo is asked to vote on things without engaging with others first. Thanks.

I don't think the Change process is really necessary or useful to this. I don't think we did a Change when we enabled update test gating for stable releases, or for Branched.

In case it's not obvious from @decathorpe's link, I also bumped that thread today: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/WOJSVP3PY5OXPMR65KXFALGCSXOY6ULP/

+1 from me. I do not think a Change proposal makes sense here, but I do think a devel-announce post saying that we are doing it would be a good idea.

I won't vote since it's confusing (my vote won't count), but I'm happy to see this!

Ah, sorry about that, it's late.

+1

I’m excited about this.

+1

Metadata Update from @churchyard:
- Issue set to the milestone: Fedora Linux 39


So by my count, this is approved, right? It's been more than seven days, and there are six +1s and no -1s.

Metadata Update from @churchyard:
- Issue tagged with: pending announcement


OK, so here's my plan to announce a plan: unless anyone objects, later today I'll post a mail to devel-announce@ saying that Rawhide gating will be enabled on Wednesday, and giving some general tips on how to handle it (how long tests should take, what to do about failures, who to contact, use side tags if you need to do a big chain of builds, etc). And then on Wednesday we'll turn it on. Let me know if that's not OK with anyone :)

OK, this has been running in production for over a week now and nothing has blown up. And we've caught some real bugs (currently, Firefox crashing on an illegal instruction on older CPUs).

Metadata Update from @adamwill:
- Issue close_status updated to: Accepted
- Issue status updated to: Closed (was: Open)

