#1890 updating the FTBFS cleanup policy
Closed 5 years ago Opened 5 years ago by zbyszek.

This is a continuation of #1877: large number of packages FTBFS in F28.

Status quo:
Policy documents:
- https://fedoraproject.org/wiki/Fails_to_build_from_source
- https://docs.pagure.org/releng/sop_deprecate_ftbfs_packages.html
Both of those contain some outdated comments, but the gist is that:

- Somebody runs a regular rebuild
- New bugs for each failing package will be filed in Bugzilla
- Packages with unresolved FTBFS bugs at Feature Freeze for the N+2 release will be removed from the distribution.

That policy is reasonable, so I propose to mostly update/clarify it:

  1. explicitly say that the automatic creation of FTBFS bugs will happen after each mass rebuild

  2. remove all the outdated mentions of Matt Domsch and monthly rebuilds, BugZappers, CVS, PackageDB, etc.

  3. replace "Packages with unresolved FTBFS bugs at Feature Freeze for the N+2 release will be removed" with "Packages that failed to build in two consecutive mass rebuilds will be removed 6 weeks after the second mass rebuild". This actually gives a similar timeline, but Feature Freeze is not part of the process anymore. 6 weeks is after the branch point and bodhi activation, which is not ideal, but I think it's necessary to give maintainers some time to fix those bugs.

  4. Introduce a rule that changing the FTBFS bug status to ASSIGNED/anything-else-that-is-not-NEW means that the bug is being worked on.

If we start enforcing this policy, i.e. file bugs regularly after each mass rebuild, and actually retire packages, packages that FTBFS will go away after approximately 7½ months.

On the rel-eng side, I'd like to propose the following requirements:
- file FTBFS bugs for all packages which fail in each mass rebuild
- attach build logs to those bugs (root.log, build.log, mock.log)
- exclude any packages from retirement where the FTBFS bug is not in state NEW and there has been a change in the bug state in the last month. (This gives maintainers a way to automatically postpone the retirement.)
- actually do the retirements around 6 weeks after the FTBFS
- if packages are retired, close the FTBFS bugs with an explanation as CLOSED/WONTFIX.
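
The exclusion rule in the list above can be sketched as a small predicate. This is only an illustration, not the releng script: the `FtbfsBug` record and the function name are hypothetical, and "the last month" is assumed to mean 30 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "the last month" is read as 30 days.
GRACE_PERIOD = timedelta(days=30)


@dataclass
class FtbfsBug:
    """Hypothetical record holding only the fields the rule needs."""
    package: str
    status: str                    # Bugzilla status, e.g. "NEW" or "ASSIGNED"
    last_status_change: datetime   # when the status last changed


def is_exempt_from_retirement(bug: FtbfsBug, now: datetime) -> bool:
    """True if the bug is not NEW and its state changed within the grace period."""
    acknowledged = bug.status != "NEW"
    recently_changed = now - bug.last_status_change <= GRACE_PERIOD
    return acknowledged and recently_changed
```

A bug flipped to ASSIGNED more than a month ago and then left alone is not exempt under this reading, so the postponement has to be renewed by the maintainer.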

If wanted, we could add the retirement process to the schedule.


My gut tells me this isn't aggressive enough, but my brain tells me that perfect is the enemy of good.

+1

I'd make one amendment: Any bug that is not updated to ASSIGNED/foo six weeks after the first failed mass rebuild will automatically be flagged for the non-responsive maintainer process. That way, we have a full Fedora cycle to find a new maintainer to pick it up before the auto-retirement occurs.

Also, starting the non-responsive maintainer process can light a fire under the maintainer if they are in fact still around.

If that's their only package, that'd make sense. But if they have more than one package, then I'm not sure.

For point #3, I wonder if we should only retire the package on Rawhide and not on the branched release? I could see arguments either way, but I would worry it could be disruptive to the branched release if we did it that late in the cycle. What do others think?

I'm +1 to @sgallagh's suggestion, though I think we should perhaps take it case-by-case in light of @zbyszek's comment.

If we only do master, not the newly-branched version, the time from initial detection of the problem until package retirement will be ~13½ months. That seems way too much.

So maybe we should be more aggressive instead, and retire packages two weeks after the second mass rebuild failure? This is still ~6½ months from the initial bug report, but would give the rest of the distro more time to handle the fallout.
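
One way to arrive at the ~7½ and ~6½ month figures is sketched below. The ticket does not spell out the derivation, so both constants are assumptions: consecutive mass rebuilds are taken to be about 6 months apart, and a month is counted as 4 weeks.

```python
# Assumptions (not stated in the ticket): consecutive mass rebuilds are
# roughly 6 months apart, and a month is counted as 4 weeks.
MONTHS_BETWEEN_MASS_REBUILDS = 6
WEEKS_PER_MONTH = 4


def months_until_retirement(grace_weeks: int) -> float:
    """Months from the first failed mass rebuild until the package is retired."""
    return MONTHS_BETWEEN_MASS_REBUILDS + grace_weeks / WEEKS_PER_MONTH


print(months_until_retirement(6))  # 7.5, retirement 6 weeks after the second rebuild
print(months_until_retirement(2))  # 6.5, retirement 2 weeks after the second rebuild
```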

If that's their only package, that'd make sense. But if they have more than one package, then I'm not sure.

I don't see this as being any different than the cases where a non-responsive maintainer issue is forwarded to us manually. We get plenty of those where the packager isn't maintaining one package but has others. The only difference is that this case could be automated.

The primary difference would be in outcome; we might remove the maintainer from just this package, rather than all of their packages.

We get plenty of those where the packager isn't maintaining one package but has others.

OK. I just don't want us to automatically remove all the packages from the maintainer.

@zbyszek I agree - 13 months is too long. I'd +1 to 2 weeks after the second mass rebuild - after all, they already had ~6 months and didn't fix it in that time, so 2 more weeks is a very reasonable last chance to fix it imo. I think this would address my concern too.

I'd make one amendment: Any bug that is not updated to ASSIGNED/foo six weeks after the first failed mass rebuild will automatically be flagged for the non-responsive maintainer process. That way, we have a full Fedora cycle to find a new maintainer to pick it up before the auto-retirement occurs.

The conclusion is not quite right, as soon as the process is fixed, orphaned packages will be retired again when they are orphaned for six or more weeks. So the non-responsive maintainer process will maybe extend it for a total of about 10 weeks (4 weeks for non-responsive maintainer process + 6 weeks for package being orphaned).

I'm +1 to @sgallagh's suggestion, though I think we should perhaps take it case-by-case in light of @zbyszek's comment.

If we want to do it on a case-by-case basis we need someone to do the work, since it cannot be fully automated anymore.

Since this is also releng's domain, let's let @mohanboddu know about this ticket :-)

Another thing to consider for the policy is that we might not have a mass-rebuild on each release. Do we want to mandate that there should be one for every release to make sure we capture FTBFS packages?

Also, whenever we retire packages, we might create packages with broken dependencies. There was a policy to clean them up in the past, too, but it was also not followed. What will we do with these packages?

Another proposal I have is, instead of retiring packages, to orphan them and fix the long-time orphan packages process. Then we do not have too many processes in parallel that will retire packages; broken packages just get orphaned, and the other process cleans up. This would also incorporate the idea to start the non-responsive maintainer process.

About closing bugs for retiring packages: Currently there is no process to close bugs for retired packages afaik. IMHO it would be good to make this a separate process for all retired packages instead of treating the FTBFS bugs special.

Also I would like to change the process to use just one FTBFS tracker bug and file only up to one FTBFS bug per package and update it accordingly. IMHO having multiple bugs makes things just more complicated/less readable. If the reason for separate trackers is to keep the amount of bugs blocking the tracker low, I suggest removing closed bugs from the tracker instead.

Another thing to consider for the policy is that we might not have a mass-rebuild on each release. Do we want to mandate that there should be one for every release to make sure we capture FTBFS packages?

I'd be against forcing a mass rebuild just for that. If there is no mass rebuild, it means gcc and glibc didn't change, so the chances of new ftbfs failures are smaller. OTOH, packages which were flagged in the previous mass rebuild should still be retired. I'll amend the proposal to include that.

Also, whenever we retire packages, we might create packages with broken dependencies. There was a policy to clean them up in the past, too, but it was also not followed. What will we do with these packages?

That's why we retire early after the mass rebuild. This gives people time to work on dependent packages if the ftbfs cannot be fixed.

Another proposal I have is, instead of retiring packages, to orphan them and fix the long-time orphan packages process.

Orphaning has the disadvantage that the package is still there. Dependent packages can keep using the ftbfs package. IIUC, we'd first orphan the package, and then wait a few weeks, and then retire it if no maintainer appears. This extra step would increase the time until the package is retired significantly.

Metadata Update from @zbyszek:
- Issue assigned to zbyszek

5 years ago

An updated policy proposal, taking feedback into account (changes in bold):

  1. Add a note that after each mass rebuild FTBFS bugs will be created for packages that failed.

  2. Remove all the outdated mentions of Matt Domsch and monthly rebuilds, BugZappers, CVS, PackageDB, etc.

  3. Replace "Packages with unresolved FTBFS bugs at Feature Freeze for the N+2 release will be removed" with "Packages that failed to build in two consecutive mass rebuilds will be removed 2 weeks after the second mass rebuild. If there is no mass rebuild in a given release, the time to removal is still running, and the removal will happen a week before branching."

  4. Introduce a rule that changing the FTBFS bug status to ASSIGNED/anything-else-that-is-not-NEW means that the bug is being worked on.

  5. The non-responsive maintainer policy will be started 6 weeks after a ftbfs bug is opened, unless the bug is changed as in 4. above.

If we start enforcing this policy, i.e. file bugs regularly after each mass rebuild, and actually retire packages, packages that FTBFS will go away after approximately 6½ months.

On the rel-eng side, I'd like to propose the following requirements:
- file FTBFS bugs for all packages which newly fail in each mass rebuild
- for packages which already have an open bug, update the existing bug with new information
- attach build logs to those bugs (root.log, build.log, mock.log)
- exclude any packages from retirement where the FTBFS bug is not in state NEW and there has been a change in the bug state in the last month. (This gives maintainers a way to automatically postpone the retirement.)
- actually do the retirements around 2 weeks after the second FTBFS
- if packages are retired, close the FTBFS bugs with an explanation as CLOSED/WONTFIX.
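
The "file new, update existing" part of the list above could look roughly like this. It is a sketch only: `plan_bug_actions` and the shape of `open_bugs` are hypothetical, and the actual Bugzilla calls and log attachments are left out.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Action = Tuple[str, str, Optional[int]]


def plan_bug_actions(failed_packages: Iterable[str],
                     open_bugs: Dict[str, int]) -> List[Action]:
    """Decide, per failed package, whether to file a new bug or update one.

    open_bugs maps a package name to the id of its open FTBFS bug, so a
    package never ends up with more than one open FTBFS bug.
    """
    actions: List[Action] = []
    for package in sorted(failed_packages):
        bug_id = open_bugs.get(package)
        if bug_id is None:
            actions.append(("file", package, None))      # newly failing
        else:
            actions.append(("update", package, bug_id))  # already tracked
    return actions
```

For example, `plan_bug_actions({"foo", "bar"}, {"bar": 101})` plans an update of the (made-up) bug 101 for `bar` and a new bug for `foo`.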

Metadata Update from @zbyszek:
- Issue tagged with: meeting

5 years ago

Introduce a rule that changing the FTBFS bug status to ASSIGNED/anything-else-that-is-not-NEW means that the bug is being worked on.

I'd like to have a protection from somebody who just flips this to ASSIGNED and then does nothing.

Introduce a rule that changing the FTBFS bug status to ASSIGNED/anything-else-that-is-not-NEW means that the bug is being worked on.

I'd like to have a protection from somebody who just flips this to ASSIGNED and then does nothing.

I would like to believe that this would be an exceptional case and that we can deal with it on an individual basis, should the need arise.

For the record, I'm +1 to this proposal as written.

An updated policy proposal, taking feedback into account (changes in bold):

Add a note that after each mass rebuild FTBFS bugs will be created for packages that failed.

Remove all the outdated mentions of Matt Domsch and monthly rebuilds, BugZappers, CVS, PackageDB, etc.

Replace "Packages with unresolved FTBFS bugs at Feature Freeze for the N+2 release will be removed" with "Packages that failed to build in two consecutive mass rebuilds will be removed 2 weeks after the second mass rebuild. If there is no mass rebuild in a given release, the time to removal is still running, and the removal will happen a week before branching."

How about: All packages that failed to build from source for six months will be retired one week before branching.

This simplifies it a lot and should have the same effect.

Introduce a rule that changing the FTBFS bug status to ASSIGNED/anything-else-that-is-not-NEW means that the bug is being worked on.

If a bug is ASSIGNED and a new mass rebuild happens, will we reset it to NEW?

The non-responsive maintainer policy will be started 6 weeks after a ftbfs bug is opened, unless the bug is changed as in 4. above.

IMHO the non-responsive maintainer procedure is wrong here, since we do not want to orphan all of the maintainers packages (the non-responsive maintainer procedure results in all packages of the maintainer being orphaned). I would change this.

If a FTBFS bug is in NEW state for 6 weeks, a weekly reminder will be added that the package will be orphaned when the bug is in NEW state for 8 weeks. If the FTBFS bug is in NEW state for 8 weeks, the package will be orphaned.

Note: orphaned packages should get retired after about six weeks

If a FTBFS bug is still open after six months four weeks before branching, it gets weekly reminders about the package being retired one week before branching.

I guess this is what you mean to happen.
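
The NEW-state reminder and orphaning rule above can be sketched as a weekly check. This is a minimal illustration, assuming the bug has been in NEW since it was opened; the function name and the returned action strings are hypothetical.

```python
from datetime import date

REMINDER_AFTER_WEEKS = 6   # weekly reminders start here
ORPHAN_AFTER_WEEKS = 8     # package is orphaned here


def action_for_bug(status: str, opened: date, today: date) -> str:
    """Return "remind", "orphan" or "none" for one FTBFS bug.

    Assumption: the bug has been in NEW since it was opened, so the time
    in NEW is measured from the opening date.
    """
    if status != "NEW":
        return "none"          # maintainer acknowledged the bug
    weeks_in_new = (today - opened).days // 7
    if weeks_in_new >= ORPHAN_AFTER_WEEKS:
        return "orphan"
    if weeks_in_new >= REMINDER_AFTER_WEEKS:
        return "remind"
    return "none"
```

Run once a week, this yields two reminders (at 6 and 7 weeks) before the package is orphaned at 8 weeks.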

If we start enforcing this policy, i.e. file bugs regularly after each mass rebuild, and actually retire packages, packages that FTBFS will go away after approximately 6½ months.

Ok, now I am lost. How do you get to 6.5 months?

IMHO we should first decide on a goal for how fast we want things to happen with FTBFS packages, and then figure out the details of how to implement this technically. I am not sure if all the goals here align with other restrictions, such as how much time there is between mass rebuilds and branching.

This was discussed in the FESCo meeting on 2018-05-18:
AGREED: The proposal from https://pagure.io/fesco/issue/1890#comment-512632 with changes in https://pagure.io/fesco/issue/1890#comment-512813 is approved. We'll hammer out any details later (+6, 0, 0)

I updated the policy in https://fedoraproject.org/wiki/Fails_to_build_from_source. I think I included all the requested corrections, but please double-check. If there is consensus, I'll also post a note to devel-announce@.

Minor suggestions for clarification:

  1. "If the bug is in your package" should be "If the build of your package fails due to a bug in your package"
  2. "If the bug is in a different package" should be "If the build of your package fails due to a bug in another package (such as a compiler bug or missing dependency)"

The way it reads right now could be confusingly interpreted as "in case a FTBFS bug is assigned to you for someone else's package". I realize that a closer read would resolve the confusion, but we can probably make that simpler on people.

I might also extend "If the package should be retired," to be "If the package is no longer useful to the Fedora project, it should be retired,".

We might add one more bullet point reminding people that if they have no time for this package, they should voluntarily orphan it immediately, rather than waiting for the automated process to do it in eight weeks (thus reducing the amount of time available to whomever takes it up).

Lastly, I think "In all cases, if you close an FTBFS bug as a duplicate of another bug, please make the other bug to block the right FTBFS tracking bugs." should be replaced by "Do not close an FTBFS bug as a duplicate, because this makes tracking difficult and may result in a new FTBFS bug being created. Instead, mark the other bug as blocking the FTBFS bug."

Thanks for the review and comments.

"If the bug is in your package" should be "If the build of your package fails due to a bug in your package"

Done.

"If the bug is in a different package" should be "If the build of your package fails due to a bug in another package (such as a compiler bug or missing dependency)"

Done.

Extend "If the package should be retired," to be "If the package is no longer useful to the Fedora project, it should be retired,"

Done.

We might add one more bullet point reminding people that if they have no time for this package, they should voluntarily orphan it immediately, rather than waiting for the automated process

Sure, but OTOH, that's always true, and I don't want to make this list longer than it has to be. If you have a succinct formulation, just add it to the page.

Lastly, I think "In all cases, if you close an FTBFS bug as a duplicate of another bug, please make the other bug to block the right FTBFS tracking bugs." should be replaced by "Do not close an FTBFS bug as a duplicate, because this makes tracking difficult and may result in a new FTBFS bug being created. Instead, mark the other bug as blocking the FTBFS bug."

I think that it's sometimes still appropriate to mark a bug as DUPLICATE: when a FTBFS bug was already created, and a new one is made automatically. I think in that case it's better to keep using the original one, and close the new one. I put "Only one FTBFS bug should be open at any time against a given package.", and I think it's better to generally keep the amount of open bugs down to the necessary minimum.

(It's not clear how to identify all the FTBFS bugs. Does "FTBFS" have to appear in the title? Is it enough to block the tracking bug? This is important, but I'd leave it out for now and ask releng to document the proper way.)

What about this text for devel-announce@:

"""
Fedora package maintainers,

FESCo approved an updated policy for packages which fail to build from
source during mass rebuilds (FTBFS) [1].

The updated policy is still at https://fedoraproject.org/wiki/Fails_to_build_from_source.

Highlights:

  • packages which FTBFS are subject to orphaning if there is no
    maintainer acknowledgement within 8 weeks

  • packages which FTBFS in two consecutive mass rebuilds will be
    retired soon after the second mass rebuild

The implementation of this policy hinges on improving the releng
scripts used to create and manage FTBFS bugs. There are approximately
two months until the next use of those scripts, so I'm hopeful we'll
get them working.

If your package wasn't successfully built for F28, please fix that!

[1] https://pagure.io/fesco/issue/1890
[2] https://pagure.io/fesco/issue/1877#comment-509161
"""

The announcement text looks fine to me. +1

This was discussed in the FESCo meeting yesterday, and the text above was accepted.

The announcement:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/P3KFTJMDNO42POS5N3Z4UXDNPFGAQH73/

Metadata Update from @zbyszek:
- Issue untagged with: meeting
- Issue status updated to: Closed (was: Open)

5 years ago

I reran my script to close bugs. This time, unlike previously, I also closed any bugs in state NEW or ASSIGNED where there was a successful build in F28. My assumption is that people might not want to release updates this late after a release, for example if the fix involved a version bump, and even if they should, the FTBFS bug is not the place to track that. I also closed, like previously, any bugs in state NEW or ASSIGNED for which an update exists, but wasn't marked in bodhi for this bug.

This brings down the list of open bugs from 764 to 643.
So the NEW/ASSIGNED list is now cleaned up and only includes packages that actually were not built.
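
The closing criteria described above boil down to a simple check per bug. The sketch below is not the actual script; the `BugState` record and its field names are hypothetical stand-ins for whatever the script looks up in Bugzilla, koji and bodhi.

```python
from dataclasses import dataclass


@dataclass
class BugState:
    """Hypothetical summary of one FTBFS bug and its package."""
    status: str          # Bugzilla status
    built_in_f28: bool   # a successful F28 build exists
    has_update: bool     # an update exists, even if not linked to this bug


def should_close(bug: BugState) -> bool:
    """Close NEW/ASSIGNED bugs whose package has a build or an update."""
    if bug.status not in ("NEW", "ASSIGNED"):
        return False
    return bug.built_in_f28 or bug.has_update
```

Bugs for which `should_close` is false stay on the NEW/ASSIGNED list, which is what leaves only the packages that really were not built.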

The question is how to implement the rest of the policy.
My proposal is to do the following:

  • use the bugzilla mass bug dialogue to
    • set the NEEDINFO flag for bug assignees
    • add a comment (proposed text below).
  • after 8 weeks, the orphaning can be performed for those packages which are still on the NEW list.

The mass rebuild kicks in next week, and new FTBFS bugs will be opened.

  • after the mass rebuild is done and any immediate fallout has been fixed, this procedure can be repeated.

This way, the NEEDINFO flag will serve to deliver the weekly notices to maintainers.

proposed text:
Dear Maintainer,
your package has not been built successfully in F28. Action is required from you.
If you can fix your package to build, perform a build in koji, and either create an update
in bodhi, or close this bug without creating an update, if updating is not appropriate.
If you are working on a fix, set the status to ASSIGNED to acknowledge this.
Following the latest policy for such packages [1], your package will be orphaned
if this bug remains in NEW state more than 8 weeks.

If nobody objects, I'll do this.

Don't forget to add the [1] link to the Bugzilla comment.

Text:

Dear Maintainer,

your package has not been built successfully in F28. Action is required from you.

If you can fix your package to build, perform a build in koji, and either create
an update in bodhi, or close this bug without creating an update, if updating is
not appropriate [1]. If you are working on a fix, set the status to ASSIGNED to
acknowledge this. Following the latest policy for such packages [2], your package
will be orphaned if this bug remains in NEW state more than 8 weeks.

[1] https://fedoraproject.org/wiki/Updates_Policy
[2] https://fedoraproject.org/wiki/Fails_to_build_from_source#Package_Removal_for_Long-standing_FTBFS_bugs

Bugzilla query with the bug list:
https://bugzilla.redhat.com/buglist.cgi?bug_idbug_id_type=anyexact&bug_status=NEW&list_id=9089812&query_format=advanced

Sample comment:
https://bugzilla.redhat.com/show_bug.cgi?id=1583392#c4

Pfff, I screwed up. The bugzilla query to update the bugs timed out, and I thought I could use the severity field (which I changed to "high" in the query) to filter only those bugs which didn't get updated. But it turns out they were being updated in the background or something. In the end some bugs got the needinfo flag and the comment more than once. If you are affected by this, sorry for the noise.
