#12156 unmaintained bot account: dummy-test-package-gloster gating pipeline tests broken
Opened a month ago by decathorpe. Modified 3 hours ago

It looks like dummy-test-package-gloster has failed gating tests since June 17, 2023. After that point in time, no updates of this "canary" package got pushed to stable. They all seem to have ended up stuck in "testing" purgatory due to failed gating tests, and were only obsoleted by subsequent builds of the package.

The last stable build is from ~a year ago, as can be seen here:

https://bodhi.fedoraproject.org/updates/?search=&packages=dummy-test-package-gloster&status=stable

All builds since were stuck in "testing" and then obsoleted:

https://bodhi.fedoraproject.org/updates/?search=&packages=dummy-test-package-gloster

Should these automated builds either be fixed, or turned off? Who is responsible for the bot that submits these builds? All "bot" accounts are supposed to have a person who looks after them, but if things have literally been broken for a year, that obviously isn't happening here.


AFAICS, this package is actually intentionally rigged to fail gating, and has been since 2020.

I think what's gone wrong is that a bot is supposed to waive the failures, but that isn't happening any more. The last bot-filed waiver appears to be https://waiverdb.fedoraproject.org/api/v1.0/waivers/?subject_identifier=dummy-test-package-gloster-0-9131.fc37 , for https://bodhi.fedoraproject.org/updates/FEDORA-2022-b6216202e8 . Since then a couple of other updates went stable, but only because waivers were filed (presumably manually) by mattia and patrikp.

The bot-filed waivers always used the message "This is fine, we are testing the workflow", so searching for that string is probably the best way to find the bot and start to figure out what's wrong with it.
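
For anyone who wants to check a given build for a bot-filed waiver, here is a rough Python sketch. It only builds the same kind of WaiverDB URL as the one linked above and filters an already-decoded response by the bot's comment string. The response shape (a top-level "data" list whose entries have a "comment" field) is an assumption on my part, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode

WAIVERDB_API = "https://waiverdb.fedoraproject.org/api/v1.0/waivers/"
BOT_COMMENT = "This is fine, we are testing the workflow"


def waivers_url(subject_identifier):
    """Build the WaiverDB query URL for one build (hypothetical helper)."""
    return WAIVERDB_API + "?" + urlencode({"subject_identifier": subject_identifier})


def bot_waivers(payload, comment=BOT_COMMENT):
    """Return the waivers in a decoded response whose comment matches the bot's.

    Assumption: the response looks like {"data": [{...waiver...}, ...]} and
    each waiver has a "comment" field. Adjust if the real payload differs.
    """
    return [w for w in payload.get("data", []) if w.get("comment") == comment]
```

Fetching the URL and decoding the JSON is left out on purpose, so the filtering can be tried against a saved response without touching the network.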

Took me a bit of poking around, but https://pagure.io/fedora-ci/monitor-gating seems to be the codebase.

Manually running the command that waive_update should run (bodhi updates waive FEDORA-2024-92d23d0013 "This is fine, we are testing the workflow" --debug) worked, so it's not that the syntax has gone out of date or anything.
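
To make that reproducible, here is a minimal sketch of assembling the same command line from Python. This is not the monitor-gating code, just an illustration; the function name is hypothetical, and the only arguments used are the ones from the command quoted above.

```python
import shlex


def waive_command(update_id, comment, debug=False):
    """Assemble the argv list for `bodhi updates waive` (hypothetical helper)."""
    argv = ["bodhi", "updates", "waive", update_id, comment]
    if debug:
        # --debug is the flag used in the manual run above.
        argv.append("--debug")
    return argv


cmd = waive_command(
    "FEDORA-2024-92d23d0013",
    "This is fine, we are testing the workflow",
    debug=True,
)
# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it. Keeping the arguments as a list avoids quoting problems with the comment string.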

I'll look into it more tomorrow, but at a guess, either this thing is trying to use user/password auth and that doesn't work any more, or its token has gone stale, or for some reason it's not reaching the waive code any more.

Tagging @zlopez and @patrikp , who seem to have touched this thing most recently (other than nirik). Will also poke the CI team on chat.

Metadata Update from @patrikp:
- Issue assigned to patrikp

a month ago

Ah, I do see this from nirik in 2023: https://pagure.io/fedora-infra/ansible/c/39ecc928f0734813733d19175c0964c2f8752cea?branch=main . That might not have worked as expected.

I'm not tagging him ATM because he's on PTO; it'd be best if we can figure it out without bothering him.

Metadata Update from @phsmoura:
- Issue tagged with: medium-gain, medium-trouble, ops

a month ago

Aha. Well, I figured out how to look at the monitor-gating logs, and...as it turns out, I'm pretty sure I broke this!

That changed the Bodhi UpdateReadyForTesting message (bodhi.update.status.testing.koji-build-group.build.complete), reducing the amount of stuff in its artifact dict to just the stuff strictly needed by Fedora CI, but it turns out monitor-gating used it too. So we're failing in utils.lookup_results_datagrepper because we're expecting the artifact dict to have an id key, which it does not any more. That happens before utils.waive_update, so we never do that.
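
A rough illustration of the failure mode, and of a guard that would have made it easier to spot, is below. This is a sketch and not the actual monitor-gating code: the helper and exception names are hypothetical, and the sample message bodies only assume a top-level "artifact" dict that used to carry an "id" key.

```python
class MissingArtifactField(Exception):
    """Raised when a message lacks an artifact field the monitor depends on."""


def require_artifact_field(message_body, field):
    """Return artifact[field], or fail with a message naming what is present.

    Hypothetical helper: a plain artifact["id"] lookup raises a bare KeyError,
    which is easy to miss in logs. This variant reports the keys that did arrive.
    """
    artifact = message_body.get("artifact", {})
    if field not in artifact:
        raise MissingArtifactField(
            f"artifact has no {field!r} key; keys present: {sorted(artifact)}"
        )
    return artifact[field]


# Illustrative payloads only; the "type" values are made up.
old_style = {"artifact": {"id": "FEDORA-2024-92d23d0013", "type": "example"}}
new_style = {"artifact": {"type": "example"}}

print(require_artifact_field(old_style, "id"))
try:
    require_artifact_field(new_style, "id")
except MissingArtifactField as exc:
    print(f"lookup failed before any waiver was attempted: {exc}")
```

Because the lookup happens before the waive step, any exception here means the waiver is never filed, which matches what the logs show.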

I'll send a PR to fix this (and any other use of the artifact dict I can find in the code).

Metadata Update from @adamwill:
- Issue untagged with: medium-gain, medium-trouble, ops

a month ago

Metadata Update from @adamwill:
- Issue tagged with: medium-gain, medium-trouble, ops

a month ago

Odd. That should be all that's required as far as I know. Did you run the playbook right after merging the PR? There's a small delay before it syncs to batcave01... but it's like 15-20 seconds.

I reran the playbook yesterday (just to be sure) and it hasn't fixed the issue, unfortunately.

(Posted this to https://pagure.io/fedora-ci/monitor-gating/pull-request/47#comment-205935 instead of here accidentally :sweat_smile:)

I added some comments in https://pagure.io/fedora-ci/monitor-gating/issue/46 after looking through some of the code. Removing --user and --password seems straightforward, but I'm not clear on how bodhi expects its Kerberos information. The code looks like it may rely either on a configuration file on the host or on a preexisting ticket to forward. I wasn't able to find docs around what was expected but it's highly likely I just missed them. Any pointers would be appreciated 🙏

@pingou Hello. :wave: Would you happen to have any idea about how this could be fixed?

Metadata
Boards 1
Ops Status: Backlog