#10779 Duplicating pungi-fedora config for Bodhi in infra ansible makes it outdated and confusing
Closed: Fixed a year ago by zlopez. Opened 2 years ago by kparal.

Describe what you would like us to do:


TLDR: Please don't duplicate the pungi-fedora config file in Bodhi's role templates (or at least not the multilib conf sections).

I recently pushed a change to pungi-fedora, because people had big trouble with mesa-vulkan-drivers.i686 missing from the updates-testing repo:
https://pagure.io/pungi-fedora/pull-request/1104

But even with a new update, people still complained. I spent more than an hour investigating, and I had to poke releng people to finally find out that while Rawhide composes always use the latest pungi-fedora config from the repo, updates-testing composes are actually done by Bodhi, which uses a separate copy of the pungi-fedora config with heavy local modifications. That means the multilib_whitelist= option is duplicated there, and I assume somebody needs to regularly merge the upstream changes. Of course, that means the config file is very often outdated, and it is very hard for people to figure out why and how to request a fix.

Can you please make this config workflow saner? If Rawhide composes can automatically pick up the latest config, why can't Bodhi? We own all the projects, so can we adjust the code in such a way that the workflow is automatic?

Or, if there are big technical obstacles to reusing the upstream pungi-fedora config file, can both files at least be placed in the same source repo, instead of one there and one in infra ansible? This way, if a person searches for "multilib", they at least find both config files side by side (one for rawhide, the other for updates/updates-testing).

Or, if even that is problematic, can we at least make the multilib* configuration shared?

Thank you!

Apart from this bigger goal, can you please sync the Bodhi config file with upstream right now (at least the multilib* options), so that people are not affected by constant broken deps for mesa-vulkan-drivers? Thank you!

When do you need this to be done by? (YYYY/MM/DD)


The upstream sync and deployment would be nice to have in a few days. No target date for improving the workflow.


As a quick fix, I have synced and deployed multilib_whitelist.

It's rather confusing to see what's going on here as there seem to be two levels of 'templating' going on in the files. AFAICT, {% conditionals and {{ values are handled by ansible, while [% conditionals and [[ values are being handled by Bodhi.

So there's kinda two things going on here. We're using ansible templating to deploy slightly different versions of these files on prod and stg Bodhi: there are 4 ansible conditionals and 7 ansible variable subs in the file, all dealing with the prod vs. stg difference. Then Bodhi is using some templating to use these single config files to handle doing composes for multiple releases and branches: the [% conditionals and [[ substitutions do a lot of referring to things which are clearly Bodhi db objects, like release.

Some of the Bodhi-level templating is just subbing in release numbers and stuff. But some of it is handling the difference between 'updates' and 'updates-testing'. Some of it is handling difference between Fedora and EPEL update composes.
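
To make the two levels concrete, here is a minimal sketch of the two passes. It uses a simplified stand-in for real template rendering (plain name substitution, no conditionals or method calls), and the config line, variable names and values are all hypothetical:

```python
import re

def render(text, open_delim, close_delim, values):
    """Replace `<open> name <close>` markers with values[name]; leave unknown ones as-is."""
    pattern = re.compile(
        re.escape(open_delim) + r"\s*([\w.]+)\s*" + re.escape(close_delim)
    )
    return pattern.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)

# Hypothetical config line mixing both levels of templating.
source = 'release_name = "[[ release.id_prefix ]]"  # koji: {{ koji_hub }}'

# Pass 1: Ansible resolves {{ ... }} at deploy time (prod vs. stg values).
deployed = render(source, "{{", "}}", {"koji_hub": "koji.example.org"})

# Pass 2: Bodhi resolves [[ ... ]] at compose time (per-release values).
final = render(deployed, "[[", "]]", {"release.id_prefix": "FEDORA"})

print(deployed)
print(final)
```

The point is only that the first pass leaves the [[ ... ]] markers alone, so the file that lands on the Bodhi host is still a template for the second pass.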

Overall this looks a bit awkward to 'fix' unfortunately :( We could carry more config files in the git repo - two additional files per non-Rawhide release, one for updates and one for updates-testing - but I'm not sure that would be better? And we'd still have to figure out how to handle staging vs. prod at the ansible level.

Metadata Update from @phsmoura:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-gain, low-trouble, ops

2 years ago

Yeah, I noticed some of the variables (not all). That's why I proposed to at least share the multilib config options (and anything else that makes sense to have unified across all releases), if a better solution can't be found. Alternatively, as a totally lowest-effort solution, we could at minimum restructure the config files into a specific part and a general part, and add a visible comment pointing people to the other places, if they want to edit some bits in the general section.

Config divergence is a real problem, at least for the multilib_whitelist option. Not only was mesa-vdpau-drivers missing (because I had no idea there's another place to edit), but so was vkBasalt (added a year ago in pungi-fedora). On the other hand, the latest update from @humaton purged pipewire and iptables, because they were not included in pungi-fedora. However, looking at this PR and the linked bug report, it seems they should be there, and now we have caused a regression. It also shows that even the most knowledgeable infra people like @kevin still forget to update multilib_whitelist in all places, because there's no such commit in pungi-fedora. The latest update also changed wine* to wine, because that's how it was written in pungi-fedora. But that might not be correct and could cause a regression; we should look into it.

As you can see, diverging configs, at least for those multilib options, are causing real problems and wasted human time. I'd like to fix this somehow.
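
Until the configs are unified, even a trivial comparison would surface this kind of drift. A rough sketch, assuming both whitelists have already been extracted into plain Python lists (the helper name is made up and the entries are illustrative only, not the real lists):

```python
def whitelist_drift(upstream, deployed):
    """Return (missing, extra): entries only in upstream, and only in the deployed copy."""
    up, dep = set(upstream), set(deployed)
    return sorted(up - dep), sorted(dep - up)

# Illustrative entries only; the real lists are much longer.
pungi_fedora = ["mesa-vulkan-drivers", "vkBasalt", "wine"]
bodhi_copy = ["iptables", "pipewire", "wine*"]

missing, extra = whitelist_drift(pungi_fedora, bodhi_copy)
print("missing from Bodhi copy:", missing)
print("only in Bodhi copy:", extra)
```

Note that this compares entries as plain strings, so wine and wine* are reported as different, which is exactly the kind of discrepancy someone then has to look at by hand.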

The missing packages are now back in the config and deployed.

I've created these two PRs while trying to sync changes between pungi-fedora and the Bodhi infra config and to decide which side is better. I only updated multilib_whitelist, because I'm not completely sure what multilib_blacklist does and how to handle it.

https://pagure.io/pungi-fedora/pull-request/1108
https://pagure.io/fedora-infra/ansible/pull-request/1121

Yeah, I was gonna suggest that while it doesn't look very possible (or, at least, easy) to usefully "share" the whole configuration between release composes and Bodhi update composes, we could probably make Bodhi and/or Pungi accept systemd-style snippet-y, three-level-overrideable config files; that would give us a lot of flexibility for sharing bits that can be shared, like this allow/denylist.

If the config system supports inheritance or if we can easily implement it, that would be ideal. But even just moving the current bodhi config file as it is (full of jinja etc variables) into pungi-fedora (e.g. as fedora-bodhi.conf) and pulling it from there instead of storing it directly inside ansible would make a difference.

Could I ask to slow down a bit here... :)

These are two different composes with different needs. I'm in favor of sharing things to avoid duplication, but I don't think we can merge them, nor do I think it's a good idea to move the bodhi config to another repo.

But... give me some time to look at this? I have a bunch of fires to put out after a long weekend, so can you let me get caught up before you do a bunch of radical changes that break things?
Thanks.

So, sorry for my grumpy comment above. :) Tuesday after a long weekend and just a bunch of things were pulling me in several ways.

Anyhow, IMHO, in some kind of ideal world all these whitelist items would be rolled up into python-multilib. That would make it a tad more annoying, as you would have to get a change in there, get a new package release, and update bodhi-backend01 and compose-rawhide01 to get a change all the way through. However, it would make the config a lot easier, and both composes would be using the same place for that config. That said, the last time I tried to upstream something there, the project did not seem very active or interested in taking these changes. :(

Oh, and a comment in both places explaining all this would be great. To warn us not to get them out of sync and also to explain where the other one is?

I updated the PRs as requested.

But, only now I found out that in pungi-fedora there's not only fedora.conf, but also fedora-cloud.conf and fedora-container.conf, all of which have the same multilib_blacklist/multilib_whitelist sections (outdated, of course), so we probably don't need to update it in two places, but in four? And should sync them now? Sigh.

However, there seems to be some light at the end of the tunnel. See Importing other files section in the Pungi config documentation:
https://docs.pagure.org/pungi/format.html#importing-other-files

It seems it should be possible to create a single multilib.conf in the pungi-fedora root, and then use the following line in all fedora*.conf files:

from multilib import *

(Perhaps even more options than just the multilib ones can be shared, so the file could be called general.conf and contain everything shared.)

This would at least make it unified in pungi-fedora. (It could also possibly allow the Bodhi ansible role to dynamically download the shared config and load it. Or have pungi-fedora git checked out on Bodhi system, periodically pull it and reference the config file from there. Or similar.)
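
A minimal sketch of what that layout could look like, assuming the import mechanism works the way I read the Pungi documentation linked above. The file name multilib.conf is the one proposed here, and the list entries are illustrative only, not the real lists:

```python
# multilib.conf (proposed shared file in the pungi-fedora root)
# Entries below are illustrative only, not the real lists.
multilib_whitelist = {
    "*": ["mesa-vulkan-drivers", "vkBasalt", "wine"],
}

# fedora.conf, fedora-cloud.conf, fedora-container.conf
# The module name is the file's base name without the .conf extension.
from multilib import *
```

With this, a package added to multilib.conf would reach every config that imports it, and only options that genuinely differ would stay in the individual files.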

However, experimenting with pungi is a bit out of my comfort zone. Would someone be willing to try this?

Sure, can as time permits... it sounds like that would all work.

Perhaps @humaton would be willing to work on it?

Metadata Update from @zlopez:
- Issue untagged with: low-trouble
- Issue tagged with: medium-trouble

2 years ago

[backlog refinement]
Still something we want to do, but we haven't found time yet.

Metadata Update from @zlopez:
- Issue assigned to zlopez

a year ago

I found some spare time and created a PR for general config file in pungi.

What about the Bodhi part? I see that Bodhi is now deployed completely differently, and I didn't find any pungi config in the bodhi role.

bodhi is in the bodhi2/backend role. roles/bodhi2/backend/

So the openshift role is just for frontend?

PR in pungi is now merged.

The bodhi PR is also now merged.

It's still not ideal, but it's better at least. ;)

Metadata Update from @kevin:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

a year ago

As the Bodhi PR actually broke the composes on production, the change was reverted, and there is a PR on Bodhi upstream to fix the issue.

Metadata Update from @zlopez:
- Issue status updated to: Open (was: Closed)

a year ago

With the upstream change merged and the new Bodhi version released, I opened a new PR for Bodhi.

This seems to be breaking stuff in production. We have messages from Pungi with garbage compose IDs, like this one - note various properties have unexpanded templates in them (e.g. [[ release.id_prefix.title() ]]) that are associated with this change. Please fix/revert this urgently, it will break all sorts of stuff.

Specifically, I'm talking about the changes to roles/bodhi2/backend/files/pungi_general.conf in ansible by @zlopez .

It looks to me like those changes moved the invocations of [[ release.id_prefix.title() ]] from various files that are actually templates where they would have been substituted (e.g. roles/bodhi2/backend/templates/pungi.rpm.conf.j2) to roles/bodhi2/backend/files/pungi_general.conf, which is just a file that gets installed, not a template. That probably needs transforming into a template. Possibly? Actually it's not clear what level this is expected to happen at (ansible or bodhi/pungi).
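
Whichever level ends up doing the substitution, a leftover marker like this could be caught before a compose starts. A rough sketch of such a check (not something Bodhi or Pungi does today, as far as I know; the key name in the sample lines is hypothetical):

```python
import re

# Markers that should have been expanded before Pungi reads the file.
UNRENDERED = re.compile(r"\[\[.*?\]\]|\[%.*?%\]")

def unrendered_markers(config_text):
    """Return every leftover [[ ... ]] or [% ... %] marker found in the text."""
    return UNRENDERED.findall(config_text)

# Hypothetical key name; the marker itself is the one quoted above.
static_file = 'release_short = "[[ release.id_prefix.title() ]]"\n'
rendered = 'release_short = "Fedora"\n'

print(unrendered_markers(static_file))
print(unrendered_markers(rendered))
```

A non-empty result for a file that is about to be handed to Pungi would mean it was installed as a plain file instead of being rendered as a template.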

It's Friday after 5pm, so I just reverted this for now. ;)

We can sort it out next week and I recommend we push some test composes during the day to debug it. ;)

It looks to me like those changes moved the invocations of [[ release.id_prefix.title() ]] from various files that are actually templates where they would have been substituted (e.g. roles/bodhi2/backend/templates/pungi.rpm.conf.j2) to roles/bodhi2/backend/files/pungi_general.conf, which is just a file that gets installed, not a template. That probably needs transforming into a template. Possibly? Actually it's not clear what level this is expected to happen at (ansible or bodhi/pungi).

That is definitely my fault: it should be a template, or I can move those parts back to the corresponding configs such as pungi.rpm.conf.j2. I need to check it again and make the adjustments. Sorry for the problems this caused. At least we know that the change in Bodhi worked as intended.

The changes are now in place, this time I hope for the best.

I noticed that the change was reverted by @kevin again :/. Do you have the error that was caused this time?

EDIT: Never mind, I see it in the #releng channel: releng opened a new ticket, releng/failed-composes#2329: "[[ release.id_prefix.title() ]]-8-updates-20230504.0 DOOMED". I hoped that moving pungi_general.conf to a template would fix that. But maybe I know where I made the mistake.

I moved the template parts back to corresponding configuration files and I was able to finish a compose.

Unfortunately there is a strange error, but I'm not sure it's related to the config change. Here is the traceback:

May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: [2023-05-04 14:32:14,024: ERROR/ForkPoolWorker-16] Unable to check pungi composed repositories, compose thrown out
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: Traceback (most recent call last):
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:   File "/usr/lib/python3.11/site-packages/bodhi/server/tasks/composer.py", line 1192, in _sanity_check_repo
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:     for checkfile in os.listdir(os.path.join(checkdir, subdir)):
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: NotADirectoryError: [Errno 20] Not a directory: '/mnt/koji/compose/updates/Fedora-37-updates-testing-20230504.1/compose/Everything/source/tree/Packages/0ad-0.0.26-3.fc37.src.rpm'
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: [2023-05-04 14:32:14,039: INFO/ForkPoolWorker-16] Compose object updated.
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: [2023-05-04 14:32:14,039: ERROR/ForkPoolWorker-16] Exception in ComposerThread(f37-updates-testing)
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: Traceback (most recent call last):
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:   File "/usr/lib/python3.11/site-packages/bodhi/server/tasks/composer.py", line 410, in work
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:     self._compose_updates()
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:   File "/usr/lib/python3.11/site-packages/bodhi/server/tasks/composer.py", line 964, in _compose_updates
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:     self._sanity_check_repo()
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:   File "/usr/lib/python3.11/site-packages/bodhi/server/tasks/composer.py", line 1192, in _sanity_check_repo
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:     for checkfile in os.listdir(os.path.join(checkdir, subdir)):
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: NotADirectoryError: [Errno 20] Not a directory: '/mnt/koji/compose/updates/Fedora-37-updates-testing-20230504.1/compose/Everything/source/tree/Packages/0ad-0.0.26-3.fc37.src.rpm'
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: [2023-05-04 14:32:14,048: INFO/ForkPoolWorker-16] Compose object updated.
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: [2023-05-04 14:32:14,048: INFO/ForkPoolWorker-16] Thread(f37-updates-testing) finished.  Success: False
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: [2023-05-04 14:32:14,104: INFO/ForkPoolWorker-16] Compose object updated.
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: [2023-05-04 14:32:14,104: ERROR/ForkPoolWorker-16] ComposerThread failed. Transaction rolled back.
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: Traceback (most recent call last):
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:   File "/usr/lib/python3.11/site-packages/bodhi/server/tasks/composer.py", line 339, in run
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:     self.work()
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:   File "/usr/lib/python3.11/site-packages/bodhi/server/tasks/composer.py", line 410, in work
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:     self._compose_updates()
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:   File "/usr/lib/python3.11/site-packages/bodhi/server/tasks/composer.py", line 964, in _compose_updates
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:     self._sanity_check_repo()
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:   File "/usr/lib/python3.11/site-packages/bodhi/server/tasks/composer.py", line 1192, in _sanity_check_repo
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:     for checkfile in os.listdir(os.path.join(checkdir, subdir)):
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 04 14:32:14 bodhi-backend01.iad2.fedoraproject.org celery-3[2694948]: NotADirectoryError: [Errno 20] Not a directory: '/mnt/koji/compose/updates/Fedora-37-updates-testing-20230504.1/compose/Everything/source/tree/Packages/0ad-0.0.26-3.fc37.src.rpm'

This is bodhi's 'sanity check' before syncing out a compose... I don't offhand see what's causing this. ;(

Perhaps @mattia can see something here?

I don't quite know how pungi works, but here are the relevant lines:
https://github.com/fedora-infra/bodhi/blob/00570ddbef5f0697bbcc3705f14d3f21836379aa/bodhi-server/bodhi/server/tasks/composer.py#L1173-L1204

AFAIK the Bodhi composer expects to find one directory per starting letter of the RPM name: self.path/compose/Everything/source/tree/Packages/s/something.src.rpm
Instead it finds a single directory with all the src.rpm files in it:
self.path/compose/Everything/source/tree/Packages/something.src.rpm

So in lines 1191 and 1192, _sanity_check_repo descends one extra level, which raises

NotADirectoryError: [Errno 20] Not a directory: '/mnt/koji/compose/updates/Fedora-37-updates-testing-20230504.1/compose/Everything/source/tree/Packages/0ad-0.0.26-3.fc37.src.rpm'

What's the right path? The final compose result looks like it has just a single big dir, but maybe this is different while the compose process is running? Or maybe this changed some time ago and no one ever fixed the Bodhi composer?

@mattia I don't understand why the check is passing when we don't split the pungi config and fails otherwise. According to the code it should fail every time.

Looking at the latest compose it seems that there are actually alphabet directories in the Packages dir, which weren't created when we split the config. I need to investigate more why this is happening. Because splitting the config shouldn't have any impact on how the compose is composed.

So I found out what happened and it's my mistake again. When I moved things from pungi_general.conf back to their corresponding template I forgot about hashed_directories = True, which is the configuration value that creates the subdirectories in Packages. I will produce a fix.
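
For anyone hitting the same traceback, here is a small self-contained sketch of why the missing option trips the check. It is not Bodhi's actual code, only the same two-level listdir pattern run against toy trees, and it assumes the hashed layout uses the lowercased first character of the file name as the subdirectory (the second RPM name is made up):

```python
import os
import tempfile

def list_rpms(packages_dir):
    """Walk Packages/<letter>/<rpm>, as a two-level sanity check would."""
    found = []
    for subdir in sorted(os.listdir(packages_dir)):
        # Raises NotADirectoryError if `subdir` is actually an RPM file.
        for name in sorted(os.listdir(os.path.join(packages_dir, subdir))):
            found.append(name)
    return found

def make_tree(root, rpms, hashed):
    """Create a toy Packages tree, flat or hashed by lowercased first character."""
    for rpm in rpms:
        target = os.path.join(root, rpm[0].lower()) if hashed else root
        os.makedirs(target, exist_ok=True)
        open(os.path.join(target, rpm), "w").close()

rpms = ["0ad-0.0.26-3.fc37.src.rpm", "example-1.0-1.fc37.src.rpm"]

with tempfile.TemporaryDirectory() as hashed_root:
    make_tree(hashed_root, rpms, hashed=True)
    hashed_result = list_rpms(hashed_root)

with tempfile.TemporaryDirectory() as flat_root:
    make_tree(flat_root, rpms, hashed=False)
    try:
        list_rpms(flat_root)
        flat_error = None
    except NotADirectoryError as exc:
        flat_error = exc

print(hashed_result)
print(type(flat_error).__name__)
```

With the hashed layout the walk succeeds; with the flat layout the inner listdir is called on an RPM file and fails the same way as in the log above.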

So here is the final PR. I started a compose manually and it finished without failure, so I'm closing this ticket, because it's now done.

Metadata Update from @zlopez:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

a year ago

Metadata
Boards 1
ops Status: Backlog
Related Pull Requests
  • #1396 Merged a year ago