#12081 Extend sync-latest-container-base-image.sh to publish native OCI atomic desktop containers, get rid of sync-ostree-base-containers.sh
Closed: It's all good a month ago by adamwill. Opened 2 months ago by adamwill.

In discussion with @kevin , @siosm and @pbrobinson we agreed it seems to make sense to extend the sync-latest-container-base-image.sh script to also find and publish the atomic desktop container images produced by the compose, and get rid of sync-ostree-base-containers.sh . This should reduce complexity and potential for inconsistencies. This is intended to be a small short-term improvement; in the medium term we hope to replace both scripts with a more configurable and extensible Python tool that implements the generic job of "publishing container images from a compose to one or more registries". There will be a separate ticket for that.

  • When do you need this? (YYYY/MM/DD)
    ASAP, but it's not urgently required.

  • When is this no longer needed or useful? (YYYY/MM/DD)
    When/if we replace this whole 'shell scripts run as part of the compose process' approach with something more substantial (the planned Python tool, or extending Bodhi's capabilities, or something else).

  • If we cannot complete your request, what is the impact?
    We'll continue to publish atomic desktop container images produced on-the-fly by a janky bash script (assuming it works), instead of the ones produced by the compose itself. There will likely continue to be issues caused by the scripts going out of sync or not being triggered at the correct times.

Assigning to @pbrobinson for now as he said he could take a look at this while working on https://pagure.io/fedora-iot/issue/58 .


See https://pagure.io/releng/issue/11047 for historical reference - that was the ticket that led to sync-ostree-base-containers.sh being created. tagging @walters for info. The thing that changed since that ticket and the writing of the script is that we got https://pagure.io/pungi/pull-request/1699 , the new phase in Pungi that produces ostree native container images as part of the compose. @walters did say in https://pagure.io/pungi/pull-request/1699#comment-195698 "we'll extend that [these shell scripts] after this lands...", so I guess this ticket is for that extension. :D

https://pagure.io/releng/issue/12082 is the ticket for the medium-term "write something better than shell scripts" task.

well, my suggestion was to entirely drop the sync-ostree-base-containers script and merge the atomic desktop container publishing into sync-latest-container-base-image. In theory at least, now that the compose produces ociarchive images for the atomic desktops just like it does for the 'normal' container images, the same script should work fine for both, with a few extensions to the logic to find the appropriate koji tasks.
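To illustrate the "same logic for both kinds of image" idea, here is a minimal sketch. It is not how the existing script works: that script looks up Koji tasks, whereas this sketch filters a compose image-metadata dict by format. The dict layout (`payload` / `images` / variant / arch) is an assumption modeled on productmd-style metadata, and `find_ociarchives` and `SAMPLE` are hypothetical names used only for illustration.

```python
from typing import Iterator

def find_ociarchives(images_metadata: dict) -> Iterator[tuple[str, str, str]]:
    """Yield (variant, arch, path) for every image whose format is 'ociarchive'.

    Assumes a productmd-style layout: payload -> images -> variant -> arch -> [image dicts].
    """
    images = images_metadata.get("payload", {}).get("images", {})
    for variant, arches in images.items():
        for arch, image_list in arches.items():
            for image in image_list:
                if image.get("format") == "ociarchive":
                    yield variant, arch, image["path"]

# Hypothetical sample data; the ISO entry is made up for contrast.
SAMPLE = {
    "payload": {
        "images": {
            "Silverblue": {
                "x86_64": [
                    {
                        "format": "ociarchive",
                        "path": "Silverblue/x86_64/images/Fedora-Silverblue-40.20240502.0.ociarchive",
                    },
                    {"format": "iso", "path": "Silverblue/x86_64/iso/example.iso"},
                ]
            }
        }
    }
}

found = list(find_ociarchives(SAMPLE))
```

The point of the sketch is that nothing in the filter is specific to atomic desktops: a 'normal' container image in the same format would be picked up by the same loop.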

did you find some reason the sync-latest-container-base-image script isn't capable of handling the atomic desktop images properly, or something? is there a reason to keep the separate script?

OK, this is me not understanding the Fedora infra and what gets run where. I'll try updating the other script.

Metadata Update from @phsmoura:
- Issue tagged with: medium-gain, medium-trouble, ops

2 months ago

both scripts are run at the end of nightly.sh, which runs the nightly composes of Rawhide and Branched (when it exists) - https://pagure.io/pungi-fedora/blob/11e6a131fd87c91694c4c4546ab99387c03d8fe0/f/nightly.sh#_220 . if we combine them we should update that to only run one, of course. However, that's just the easy part. The complicated part is stable releases...

There is a nightly compose of container images for stable releases called "Fedora-Container" - https://kojipkgs.fedoraproject.org/compose/container/ - which is driven by the container-nightly.sh script in pungi-fedora. The script from the appropriate release branch is run for each release - so we run the version of the script from the f39 branch for Fedora-Container 39 composes and the version from the f40 branch for Fedora-Container 40 composes. This compose only builds the 'regular' container images, it does not do anything with atomic desktops at all.

Currently, the f39 version of the script runs sync-latest-container-base-image at the end, so it publishes the images it just built; the f40 version of the script runs sync-ostree-base-containers at the end, so it doesn't publish the images it just built at all (we haven't updated the f40 registries since we stopped doing the pre-release nightly composes...), it instead converts whatever the latest atomic desktop ostrees for f40 happen to be to OCIs and publishes those.

It's Bodhi that actually builds new atomic desktop content for stable releases. It does this by running a Pungi compose - the composes can be found under https://kojipkgs.fedoraproject.org/compose/updates/ . The config for that compose is templated from infra ansible - https://pagure.io/fedora-infra/ansible/blob/main/f/roles/bodhi2/backend/templates/pungi.rpm.conf.j2 - and does not have an ostree_container section, so those composes don't produce native OCIs. The Bodhi compose also does not produce 'regular' container images at all. Bodhi does not run any sync script after it does its compose.

So...stable releases are a bit of a mess, and that complicates this. I guess in an ideal world there would be just one compose that produced both 'regular' and 'atomic desktop' containers for stable releases, and it would produce OCIs for the desktops, and it would run a single sync script when it was done. But that's not the world we live in.

For now I have sent PRs to make both f39 and f40 branches of the container-nightly.sh script run both sync scripts, as that seems like the most practical thing to do. Even though it's weird that the script that drives the Fedora-Container compose should also do the work of converting whatever ostrees Bodhi happened to build most recently to OCIs and publishing them, nothing else does that job at present, so we may as well keep it doing that (for f40) or have it go back to doing that (for f39). But of course it should also publish the images it just built.
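As a rough way to keep the per-branch state straight, the situation described above can be written down as data. This is only a sketch of the bookkeeping, not code from pungi-fedora; `BEFORE`, `ALL_SYNC_SCRIPTS`, `missing_before` and `after_prs` are hypothetical names.

```python
# Both sync scripts discussed in this ticket.
ALL_SYNC_SCRIPTS = [
    "sync-latest-container-base-image.sh",
    "sync-ostree-base-containers.sh",
]

# What each branch's container-nightly.sh ran at the end, before the PRs.
BEFORE = {
    "f39": ["sync-latest-container-base-image.sh"],
    "f40": ["sync-ostree-base-containers.sh"],
}

def missing_before(before: dict) -> dict:
    """For each branch, list the sync scripts that were not being run."""
    return {
        branch: [s for s in ALL_SYNC_SCRIPTS if s not in scripts]
        for branch, scripts in before.items()
    }

def after_prs(before: dict) -> dict:
    """After the PRs, every branch runs both sync scripts."""
    return {branch: list(ALL_SYNC_SCRIPTS) for branch in before}

gaps = missing_before(BEFORE)
state_after = after_prs(BEFORE)
```

Here `gaps` shows that each branch was skipping exactly one of the two publishing jobs, which is what the PRs address.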

For this ticket, the fact that Bodhi doesn't produce native OCIs yet makes it more difficult to get rid of the sync-ostree-base-containers script, at least if we do care about having up to date atomic desktop OCIs in the registries for stable releases.

I suppose the next step might be to extend the Pungi config Bodhi uses to also have an ostree_container phase, so we get native OCIs from those Bodhi composes. Then we would at least have the native OCIs in all cases and not have to worry about the ostree->OCI conversion stuff any more, and then we can go ahead and drop the sync-ostree-base-containers script as I intended.

Metadata Update from @adamwill:
- Issue untagged with: medium-gain, medium-trouble, ops

2 months ago

Metadata Update from @adamwill:
- Issue tagged with: medium-gain, medium-trouble, ops

2 months ago

https://pagure.io/fedora-infra/ansible/pull-request/1988 should make Bodhi updates composes include atomic desktop OCIs.

> It's Bodhi that actually builds new atomic desktop content for stable releases. It does this by running a Pungi compose - the composes can be found under https://kojipkgs.fedoraproject.org/compose/updates/ . The config for that compose is templated from infra ansible - https://pagure.io/fedora-infra/ansible/blob/main/f/roles/bodhi2/backend/templates/pungi.rpm.conf.j2 - and does not have an ostree_container section, so those composes don't produce native OCIs. The Bodhi compose also does not produce 'regular' container images at all. Bodhi does not run any sync script after it does its compose.

It does sync the ostree commits from the ostree/compose to ostree (via the updates sync script that also does rpms).

> So...stable releases are a bit of a mess, and that complicates this. I guess in an ideal world there would be just one compose that produced both 'regular' and 'atomic desktop' containers for stable releases, and it would produce OCIs for the desktops, and it would run a single sync script when it was done. But that's not the world we live in.
>
> For now I have sent PRs to make both f39 and f40 branches of the container-nightly.sh script run both sync scripts, as that seems like the most practical thing to do. Even though it's weird that the script that drives the Fedora-Container compose should also do the work of converting whatever ostrees Bodhi happened to build most recently to OCIs and publishing them, nothing else does that job at present, so we may as well keep it doing that (for f40) or have it go back to doing that (for f39). But of course it should also publish the images it just built.

yeah, it's also non ideal because it happens a long while after the bodhi compose, but at least daily is better than nothing.

> For this ticket, the fact that Bodhi doesn't produce native OCIs yet makes it more difficult to get rid of the sync-ostree-base-containers script, at least if we do care about having up to date atomic desktop OCIs in the registries for stable releases.
>
> I suppose the next step might be to extend the Pungi config Bodhi uses to also have an ostree_container phase, so we get native OCIs from those Bodhi composes. Then we would at least have the native OCIs in all cases and not have to worry about the ostree->OCI conversion stuff any more, and then we can go ahead and drop the sync-ostree-base-containers script as I intended.

Yeah, I wonder how much time this will add to them, but... oh well.

> It does sync the ostree commits from the ostree/compose to ostree (via the updates sync script that also does rpms).

Ah, right - I really meant "Bodhi does not run either of the container sync scripts".

> Yeah, I wonder how much time this will add to them, but... oh well.

hopefully not too much time now that stupid bug has been fixed...we can always try it out and roll it back if it takes too long, I guess.

Is the bug fixed in stable releases? I thought it was only in rawhide last I looked. ;(
(but yeah... we can try it)

well, the fix is in upstream 2024.5, that is stable for Rawhide and F40 already, and in u-t for F39 - https://bodhi.fedoraproject.org/updates/FEDORA-2024-2e8d56bf28

OK, so we managed to get this working now, so the latest updates and updates-testing composes built OCI archives, e.g. https://kojipkgs.fedoraproject.org/compose/updates/Fedora-40-updates-testing-20240502.0/compose/Silverblue/x86_64/images/Fedora-Silverblue-40.20240502.0.ociarchive and https://kojipkgs.fedoraproject.org/compose/updates/Fedora-40-updates-20240502.0/compose/Silverblue/x86_64/images/Fedora-Silverblue-40.20240502.0.ociarchive . It looks like this added about half an hour to compose time (38 minutes before, 1hr 6 mins after).
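For the record, the arithmetic behind "about half an hour" can be checked with a couple of lines. This uses only the two durations quoted above; the variable names are just for illustration.

```python
from datetime import timedelta

# Compose durations quoted above.
before = timedelta(minutes=38)
after = timedelta(hours=1, minutes=6)

# Extra time attributable to building the OCI archives.
added = after - before
added_minutes = int(added.total_seconds() // 60)
```

That comes to 28 minutes of added compose time, going by the numbers in this comment.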

so, if we're happy with that, we can ditch the script conversion at least for F40+ (I set things up so for now at least we're only building OCI archives for F40 updates composes, not F38 or F39, but we could change that if we think it would work OK on F39 I guess).

I think it's OK to start with F40+ only. Thanks a lot, this looks great.

OK, so the next step is to figure out the upload and drop the custom conversion script.

yeah, I'm gonna start working on my python publisher thingy too.

Update: https://pagure.io/cloud-image-uploader/pull-request/10 should bring the functionality of both sync-latest-container-base-image.sh and sync-ostree-base-containers.sh into the now-wrongly-named "cloud" image uploader, with tests and everything. Of course it doesn't do on-the-fly ostree conversion for atomic desktops - it just pushes the ociarchive images.

If that looks good to folks, we could potentially merge that, roll it out, and get rid of both scripts for F40 and Rawhide. For F39 we should probably keep running sync-ostree-base-containers.sh for now as we are not producing ociarchive images for F39.
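The per-release split proposed here could be expressed roughly like this. It is a sketch of the decision only, not code from the uploader; `publish_plan` and its return strings are hypothetical, and the F40 cutoff reflects that we are only building ociarchive images for F40 updates composes and later.

```python
def publish_plan(release: str) -> str:
    """Return how atomic desktop containers would be published for a release.

    'ociarchive'        -> push the native images built by the compose
    'ostree-conversion' -> keep running sync-ostree-base-containers.sh
    """
    if release.lower() == "rawhide":
        return "ociarchive"
    # Releases before 40 have no native ociarchive images to push.
    return "ociarchive" if int(release) >= 40 else "ostree-conversion"

plans = {rel: publish_plan(rel) for rel in ["rawhide", "40", "39"]}
```

Keeping the cutoff in one place means that if F39 composes ever start producing ociarchive images, only the threshold has to change.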

Sounds like a reasonable plan. Is it too late to rename 'cloud-image-uploader' to just 'image-uploader'? but I guess it's not the end of the world.

It's probably not quite too late, but it is quite a lot of change. It will require a lot of search/replace on the infra ansible repo too. I was working on just the project side of the rename on a follow-up branch to the PR.

OK, well, we finished off the container sync code for fedora-"cloud"-image-uploader and merged it - https://pagure.io/cloud-image-uploader/pull-request/10 . So we should probably focus instead on deploying that, rather than doing this. I'll close this ticket and file a new one.

Metadata Update from @adamwill:
- Issue close_status updated to: It's all good
- Issue status updated to: Closed (was: Open)

a month ago

