#11358 Broken rawhide cloud base image
Closed: Fixed 10 months ago by cverna. Opened 10 months ago by mattia.

I've noticed that the upstream bodhi tests performed on rawhide have been failing for a few days. It looks like the Fedora Rawhide base image has a broken dnf config:

Step 1/12 : FROM fedora:rawhide
rawhide: Pulling from library/fedora
adc0d05c6919: Pulling fs layer
adc0d05c6919: Verifying Checksum
adc0d05c6919: Download complete
adc0d05c6919: Pull complete
Digest: sha256:ba63e781111a996fbb5ffe044a804c83984178802ad1a9b763b074d48dbb49a6
Status: Downloaded newer image for fedora:rawhide
 ---> b9288358f6f5
Step 2/12 : LABEL maintainer="Mattia Verga <mattia.verga@fedoraproject.org>"
 ---> Running in 811234490c96
Removing intermediate container 811234490c96
 ---> dc47136d9662
Step 3/12 : RUN dnf install -y     createrepo_c     fedora-messaging     findutils     git     krb5-devel     make     pip     poetry     python-unversioned-command     python3-alembic     python3-arrow     python3-authlib     python3-backoff     python3-bleach     python3-bugzilla     python3-celery     python3-click     python3-colander     python3-cornice     python3-createrepo_c     python3-devel     python3-diff-cover     python3-dnf     python3-dogpile-cache     python3-feedgen     python3-gssapi     python3-jinja2     python3-koji     python3-libcomps     python3-librepo     python3-markdown     python3-munch     python3-openid     python3-psycopg2     python3-prometheus_client     python3-pylibravatar     python3-pymediawiki     python3-pyramid     python3-pyramid-fas-openid     python3-pyramid-mako     python3-pytest     python3-pytest-cov     python3-pytest-mock     python3-requests-kerberos     python3-responses     python3-sphinx     python3-sqlalchemy     python3-sqlalchemy_schemadisplay     python3-waitress     python3-webtest     python3-wheel     python3-yaml     rpm-build     rpmdevtools     skopeo
 ---> Running in 6870143ef76b
Config error: Parsing file "/etc/dnf/dnf.conf" failed: Parsing file '/etc/dnf/dnf.conf' failed: IniParser: Missing section header at line 1
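
For anyone unfamiliar with the message: it means an option appears in the file before any section header such as `[main]`. Here is a minimal sketch of the same failure mode using Python's standard `configparser`, not dnf's own IniParser, so the exact messages differ. The sample file contents are made up for illustration and are not taken from the broken image.

```python
import configparser

def has_section_header(text: str) -> bool:
    """Return True if the INI text parses, False if an option precedes any [section]."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.MissingSectionHeaderError:
        return False
    return True

# Hypothetical file contents, for illustration only.
good = "[main]\ngpgcheck=True\n"
broken = "gpgcheck=True\n"

print(has_section_header(good))    # True
print(has_section_header(broken))  # False
```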

Looking at https://kojipkgs.fedoraproject.org/compose/rawhide/ I see that latest-fedora-Rawhide is stuck at 2023-05-30, while there are newer composes available.


right, but still - we untagged dnf5-5.0.12-1.fc39 on May 26 and dnf-4.16.0-1.fc39 on May 28, so why does the compose from May 30 have this bug?

Good question...

On May 30th (the last compose that finished), the x86_64 container base failed: https://koji.fedoraproject.org/koji/taskinfo?taskID=101643306
But it worked on the 29th and that should have had the untagged versions.

And... huh, I can't duplicate it here.

[root@7ec3639f9de7 /]# rpm -qa --last | head
gpg-pubkey-18b8e74c-62f2920f                  Mon May 29 10:02:52 2023

and dnf works fine for me in that container.

I also had the issue with docker.io/library/fedora:rawhide@sha256:ba63e781111a996fbb5ffe044a804c83984178802ad1a9b763b074d48dbb49a6 but it works fine with registry.fedoraproject.org/fedora:rawhide@sha256:e78944516d17f5c32ca9a3b071cb1749cf9bab4179edd59f56ee7a06d837c85c

There have also now been 2 more days of successful composes.

Is this still happening?

Yes. I just re-triggered tests on a bodhi PR and, even though GitHub says "Downloaded newer image", I get the same dnf error:
https://github.com/fedora-infra/bodhi/actions/runs/5167935788/jobs/9319291732

I cannot identify the checksum of the used image, though. It doesn't match any of those from the latest composes.

Metadata Update from @phsmoura:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

10 months ago

Maybe it's because fedora:rawhide pulls from docker.io, where the broken image was last updated on 31st May?
https://github.com/docker-library/official-images/issues?q=label%3Alibrary%2Ffedora
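
To illustrate why a bare `fedora:rawhide` ends up on Docker Hub (the build log shows "Pulling from library/fedora"), here is a simplified sketch of how Docker expands short image names by default. The function name is hypothetical and the rules are a reduced version of Docker's reference normalization. Other tools such as podman resolve short names from their own configuration and may behave differently.

```python
def normalize_image_ref(ref: str, default_registry: str = "docker.io") -> str:
    """Expand a short image name roughly the way Docker resolves it by default."""
    first, sep, _rest = ref.partition("/")
    # A first component containing "." or ":" (or equal to "localhost")
    # is treated as a registry host, so the reference is already qualified.
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    if has_registry:
        return ref
    # Single-component names on Docker Hub live under the "library/" namespace.
    if not sep:
        return f"{default_registry}/library/{ref}"
    return f"{default_registry}/{ref}"

print(normalize_image_ref("fedora:rawhide"))
# docker.io/library/fedora:rawhide
print(normalize_image_ref("registry.fedoraproject.org/fedora:rawhide"))
# registry.fedoraproject.org/fedora:rawhide
```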

Um. So I don't follow this chain entirely, but there seems to be kind of a lot of 'lag' in it. https://github.com/fedora-cloud/docker-brew-fedora seems to be the thing that feeds the Docker library, and as best as I can tell, that runs weekly on Sundays. The docker library itself is updated only...periodically, the last time being on 2023-05-31. So...it seems like the 2023-05-31 update to the Docker library would have pulled in the docker-brew-fedora update from the most recent Sunday, which would be 2023-05-28, which does seem right in the date range where we'd expect to see this problem.
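
The date reasoning above can be checked with a few lines of Python. This is only a sketch of the arithmetic. It assumes, as described, that docker-brew-fedora runs on Sundays and that the Docker library picks up the most recent Sunday's result. The helper name is made up.

```python
from datetime import date, timedelta

def last_sunday_on_or_before(day: date) -> date:
    """Return the most recent Sunday that is not after `day`."""
    # date.weekday() is 0 for Monday through 6 for Sunday.
    days_since_sunday = (day.weekday() + 1) % 7
    return day - timedelta(days=days_since_sunday)

hub_update = date(2023, 5, 31)  # Docker library update date mentioned above
print(last_sunday_on_or_before(hub_update))  # 2023-05-28
```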

There's been another update to docker-brew-fedora since then - yesterday, 2023-06-04. That shouldn't have the bug. So I think we just need to get docker-library to update from docker-brew-fedora again, but I'm not sure who can kick that off. @cverna ? Your name seems to be on a lot of this stuff. :D

I've not been involved in updating dockerhub, but my understanding is that it's manual. Someone has to file a PR asking for it to be updated and someone from the docker side has to merge that PR before it happens. ;(

But yeah... @cverna or @humaton likely know more.

So another part of the puzzle here is: "things that are using the Docker registry should maybe use quay.io or registry.fedoraproject.org instead". I can look at that for Bodhi in a bit, if @mattia doesn't do it first.

eh, Bodhi seems to have a pretty complex CI setup that uses its own container images and stuff. I think it's ultimately using docker build to build its base images, and they just specify FROM fedora:(release), so ultimately I think we are using Docker Hub as the base here. But I don't wanna dig into changing it. Seems like bodhi could avoid a lot of maintenance effort by just using SoftwareFactory instead, but hey...

edit: okay, well, after a bit more looking it doesn't seem too complicated, I think https://github.com/fedora-infra/bodhi/pull/5369 should make Bodhi use quay.io instead.
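
For other projects that want to make the same kind of switch, here is a rough sketch of rewriting unqualified `FROM fedora:(release)` lines to a fully qualified reference. It is not what the PR itself does. The repository path `quay.io/fedora/fedora` is an assumption (the discussion here only says "quay.io"), and the helper name is hypothetical.

```python
import re

# Assumed target repository; verify it before relying on it.
QUAY_REPO = "quay.io/fedora/fedora"

# Match "FROM fedora" or "FROM fedora:<tag>" at the start of a line,
# but not other images such as "fedora-minimal" or already qualified names.
FROM_SHORT_FEDORA = re.compile(
    r"^(FROM[ \t]+)fedora((?::\S+)?)(?=[ \t]|$)",
    re.IGNORECASE | re.MULTILINE,
)

def use_quay(dockerfile_text: str) -> str:
    """Point unqualified Fedora base images at QUAY_REPO, keeping the tag."""
    return FROM_SHORT_FEDORA.sub(
        lambda m: f"{m.group(1)}{QUAY_REPO}{m.group(2)}", dockerfile_text
    )

print(use_quay("FROM fedora:rawhide\nRUN dnf install -y git\n"))
```

Lines that already name a registry, or that use a different image, are left untouched.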

Are images in registry.fp.o or quay.io automatically updated? It doesn't seem so: registry.fp.o says the latest fedora:rawhide image is from 30 Apr, 2023. Quay.io image was updated 19 hours ago, but history shows it is updated once per month.

So, we have the choice to fix this ticket at several levels:
- fix just bodhi CI by switching the base image to quay.io
- fix rawhide images for everyone (other users reported this error on devel@) for this specific case by manually running the update on Docker Hub
- have rawhide base images updated automatically at each successful compose

I have opened a PR to update the DockerHub images https://github.com/docker-library/official-images/pull/14789

It usually takes a day or two for the Docker folks to merge it.

The rawhide image is updated nightly on registry.fp.o and quay.io. We bounced around the idea of also updating the other releases (i.e. 37, 38) nightly.

We can't update daily on DockerHub since it is a manual process and the Docker folks need to review the PRs. One idea was to try to do it weekly, but I'm honestly aiming more at doing it monthly now.

More context here https://github.com/docker-library/official-images/issues/7529

@mattia my thinking is we should do 1 and 2 in your list. Obviously we want to fix the docker registry images, but it's still a good idea to also switch everything we can switch to use registries we can update more easily and quickly to avoid future pain.

Good, I've merged @adamwill's PR on Bodhi upstream to switch images to quay.io, and thanks to @cverna the Docker Hub images will be updated soon.
Thanks all.

The DockerHub upstream PR was merged and docker.io/fedora:rawhide has been fixed.

Closing this :-)

Metadata Update from @cverna:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

10 months ago
