Our updates flow is blocked by a flatpak stable push:
https://bodhi.fedoraproject.org/composes/F32F/stable
As far as I understand it, all bodhi does in this case is copy the flatpak from the candidate registry to the normal one and update any updates/etc.
The copy is failing, and when I manually run the command that bodhi runs, I get:
```
[root@bodhi-backend01 ~][PROD]# time sudo -u apache /usr/bin/bodhi-skopeo-lite copy \
    docker://candidate-registry.fedoraproject.org/0ad:master-3220200604091941.1 \
    docker://registry.fedoraproject.org/0ad:master-3220200604091941.1
INFO:skopeo-lite:candidate-registry.fedoraproject.org: Downloading /tmp/tmpwwd7l53_/blobs/sha256/5d42466e4948499efdb1268c6787e60d7932c68b97e32f22211b8dd62328567d (size=47611)
INFO:skopeo-lite:candidate-registry.fedoraproject.org: Downloading /tmp/tmpwwd7l53_/blobs/sha256/3744f18ab4c680b1164a15b1242dea36ee304efac61a3881dfae02eff0dcaa38 (size=937468406)
INFO:skopeo-lite:registry.fedoraproject.org: Storing manifest as sha256:ba3e5383d9e714d15da8193967bec116d5b2c50f71b487d9b52b2e74cf02ae7e
Traceback (most recent call last):
  File "/usr/bin/bodhi-skopeo-lite", line 11, in <module>
    load_entry_point('bodhi-server==5.1.1', 'console_scripts', 'bodhi-skopeo-lite')()
  File "/usr/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/bodhi/server/scripts/skopeo_lite.py", line 779, in copy
    Copier(tmp, dest.get_endpoint()).copy()
  File "/usr/lib/python3.8/site-packages/bodhi/server/scripts/skopeo_lite.py", line 737, in copy
    self._copy_manifest(referenced)
  File "/usr/lib/python3.8/site-packages/bodhi/server/scripts/skopeo_lite.py", line 725, in _copy_manifest
    self.dest.write_manifest(info, toplevel=toplevel)
  File "/usr/lib/python3.8/site-packages/bodhi/server/scripts/skopeo_lite.py", line 656, in write_manifest
    response.raise_for_status()
  File "/usr/lib/python3.8/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://registry.fedoraproject.org/v2/0ad/manifests/sha256:ba3e5383d9e714d15da8193967bec116d5b2c50f71b487d9b52b2e74cf02ae7e

real    2m22.593s
user    0m4.199s
sys     0m2.885s
```
The registry box is now Fedora 32 in IAD2, where it was Fedora 30 in PHX2. But the version of docker-distribution seems pretty much the same. :(
@cverna @mohanboddu @otaylor @kalev
Can any of you see what's going on here? We really need to unblock our flow of updates...
I'll note if you go to that url it gets a 503 on, it says:
"OCI manifest found, but accept header does not support OCI manifests"
old registry: docker-distribution-2.6.2-9.git48294d9.fc30.x86_64
new registry: docker-distribution-2.6.2-11.git48294d9.fc32.x86_64
That's because you need to send a request with an appropriate Accept header (see https://github.com/fedora-infra/bodhi/blob/ebb886e7392e64fd046fd638c62035e7dd21d956/bodhi/server/scripts/skopeo_lite.py#L333). Doing so from my PC works fine, so maybe there's some DNS problem connecting Bodhi server to the registry box?
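To make the Accept-header point concrete, here is a minimal sketch of building such a manifest request with the standard registry media types. This is not skopeo-lite's actual code (the helper name `manifest_request` is mine), just a stdlib-only illustration of what a client has to send to avoid the "accept header does not support OCI manifests" response:

```python
import urllib.request

# Manifest media types the client advertises in the Accept header.
# Without the OCI type, docker-distribution refuses to serve OCI images.
OCI_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
DOCKER_MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"
MANIFEST_LIST = "application/vnd.docker.distribution.manifest.list.v2+json"


def manifest_request(registry, repo, ref):
    """Build a GET for /v2/<name>/manifests/<ref> with a proper Accept header."""
    url = f"https://{registry}/v2/{repo}/manifests/{ref}"
    accept = ", ".join([OCI_MANIFEST, DOCKER_MANIFEST_V2, MANIFEST_LIST])
    return urllib.request.Request(url, headers={"Accept": accept})


# Usage (actual network call, not run here):
# resp = urllib.request.urlopen(manifest_request(
#     "registry.fedoraproject.org", "0ad",
#     "sha256:ba3e5383d9e714d15da8193967bec116d5b2c50f71b487d9b52b2e74cf02ae7e"))
```

A plain browser GET sends `Accept: text/html,...`, which is why visiting the URL directly reproduces the error even when the registry is healthy.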
It appears the problem may be that the image already exists at the destination (the public registry), and skopeo-lite doesn't handle that case. I'm not sure why the httpd proxy is generating a 503 rather than passing back the actual status code / error message, which would probably be more informative.
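One way a copy tool could guard against this (a sketch under my own naming, not skopeo-lite's actual code) is to probe the destination with a HEAD request before writing, since the registry API answers `HEAD /v2/<name>/manifests/<ref>` with 200 when the manifest is present and 404 when it is not:

```python
import urllib.error
import urllib.request


def manifest_url(registry, repo, ref):
    """URL for the registry API v2 manifest endpoint."""
    return f"https://{registry}/v2/{repo}/manifests/{ref}"


def manifest_exists(registry, repo, ref, accept):
    """Return True if the destination already has this manifest.

    Uses HEAD so no body is transferred; `accept` must list the manifest
    media types the caller understands (including the OCI one).
    """
    req = urllib.request.Request(
        manifest_url(registry, repo, ref), method="HEAD",
        headers={"Accept": accept})
    try:
        urllib.request.urlopen(req)
        return True
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False
        raise  # anything else (e.g. the 503 seen here) is a real error
```

With a check like this, an already-present image could be treated as a successful no-op instead of a failed push.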
I don't have much of an idea why the image would already be there - maybe a previous container push failed because of relocation stuff? Worth looking in the logs to see what the first failure on this push was - it might be different.
I probably won't have time to investigate further or come up with a fix for skopeo-lite until Monday. Is it possible to unqueue this one update for 0ad and see what happens with the next?
ok, that was a saga. ;) After many hours it's working and I was able to complete the push.
Along the way:
The 503 errors were due to firewall / proxy issues in the new datacenter. First on candidate registry, then on the final one. Got all those worked around until we can properly fix them next week by routing over our vpn instead of trying to reach those hosts directly.
Then I saw that the flatpak was already copied over, so I thought: why not delete it from the registry and let bodhi copy it again? First I had to get oci-registry02 working as well, since all the deletes and writes go to it instead of 01. After that I ran into a side problem: docker-distribution's config allowed deletes, but the service was still refusing them. Finally I realized the config had been put in place, but docker-distribution had never been restarted. After restarting it, I was able to delete the manifest from the registry.
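For reference, manifest deletion in the registry API is `DELETE /v2/<name>/manifests/<digest>` (by digest, not tag), and docker-distribution only honors it when its config enables deletes (`storage: delete: enabled: true`) and the service has been restarted to pick that up, which was exactly the missing step here. A minimal sketch with a hypothetical helper:

```python
import urllib.request


def manifest_delete_request(registry, repo, digest):
    """Build a DELETE for /v2/<name>/manifests/<digest>.

    Only works if the docker-distribution config has
    storage.delete.enabled: true AND the service was restarted
    after that config change.
    """
    url = f"https://{registry}/v2/{repo}/manifests/{digest}"
    return urllib.request.Request(url, method="DELETE")


# Usage (actual network call, not run here):
# urllib.request.urlopen(manifest_delete_request(
#     "registry.fedoraproject.org", "0ad",
#     "sha256:ba3e5383d9e714d15da8193967bec116d5b2c50f71b487d9b52b2e74cf02ae7e"))
```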
Then, on running bodhi, it could not find the content in the candidate registry. It turned out that somehow the candidate registry had /srv/oci_registry mounted. IT HAD THE SAME CONTENT AS PROD! I unmounted that and found that on our old candidate registry it was just local disk.
So, luckily I was still able to get into our old datacenter and copy all the old content off the old candidate registry onto the new one. As an aside, the new instance's disk was too small and I had to resize it to fit the old content.
Finally the update push worked as expected. Whew.
Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)
Thanks, @kevin, for figuring this out!