We run CI in VMs in the us-central1-a zone of GCE. Starting the week of June 28th and continuing through now, we've been seeing errors pulling from registry.fedoraproject.org/fedora-minimal:latest, whereas (in the same VM and context) we are able to pull from other registries. Here is a relevant section of output from our CI.
Unfortunately that "SHA doesn't match" error is the best we can get out of the job itself. But I have access to manually spin up one of these VMs to poke and prod at whatever lower-level bits would help with debugging.
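For background on what that error means: the client hashes every downloaded layer and compares the result against the digest listed in the manifest. A toy sketch of that comparison (the blob contents and digests here are made up, not from any real registry):

```shell
# Simulate a layer download: the manifest "claims" a digest for the blob.
blob=$(mktemp)
printf 'layer bytes' > "$blob"
claimed="sha256:$(sha256sum "$blob" | awk '{print $1}')"

# Simulate corruption in transit (e.g. a bad proxy or truncated body).
printf 'x' >> "$blob"
actual="sha256:$(sha256sum "$blob" | awk '{print $1}')"

# The pull fails when the recomputed digest no longer matches the claim.
if [ "$claimed" = "$actual" ]; then
    echo "digest OK"
else
    echo "digest mismatch: got $actual, want $claimed"
fi
rm -f "$blob"
```

Anything that alters the bytes between the registry and the client (a transparent proxy, a truncated transfer, a bad cache) produces exactly this class of failure.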
Could you define what GCE is? [There are several G.* cloud environments it could be.] Second, could you post a traceroute from there to our proxies?
We run jobs in both containers (GKE) and VMs (GCE, Google Compute Engine). I don't recall seeing failures in GKE, only from our GCE project. The failures only seem to happen when we're not looking.
I have a script that lets me create a 99% automation-identical VM, but (of course) I have no problem pulling registry.fedoraproject.org/fedora-minimal:latest from there :confounded: Yes, I can get you some traceroute data, and whatever else you need. I'm also trying my darnedest to eliminate our testing/software layers from the equation...
# mtr -4 --report --show-ips --report-cycles=50 registry.fedoraproject.org
Start: 2019-08-05T12:17:52-0400
HOST: cevich-fedora-30-libpod-547  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 209.85.241.122              0.0%    50   10.9  11.0  10.7  12.8   0.5
  2.|-- 108.170.243.231             0.0%    50   12.7  12.2  10.4  37.8   4.8
  3.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  4.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  5.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  6.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  7.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  8.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  9.|-- et-3-3-0.582.rtsw.rale.ne   0.0%    50   32.1  32.2  31.9  34.5   0.6
 10.|-- 198.71.47.222               0.0%    50   32.9  32.8  32.4  34.7   0.4
 11.|-- 128.109.25.14               0.0%    50   33.4  33.8  33.2  48.0   2.2
 12.|-- 8.43.84.1                   0.0%    50   66.0  74.7  53.5 174.4  29.9
 13.|-- 8.43.84.3                   0.0%    50   33.5  33.7  33.4  38.4   0.7
 14.|-- 8.43.84.4                   0.0%    50   52.9  51.5  41.4 112.9  11.1
 15.|-- ip-8-43-87-254 (8.43.87.2   0.0%    50  156.4  42.2  34.1 156.4  17.5
 16.|-- proxy14.fedoraproject.org   0.0%    50   33.6  33.7  33.5  34.7   0.2

# mtr -4 --report --show-ips --report-cycles=50 registry.fedoraproject.org
Start: 2019-08-05T12:20:30-0400
HOST: cevich-fedora-30-libpod-547  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 216.239.59.150              0.0%    50   10.9  10.8  10.6  11.9   0.2
  2.|-- 108.170.244.6               0.0%    50   10.5  10.9  10.4  23.0   1.8
  3.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  4.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  5.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  6.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  7.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  8.|-- ???                        100.0    50    0.0   0.0   0.0   0.0   0.0
  9.|-- et-3-3-0.582.rtsw.rale.ne   0.0%    50   32.0  32.1  31.8  34.2   0.5
 10.|-- 198.71.47.222               0.0%    50   32.7  33.1  32.3  50.6   2.5
 11.|-- 128.109.25.14               0.0%    50   33.3  33.5  33.2  34.2   0.3
 12.|-- 8.43.84.1                   0.0%    50   51.6  53.9  38.9 161.4  20.6
 13.|-- 8.43.84.3                   0.0%    50   33.5  33.4  33.4  33.7   0.1
 14.|-- 8.43.84.4                   0.0%    50   51.7  54.5  41.9 148.2  20.4
 15.|-- ip-8-43-87-254 (8.43.87.2   0.0%    50   92.5  55.2  36.0 157.0  27.2
 16.|-- proxy03.fedoraproject.org   0.0%    50   33.7  33.9  33.6  36.6   0.7

# mtr -4 --report --show-ips --report-cycles=50 registry.fedoraproject.org
Start: 2019-08-05T12:22:06-0400
HOST: cevich-fedora-30-libpod-547  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 209.85.252.47               0.0%    50   25.2  25.5  25.2  33.8   1.2
  2.|-- 216.239.48.94               0.0%    50   25.1  25.8  25.1  38.4   2.4
  3.|-- 108.170.246.7               0.0%    50   25.4  25.8  25.2  40.6   2.2
  4.|-- 198.86.53.238               0.0%    50   31.7  32.2  31.6  43.7   1.7
  5.|-- ws-gw-to-rtp-gw.ncren.net   0.0%    50   32.7  32.7  32.4  33.3   0.2
  6.|-- uncphillips-to-ws-gw.ncre   0.0%    50   36.8  36.5  36.4  36.9   0.1
  7.|-- core-p-v1213.net.unc.edu    0.0%    50   36.7  36.7  36.5  37.0   0.1
  8.|-- 152.2.255.166               0.0%    50   36.3  36.3  36.2  36.4   0.0
  9.|-- vm18.fedora.ibiblio.org (   0.0%    50   36.3  36.4  36.2  39.1   0.4
Hmmm, I see there are quite a few addresses on both sides here (outbound hops and destinations). Trying each destination returned by DNS...
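That per-destination probing boils down to running mtr against each resolved address individually. A dry-run sketch (the IPs are hardcoded to the two proxy addresses seen in this ticket; drop the echo to actually probe):

```shell
# Dry run: print the mtr command for each destination address instead of
# running it (remove "echo" to really probe). IPs are two of the proxy
# addresses observed in this ticket; in practice the list would come
# from resolving registry.fedoraproject.org.
probed=""
for ip in 209.132.181.15 209.132.190.2; do
    cmd="mtr -4 --report --no-dns --report-cycles=50 $ip"
    echo "$cmd"
    probed="$probed $ip"
done
```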
...okay, this is a bit easier to look at:
<img alt="mtr.txt" src="/fedora-infrastructure/issue/raw/files/767f6e35a27421b3f6d459746914c5ac7b82bcec2545e2d1521bfafd27787845-mtr.txt" />
So, slight loss on proxy13-rdu02.fedoraproje... and proxy10.fedoraproject.org; otherwise I'm assuming there's nothing you guys can do about zayo and... dang, seems I should have used --no-dns...
...another run w/o hostnames:
<img alt="more_mtr.txt" src="/fedora-infrastructure/issue/raw/files/555a169be29d960f15907f4cc89c5139ebc82dea1aed4fc35d3dd70cec35eeee-more_mtr.txt" />
This time slight losses on proxy10.fedoraproject.org (209.132.181.15) and again on proxy13-rdu02.fedoraproject.org (209.132.190.2).
What other data can I provide?
Update: running mtr again this morning, I no longer see the 2-4% drops at the registry end. Also, I was mistaken about the original example log I linked in the description; that's a totally unrelated/different problem.
The suspected networking issue is this one, which appears to be rarer. It last occurred at 2019-08-06T00:38:42+00:00.
So the error you are posting is Error determining manifest MIME type for docker://registry.access.redhat.com/fedora-minimal:latest:
That is redhat.com and not fedoraproject.org.
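For anyone hitting the same confusion: podman resolves unqualified image names against a registry search list in /etc/containers/registries.conf, trying each entry in order, which is how a pull of a short name like fedora-minimal can end up at registry.access.redhat.com. An illustrative sketch written to a temp path (the contents are made up, and the exact key name varies by podman version):

```shell
# Made-up example of a short-name search list; with this ordering, an
# unqualified "fedora-minimal" would be tried against each registry in
# turn until one of them serves it.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[registries.search]
registries = ['registry.fedoraproject.org', 'registry.access.redhat.com', 'docker.io']
EOF
grep -c "registry" "$conf"
```

Fully qualifying the name (registry.fedoraproject.org/fedora-minimal:latest) sidesteps the search list entirely, which also makes error messages unambiguous about which registry was actually contacted.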
Metadata Update from @smooge: - Issue assigned to smooge
Metadata Update from @smooge: - Issue priority set to: Waiting on Reporter (was: Needs Review)
Oh, good catch, that explains the final error. Just prior to that, though, we see the error from Trying to pull registry.fedoraproject.org/fedora-minimal:latest.
However, we have a known problem in podman with saving images, which the team is addressing. At this point I'm not seeing much evidence pointing at networking or the registry servers anymore, but I'm keeping an eye on the situation and will update this issue accordingly.
I think we can close this. I'll open a new issue if/when I get new evidence.
Metadata Update from @cevich: - Issue close_status updated to: Insufficient data - Issue status updated to: Closed (was: Open)