#8071 Intermittent non-connectivity between GCE and registry.fedoraproject.org
Closed: Insufficient data 4 years ago by cevich. Opened 4 years ago by cevich.

We run a CI in VMs in the us-central1-a zone of GCE. Since the week of June 28th, we've been seeing errors pulling registry.fedoraproject.org/fedora-minimal:latest, whereas (in the same VM and context) we are able to pull from other registries. Here is an example of a relevant section of output from our CI.

Unfortunately, that "SHA doesn't match" error is the most detail we can get from the job itself. However, I can manually spin up one of these VMs to poke and prod whatever lower-level bits would help with debugging.


Could you define what GCE is? [There are several G.* Cloud Environments it could be.] Second, could you post a traceroute from there to our proxies?

We run jobs both in containers (GKE) and on GCE (Google Compute Engine). I don't recall seeing failures in GKE, only from our GCE project. The failures only seem to happen when we're not looking.

I have a script that lets me create a VM that is 99% identical to the automation ones, but (of course) I have no problem pulling registry.fedoraproject.org/fedora-minimal:latest from there :confounded: Yes, I can get you some traceroute data, and whatever else you need. I'm also trying my darnedest to eliminate our testing/software layers from the equation...
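Because the failures only seem to show up when nobody is watching, one option on the near-identical VM is to leave a pull loop running and log each failure with a timestamp, so it can be lined up against the CI failures. A minimal sketch, assuming podman is installed; `pull_loop` is a hypothetical helper name, and its count and delay defaults are arbitrary rather than taken from our CI:

```shell
# Hypothetical helper: pull an image repeatedly and log every failure with a UTC
# timestamp. Assumes podman is on PATH; the count and delay defaults are arbitrary.
pull_loop() {
    local image="${1:-registry.fedoraproject.org/fedora-minimal:latest}"
    local count="${2:-10}"
    local delay="${3:-30}"
    local failures=0 i=1 out
    while [ "$i" -le "$count" ]; do
        # Drop any cached copy so each iteration downloads the image again.
        podman rmi --force "$image" >/dev/null 2>&1
        # Keep podman's own output (stdout and stderr) for the failure log.
        if ! out=$(podman pull "$image" 2>&1); then
            failures=$((failures + 1))
            echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) pull $i/$count failed: $out"
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "failures: $failures/$count"
    # Return status is non-zero if any pull failed.
    [ "$failures" -eq 0 ]
}
```

For example, `pull_loop registry.fedoraproject.org/fedora-minimal:latest 100 60` would attempt 100 pulls one minute apart. Removing the local copy first matters because podman skips blobs that already exist locally instead of re-downloading them, so a cached image could hide a bad transfer.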

# mtr -4 --report --show-ips --report-cycles=50 registry.fedoraproject.org
Start: 2019-08-05T12:17:52-0400
HOST: cevich-fedora-30-libpod-547 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 209.85.241.122             0.0%    50   10.9  11.0  10.7  12.8   0.5
  2.|-- 108.170.243.231            0.0%    50   12.7  12.2  10.4  37.8   4.8
  3.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  4.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  5.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  6.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  7.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  8.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  9.|-- et-3-3-0.582.rtsw.rale.ne  0.0%    50   32.1  32.2  31.9  34.5   0.6
 10.|-- 198.71.47.222              0.0%    50   32.9  32.8  32.4  34.7   0.4
 11.|-- 128.109.25.14              0.0%    50   33.4  33.8  33.2  48.0   2.2
 12.|-- 8.43.84.1                  0.0%    50   66.0  74.7  53.5 174.4  29.9
 13.|-- 8.43.84.3                  0.0%    50   33.5  33.7  33.4  38.4   0.7
 14.|-- 8.43.84.4                  0.0%    50   52.9  51.5  41.4 112.9  11.1
 15.|-- ip-8-43-87-254 (8.43.87.2  0.0%    50  156.4  42.2  34.1 156.4  17.5
 16.|-- proxy14.fedoraproject.org  0.0%    50   33.6  33.7  33.5  34.7   0.2
# mtr -4 --report --show-ips --report-cycles=50 registry.fedoraproject.org
Start: 2019-08-05T12:20:30-0400
HOST: cevich-fedora-30-libpod-547 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 216.239.59.150             0.0%    50   10.9  10.8  10.6  11.9   0.2
  2.|-- 108.170.244.6              0.0%    50   10.5  10.9  10.4  23.0   1.8
  3.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  4.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  5.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  6.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  7.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  8.|-- ???                       100.0    50    0.0   0.0   0.0   0.0   0.0
  9.|-- et-3-3-0.582.rtsw.rale.ne  0.0%    50   32.0  32.1  31.8  34.2   0.5
 10.|-- 198.71.47.222              0.0%    50   32.7  33.1  32.3  50.6   2.5
 11.|-- 128.109.25.14              0.0%    50   33.3  33.5  33.2  34.2   0.3
 12.|-- 8.43.84.1                  0.0%    50   51.6  53.9  38.9 161.4  20.6
 13.|-- 8.43.84.3                  0.0%    50   33.5  33.4  33.4  33.7   0.1
 14.|-- 8.43.84.4                  0.0%    50   51.7  54.5  41.9 148.2  20.4
 15.|-- ip-8-43-87-254 (8.43.87.2  0.0%    50   92.5  55.2  36.0 157.0  27.2
 16.|-- proxy03.fedoraproject.org  0.0%    50   33.7  33.9  33.6  36.6   0.7
# mtr -4 --report --show-ips --report-cycles=50 registry.fedoraproject.org
Start: 2019-08-05T12:22:06-0400
HOST: cevich-fedora-30-libpod-547 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 209.85.252.47              0.0%    50   25.2  25.5  25.2  33.8   1.2
  2.|-- 216.239.48.94              0.0%    50   25.1  25.8  25.1  38.4   2.4
  3.|-- 108.170.246.7              0.0%    50   25.4  25.8  25.2  40.6   2.2
  4.|-- 198.86.53.238              0.0%    50   31.7  32.2  31.6  43.7   1.7
  5.|-- ws-gw-to-rtp-gw.ncren.net  0.0%    50   32.7  32.7  32.4  33.3   0.2
  6.|-- uncphillips-to-ws-gw.ncre  0.0%    50   36.8  36.5  36.4  36.9   0.1
  7.|-- core-p-v1213.net.unc.edu   0.0%    50   36.7  36.7  36.5  37.0   0.1
  8.|-- 152.2.255.166              0.0%    50   36.3  36.3  36.2  36.4   0.0
  9.|-- vm18.fedora.ibiblio.org (  0.0%    50   36.3  36.4  36.2  39.1   0.4

Hmm, I see there are quite a few addresses on both sides here (outbound and destination). Trying each destination address from DNS...

So there's slight loss on proxy13-rdu02.fedoraproje... and proxy10.fedoraproject.org. Otherwise, I'm assuming there's nothing you can do about Zayo and... dang, it seems I should have used --no-dns...
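Since the registry name resolves to several proxies, checking each destination address separately keeps their paths from being mixed in one report. A sketch, assuming glibc's `getent` and mtr are installed (`mtr_each_address` is a hypothetical name); it passes --no-dns so hops are shown as numeric addresses instead of long reverse-DNS names that get truncated in the report columns:

```shell
# Hypothetical helper: run an mtr report against every IPv4 address a name resolves to.
# Assumes glibc's getent and mtr are installed.
mtr_each_address() {
    local name="$1"
    local cycles="${2:-50}"
    local addr
    # getent prints one line per socket type, so de-duplicate the address column.
    for addr in $(getent ahostsv4 "$name" | awk '{ print $1 }' | sort -u); do
        echo "=== $name -> $addr ==="
        # --no-dns keeps hop addresses numeric, so nothing is cut off in the report.
        mtr -4 --report --no-dns --report-cycles="$cycles" "$addr"
    done
}
```

Run it as `mtr_each_address registry.fedoraproject.org`. A single lookup only lists the addresses the resolver returns at that moment, so repeating it may turn up other proxies.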

This time, there were slight losses on proxy10.fedoraproject.org (209.132.181.15) and, again, on proxy13-rdu02.fedoraproject.org (209.132.190.2).
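To pick out hops with partial loss like these from a long report, the hop lines can be filtered on the Loss% column. The 100% rows marked ??? in the reports above are left out: later hops show 0% loss, so those routers are simply not answering probes rather than dropping forwarded traffic. A sketch using awk (`partial_loss_hops` is a hypothetical name); it counts columns from the right because the host column can itself contain spaces, as in "ip-8-43-87-254 (8.43.87.2":

```shell
# Hypothetical filter: read an `mtr --report` on stdin and print only hop lines
# whose loss is above 0% but below 100%. Loss% is the 7th field from the end,
# counted from the right because the host column may contain spaces.
partial_loss_hops() {
    awk '/\|--/ {
        loss = $(NF - 6)       # Loss% column
        sub(/%$/, "", loss)    # strip a trailing "%" if present
        if (loss + 0 > 0 && loss + 0 < 100) print
    }'
}
```

For example: `mtr -4 --report --no-dns --report-cycles=50 registry.fedoraproject.org | partial_loss_hops`.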

What other data can I provide?

Update: running mtr again this morning, I no longer see the 2-4% drops at the registry server end. Also, I was mistaken about the original example log I linked in the description; that's a totally unrelated problem.

The suspected networking issue is this one, which appears to be rarer. It last occurred at 2019-08-06T00:38:42+00:00.

So the error you posted, "Error determining manifest MIME type for docker://registry.access.redhat.com/fedora-minimal:latest:", is about reaching registry.access.redhat.com.

That is redhat.com and not fedoraproject.org.

Metadata Update from @smooge:
- Issue assigned to smooge


Metadata Update from @smooge:
- Issue priority set to: Waiting on Reporter (was: Needs Review)


Oh, good catch; that explains the final error. Just prior to that, though, we see the error from "Trying to pull registry.fedoraproject.org/fedora-minimal:latest".

However, we have a known problem with podman saving images, which the team is addressing. At this point, I'm no longer seeing much evidence pointing at networking or the registry servers, but I'm keeping an eye on the situation and will update this issue accordingly.

I think we can close this. I'll open a new issue if/when I get new evidence.

Metadata Update from @cevich:
- Issue close_status updated to: Insufficient data
- Issue status updated to: Closed (was: Open)

