#9177 Pulling container images breaks randomly (registry problem?)
Closed: Fixed 2 months ago by kevin. Opened 3 months ago by hhorak.

Describe what you would like us to do:


I'd like to see the Fedora container repository to behave deterministically. What I see are random failures (using podman 1.6.x on CentOS7, not sure whether older version of a client can cause the troubles). What I see is that pulling images fails in some cases, and succeeds few seconds after the failure. It's very undeterministic:

[root@centos7]# podman pull registry.fedoraproject.org/f31/s2i-base:latest
Trying to pull registry.fedoraproject.org/f31/s2i-base:latest...
Getting image source signatures
Copying blob 1457171663a9 done  
Copying blob 64f48977383f done  
Copying blob 82422234ceca done  
Copying config 95c2bf26d4 done  
Writing manifest to image destination
Storing signatures
95c2bf26d49eba2049318841605a616e5580bd69aa7bab4fedd62fb25e908cdb

[root@centos7]# podman pull registry.fedoraproject.org/f32/s2i-base:latest
Trying to pull registry.fedoraproject.org/f32/s2i-base:latest...
Getting image source signatures
Copying blob 44e1bb1e1824 [--------------------------------------] 0.0b / 69.8MiB
Copying blob d6e069648152 [--------------------------------------] 8.2KiB / 16.4MiB
  unexpected EOF
Error: error pulling image "registry.fedoraproject.org/f32/s2i-base:latest": unable to pull registry.fedoraproject.org/f32/s2i-base:latest: unable to pull image: Error writing blob: error setting up to decompress blob: unexpected EOF

[root@centos7]# podman pull registry.fedoraproject.org/f32/s2i-base:latest
Trying to pull registry.fedoraproject.org/f32/s2i-base:latest...
Getting image source signatures
Copying blob d6e069648152 done  
Copying blob 192e0a29e681 [======================================] 177.4MiB / 177.4MiB
Copying blob 44e1bb1e1824 done  
Copying config 6be317b4e3 [======================================] 5.8KiB / 5.8KiB
Writing manifest to image destination
Storing signatures
6be317b4e3d03498c9805598bec11d06ee3b1577d5ff1fd3702a3c34c885f4be

[root@centos7]# podman pull registry.fedoraproject.org/f32/s2i-base:latest
Trying to pull registry.fedoraproject.org/f32/s2i-base:latest...
Getting image source signatures
Copying blob 192e0a29e681 skipped: already exists  
Copying blob 44e1bb1e1824 skipped: already exists  
Copying blob d6e069648152 [--------------------------------------] 0.0b / 0.0b
Copying config 6be317b4e3 done  
Writing manifest to image destination
Storing signatures
6be317b4e3d03498c9805598bec11d06ee3b1577d5ff1fd3702a3c34c885f4be

[root@centos7]# podman pull registry.fedoraproject.org/f33/s2i-base:latest
Trying to pull registry.fedoraproject.org/f33/s2i-base:latest...
Getting image source signatures
Copying blob df6804b6febf [>-------------------------------------] 4.0MiB / 164.4MiB
Copying blob f35b398c6aac [======================================] 70.1MiB / 70.1MiB
Copying blob 12542a84343f [======================>---------------] 4.4MiB / 7.2MiB
  unexpected EOF
Error: error pulling image "registry.fedoraproject.org/f33/s2i-base:latest": unable to pull registry.fedoraproject.org/f33/s2i-base:latest: unable to pull image: Error writing blob: error storing blob to file "/var/tmp/storage191954987/3": unexpected EOF

[root@centos7]# podman pull registry.fedoraproject.org/f33/s2i-base:latest
Trying to pull registry.fedoraproject.org/f33/s2i-base:latest...
Getting image source signatures
  unexpected EOF
Error: error pulling image "registry.fedoraproject.org/f33/s2i-base:latest": unable to pull registry.fedoraproject.org/f33/s2i-base:latest: unable to pull image: Error writing blob: error storing blob to file "/var/tmp/storage931605175/1": unexpected EOF

[root@centos7]# podman pull registry.fedoraproject.org/f33/s2i-base:latest
Trying to pull registry.fedoraproject.org/f33/s2i-base:latest...
Getting image source signatures
Copying blob f35b398c6aac done  
Copying blob 12542a84343f done  
Copying blob df6804b6febf done  
Copying config 390f2192d5 [======================================] 5.8KiB / 5.8KiB
Writing manifest to image destination
Storing signatures
390f2192d5c067c7c9e59a60e1b49d20054dcae7af2975d6f157c99240a1941e

When do you need this to be done by? (no specific deadline)



I have similar problem:

14:25:12 -> Pulling image registry.fedoraproject.org/f31/s2i-base:latest before building image from Dockerfile.fedora.
14:25:12 Trying to pull repository registry.fedoraproject.org/f31/s2i-base ... 
14:25:12 Pulling repository registry.fedoraproject.org/f31/s2i-base
14:25:12 Error: image f31/s2i-base:latest not found
14:25:12 Pulling image registry.fedoraproject.org/f31/s2i-base:latest failed. Let's wait 2 seconds and try one more time.
14:25:14 Trying to pull repository registry.fedoraproject.org/f31/s2i-base ... 
14:25:15 Pulling repository registry.fedoraproject.org/f31/s2i-base
14:25:15 Error: image f31/s2i-base:latest not found
14:25:15 make[1]: *** [3.7] Error 1
14:25:15 make[1]: Leaving directory `/root/sources'
14:25:15 make: *** [build-serial] Error 2

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: groomed, high-gain, medium-trouble

3 months ago

Can you all re-run with '--log-level=debug' and attach the output? Might help us track this down. Thanks!

ok. I think I found at least part of the issue. One of our pair of registry hosts was throwing "too many open flies" errors on some operations.

Can you all look and see if you see any issues now?

Now I have tried to pull image from Fedora Registry and the output is here:
https://ci.centos.org/job/SCLo-container-ruby-rh-fedora/129/console

Snipped:

11:24:03 Step 1/12 : FROM registry.fedoraproject.org/f31/s2i-base
11:24:03 Trying to pull repository registry.fedoraproject.org/f31/s2i-base ... 
11:24:03 latest: Pulling from registry.fedoraproject.org/f31/s2i-base
11:24:03 82422234ceca: Pulling fs layer
11:24:03 64f48977383f: Pulling fs layer
11:24:03 1457171663a9: Pulling fs layer
11:24:05 82422234ceca: Retrying in 5 seconds
11:24:05 64f48977383f: Verifying Checksum
11:24:05 64f48977383f: Download complete
11:24:05 1457171663a9: Retrying in 5 seconds
11:24:06 82422234ceca: Retrying in 4 seconds
11:24:06 1457171663a9: Retrying in 4 seconds
11:24:07 82422234ceca: Retrying in 3 seconds
11:24:07 1457171663a9: Retrying in 3 seconds
11:24:08 82422234ceca: Retrying in 2 seconds
11:24:08 1457171663a9: Retrying in 2 seconds
11:24:09 82422234ceca: Retrying in 1 second
11:24:09 1457171663a9: Retrying in 1 second
11:24:10 82422234ceca: Verifying Checksum
11:24:10 82422234ceca: Retrying in 10 seconds
11:24:10 1457171663a9: Verifying Checksum
11:24:10 1457171663a9: Retrying in 10 seconds
11:24:11 82422234ceca: Retrying in 9 seconds
11:24:11 1457171663a9: Retrying in 9 seconds
11:24:12 82422234ceca: Retrying in 8 seconds
11:24:12 1457171663a9: Retrying in 8 seconds
11:24:13 82422234ceca: Retrying in 7 seconds
11:24:13 1457171663a9: Retrying in 7 seconds
11:24:14 82422234ceca: Retrying in 6 seconds
11:24:14 1457171663a9: Retrying in 6 seconds
11:24:15 82422234ceca: Retrying in 5 seconds
11:24:15 1457171663a9: Retrying in 5 seconds
11:24:16 82422234ceca: Retrying in 4 seconds
11:24:16 1457171663a9: Retrying in 4 seconds
11:24:17 82422234ceca: Retrying in 3 seconds
11:24:17 1457171663a9: Retrying in 3 seconds
11:24:18 82422234ceca: Retrying in 2 seconds
11:24:18 1457171663a9: Retrying in 2 seconds
11:24:19 82422234ceca: Retrying in 1 second
11:24:19 1457171663a9: Retrying in 1 second
11:24:20 82422234ceca: Retrying in 15 seconds
11:24:21 82422234ceca: Retrying in 14 seconds
11:24:21 1457171663a9: Retrying in 15 seconds
11:24:22 82422234ceca: Retrying in 13 seconds
11:24:22 1457171663a9: Retrying in 14 seconds
11:24:23 82422234ceca: Retrying in 12 seconds
11:24:23 1457171663a9: Retrying in 13 seconds
11:24:24 82422234ceca: Retrying in 11 seconds
11:24:24 1457171663a9: Retrying in 12 seconds
11:24:25 82422234ceca: Retrying in 10 seconds
11:24:25 1457171663a9: Retrying in 11 seconds
11:24:26 82422234ceca: Retrying in 9 seconds
11:24:26 1457171663a9: Retrying in 10 seconds
11:24:27 82422234ceca: Retrying in 8 seconds
11:24:27 1457171663a9: Retrying in 9 seconds
11:24:28 82422234ceca: Retrying in 7 seconds
11:24:28 1457171663a9: Retrying in 8 seconds
11:24:29 82422234ceca: Retrying in 6 seconds
11:24:29 1457171663a9: Retrying in 7 seconds
11:24:30 82422234ceca: Retrying in 5 seconds
11:24:30 1457171663a9: Retrying in 6 seconds
11:24:31 82422234ceca: Retrying in 4 seconds
11:24:31 1457171663a9: Retrying in 5 seconds
11:24:32 82422234ceca: Retrying in 3 seconds
11:24:32 1457171663a9: Retrying in 4 seconds
11:24:33 82422234ceca: Retrying in 2 seconds
11:24:33 1457171663a9: Retrying in 3 seconds
11:24:34 82422234ceca: Retrying in 1 second
11:24:34 1457171663a9: Retrying in 2 seconds
11:24:35 1457171663a9: Retrying in 1 second
11:24:36 82422234ceca: Verifying Checksum
11:24:36 82422234ceca: Download complete
11:24:36 1457171663a9: Retrying in 20 seconds
11:24:37 1457171663a9: Retrying in 19 seconds
11:24:38 1457171663a9: Retrying in 18 seconds
11:24:39 1457171663a9: Retrying in 17 seconds
11:24:40 1457171663a9: Retrying in 16 seconds
11:24:41 82422234ceca: Pull complete
11:24:41 1457171663a9: Retrying in 15 seconds
11:24:42 64f48977383f: Pull complete
11:24:42 1457171663a9: Retrying in 14 seconds
11:24:43 1457171663a9: Retrying in 13 seconds
11:24:44 1457171663a9: Retrying in 12 seconds
11:24:45 1457171663a9: Retrying in 11 seconds
11:24:46 1457171663a9: Retrying in 10 seconds
11:24:47 1457171663a9: Retrying in 9 seconds
11:24:48 1457171663a9: Retrying in 8 seconds
11:24:49 1457171663a9: Retrying in 7 seconds
11:24:50 1457171663a9: Retrying in 6 seconds
11:24:51 1457171663a9: Retrying in 5 seconds
11:24:52 1457171663a9: Retrying in 4 seconds
11:24:53 1457171663a9: Retrying in 3 seconds
11:24:54 1457171663a9: Retrying in 2 seconds
11:24:55 1457171663a9: Retrying in 1 second
11:24:56 1457171663a9: Verifying Checksum
11:24:56 filesystem layer verification failed for digest sha256:1457171663a9c7c50e99f3282e1788f409e842913bdd93dc741d75ca4ecb121a
11:24:56 make[1]: Leaving directory `/root/sources'
11:24:56 make[1]: *** [2.6] Error 1
11:24:56 make: *** [build-serial] Error 2
11:24:56 Build step 'Execute shell' marked build as failure

I reported a duplicate - we are hit by this in Toolbox, Flatpaks, .. https://pagure.io/fedora-infrastructure/issue/9210

I just made some changes that I hope will address this.

Can everyone test and let me know if it seems more stable/faster or about the same?

This happened for FreeIPA CI for more than two months already -- in many attempts only second or third re-try pulls images from registry.fedoraproject.org, with majority of issues are a variety of 50x errors or a just EOF. We thought it was all connected with a datacenter migration reduction of available resources and did not report the obvious.

It seems to be better now indeed! Personally, I was observing the issue when trying to update software from the Fedora Flatpak repository on my machines. For the last few days, I was unable to do so, getting HTTP 503 errors or premature disconnects way before the download process was complete. After you made the changes, I have been able to pull some major updates, including a 600+ MB update of the Fedora 32 Flatpak platform, on three machines. This certainly is an improvement, as previously I could download nothing at all, or only a few MB, before the process failed.

One thing, though: My systems would also still like to pull an updated version of the Fedora 31 Flatpak platform, presumably because the latet version of the Gimp in the Fedora Flatpak repository (still) depends on it. Trying to pull the Fedora 31 platform consistently fails like this, however:

[nholland@pclap013 ~]$ flatpak update
Looking for updates…


        ID                                   Branch          Op          Remote          Download
 1. [✗] org.fedoraproject.Platform           f31             u           fedora          1.0 kB / 605.6 MB

Error: Checksum digest did not match (sha256:0bce5cd3e85a56139a052d2d8db104b30553d99fec44c4c369a09f7134791e81 != e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
Updates complete.
error: There were one or more errors

I'm not sure if the root cause is the same as the issue we're dealing with here or whether it's something else, but I still thought I'd report it. :-)

I just made some changes that I hope will address this.

Can everyone test and let me know if it seems more stable/faster or about the same?

I can't no longer see any problems. I can update Flatpaks and create Toolboxes with the images that I don't have downloaded on my system. Thanks!

The OCI pull code here just tries to download a layer blob, which is supposed to be 605.6 MB. And then it compares the resultant checksum with the name of the blob (which is a sha256 digest). This fails, and the "1.0 kB" part of the output seems to indicate that this is due to a short read.

I wonder what really happened, if it got some kind of http error which we silently ignored, or if it got a short read but with no error reported.

Can you retry everything with OSTREE_DEBUG_HTTP set in the environment?

curl -v -o foo https://cdn.registry.fedoraproject.org/v2/f31/flatpak-runtime/blobs/sha256:0bce5cd3e85a56139a052d2d8db104b30553d99fec44c4c369a09f7134791e81 gives me a 0 byte file that has a etag header of sha256:0bce5cd3e85a56139a052d2d8db104b30553d99fec44c4c369a09f7134791e81 (which is right, but not what a 0 file checksums as).

It has x-cache: Hit from cloudfront, so i suspect the cdn cached a broken file.

No problem, here it comes:

[nholland@pclap013 ~]$ flatpak update
Looking for updates…
*   Trying 2a04:4e42:3::729:443...
* Connected to dl.flathub.org (2a04:4e42:3::729) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=Fastly, Inc.; CN=t2.shared.global.fastly.net
*  start date: Aug  4 18:09:14 2020 GMT
*  expire date: Apr 15 20:14:30 2021 GMT
*  subjectAltName: host "dl.flathub.org" matched cert's "dl.flathub.org"
*  issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign CloudSSL CA - SHA256 - G3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5560f2835b40)
> GET /repo/summary.sig HTTP/2
Host: dl.flathub.org
user-agent: libostree/2020.4
accept: */*
accept-encoding: deflate, gzip, br

* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 200 
< server: nginx/1.14.0 (Ubuntu)
< content-type: application/octet-stream
< last-modified: Mon, 10 Aug 2020 04:33:23 GMT
< etag: "5f30ce13-250"
< expires: Mon, 10 Aug 2020 05:33:29 GMT
< cache-control: max-age=3600
< strict-transport-security: max-age=15768000
< backend-name: 3DxooTFj8SlVTdJ0UTX8Jd--F_front_ams_https
< accept-ranges: bytes
< date: Mon, 10 Aug 2020 08:26:29 GMT
< via: 1.1 varnish
< age: 3179
< x-served-by: cache-fra19161-FRA
< x-cache: HIT
< x-cache-hits: 547
< x-timer: S1597047990.545485,VS0,VE0
< content-length: 592
< 
* Connection #0 to host dl.flathub.org left intact


        ID                                   Branch          Op          Remote          Download
 1. [/] org.fedoraproject.Platform           f31             u           fedora          1.0 kB / 605.6 MB

Updating… █▌                     8%> GET /v2/f31/flatpak-runtime/blobs/sha256:0bce5cd3e85a56139a052d2d8db104b30553d99fec44c4c369a09f7134791e81 HTTP/1.1
> Soup-Debug-Timestamp: 1597047995
> Soup-Debug: SoupSession 1 (0x5560f287b100), SoupMessage 1 (0x5560f28ddc70), SoupSocket 1 (0x5560f2f5d900)
> Host: registry.fedoraproject.org
> Accept: application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v2+json
> User-Agent: Flatpak 1.6.5
> Connection: Keep-Alive

< HTTP/1.1 302 Found
< Soup-Debug-Timestamp: 1597047996
< Soup-Debug: SoupMessage 1 (0x5560f28ddc70)
< Date: Mon, 10 Aug 2020 08:26:35 GMT
< Server: Apache
< Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
< X-Frame-Options: SAMEORIGIN
< X-Xss-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Referrer-Policy: same-origin
< Location: https://cdn.registry.fedoraproject.org/v2/f31/flatpak-runtime/blobs/sha256:0bce5cd3e85a56139a052d2d8db104b30553d99fec44c4c369a09f7134791e81
< Content-Length: 400
< Keep-Alive: timeout=15, max=500
< Connection: Keep-Alive
< Content-Type: text/html; charset=iso-8859-1
< 

> GET /v2/f31/flatpak-runtime/blobs/sha256:0bce5cd3e85a56139a052d2d8db104b30553d99fec44c4c369a09f7134791e81 HTTP/1.1
> Soup-Debug-Timestamp: 1597047996
> Soup-Debug: SoupSession 1 (0x5560f287b100), SoupMessage 1 (0x5560f28ddc70), SoupSocket 2 (0x5560f2f5d9d0), restarted
> Host: cdn.registry.fedoraproject.org
> Accept: application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v2+json
> Connection: Keep-Alive
> User-Agent: Flatpak 1.6.5

< HTTP/1.1 200 OK
< Soup-Debug-Timestamp: 1597047996
< Soup-Debug: SoupMessage 1 (0x5560f28ddc70)
< Content-Type: application/octet-stream
< Content-Length: 0
< Connection: keep-alive
< Date: Fri, 07 Aug 2020 23:42:45 GMT
< Server: Apache
< Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
< X-Frame-Options: SAMEORIGIN
< X-Xss-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Referrer-Policy: same-origin
< Accept-Ranges: bytes
< Cache-Control: max-age=31536000
< docker-content-digest: sha256:0bce5cd3e85a56139a052d2d8db104b30553d99fec44c4c369a09f7134791e81
< docker-distribution-api-version: registry/2.0
< ETag: "sha256:0bce5cd3e85a56139a052d2d8db104b30553d99fec44c4c369a09f7134791e81"
< X-Varnish: 38459885
< Via: 1.1 varnish (Varnish/6.3), 1.1 269bfdd288bfea5423a4e9e701777da6.cloudfront.net (CloudFront)
< AppTime: D=8135
< X-Fedora-ProxyServer: proxy10.iad2.fedoraproject.org
< X-Fedora-RequestID: Xy3m9QBDCMiMLbydu3ujdQAAAYk
< X-Cache: Hit from cloudfront
< X-Amz-Cf-Pop: FRA53-C1
< X-Amz-Cf-Id: XMyUg1HGsUF3TjUmBMpweyubn2AEOtV1vg54v71OBlOGFt1C4Gpacg==
        ID                                   Branch          Op          Remote          Download
 1. [✗] org.fedoraproject.Platform           f31             u           fedora          1.0 kB / 605.6 MB

Error: Checksum digest did not match (sha256:0bce5cd3e85a56139a052d2d8db104b30553d99fec44c4c369a09f7134791e81 != e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
Updates complete.
error: There were one or more errors

Not sure if it helps, but when I do

wget https://cdn.registry.fedoraproject.org/v2/f31/flatpak-runtime/blobs/sha256:0bce5cd3e85a56139a052d2d8db104b30553d99fec44c4c369a09f7134791e81

I do indeed get a file, but it's completely empty (i.e. it has a size of 0 bytes).

@nholland tpopela got the same, and i get a 0 byte file from curl/wget, so it seems to be a CDN issue. However, the weird thing is that both of us can correctly download that file in the browser.

Indeed, I can confirm that when opening the URL with Firefox, I get a 578 MB file which downloads successfully. Just to be sure, I tried downloading it with wget once more, but it failed again. Hmm, very weird!

note: I am on PTO today, just happened to see this
I invalidated that object in the cdn and it should be right now I hope...

I'll check back tomorrow.

Indeed, downloading the file via wget, as well as the "flatpak update" operation I was originally trying to perform, is working fine here now. As far as I'm concerned, that solves the issue.

note: I am on PTO today, just happened to see this
I invalidated that object in the cdn and it should be right now I hope...

I'll check back tomorrow.

Thank you Kevin, I really appreciate it! I can confirm, that everything works as intended now!

Awesome. Please let me know if any of you see any further stablity issues with it (via a new ticket or re-opening this one).

Sorry for the hassle.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 months ago

Login to comment on this ticket.

Metadata