Hello,
Going through the logs of our CentOS CI jobs (and not only those), we noticed that the vault.centos.org server, which hosts CentOS SRPMs, has been somewhat unstable for the past several days. In many cases, fetching metadata for the *-sources repositories fails [0][1], whether due to timeouts, slow downloads, or even routing issues. Is there some infra maintenance going on?
Thanks!
[0]
```
enabling appstream-source repository
enabling baseos-source repository
enabling extras-source repository
CentOS Linux 8 - BaseOS - Source          0.0  B/s |    0  B     02:13
Errors during downloading metadata for repository 'baseos-source':
  - Curl error (28): Timeout was reached for http://vault.centos.org/centos/8/BaseOS/Source/repodata/repomd.xml [Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds]
  - Curl error (28): Timeout was reached for https://vault.centos.org/centos/8/BaseOS/Source/repodata/repomd.xml [Connection timed out after 30461 milliseconds]
  - Curl error (28): Timeout was reached for http://vault.centos.org/centos/8/BaseOS/Source/repodata/repomd.xml [Connection timed out after 30001 milliseconds]
Error: Failed to download metadata for repo 'baseos-source': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
```
[1]
```
enabling appstream-source repository
enabling baseos-source repository
enabling extras-source repository
CentOS Linux 8 - BaseOS - Source          3.5 kB/s |  290 kB     01:23
CentOS Linux 8 - AppStream - Source       0.0  B/s |    0  B     01:51
Errors during downloading metadata for repository 'appstream-source':
  - Curl error (28): Timeout was reached for http://vault.centos.org/centos/8/AppStream/Source/repodata/33682491b2a4f3b971da89089eb8be14715588b8ecf045ce7cb4995717284e71-filelists.xml.gz [Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds]
  - Curl error (28): Timeout was reached for http://vault.centos.org/centos/8/AppStream/Source/repodata/63ee95c1e2e95ee2e52642230de2591fd2b4c9e5716bc9ad193805ac6e44f49a-primary.xml.gz [Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds]
  - Curl error (28): Timeout was reached for https://vault.centos.org/centos/8/AppStream/Source/repodata/63ee95c1e2e95ee2e52642230de2591fd2b4c9e5716bc9ad193805ac6e44f49a-primary.xml.gz [Connection timed out after 30604 milliseconds]
  - Curl error (28): Timeout was reached for http://vault.centos.org/centos/8/AppStream/Source/repodata/63ee95c1e2e95ee2e52642230de2591fd2b4c9e5716bc9ad193805ac6e44f49a-primary.xml.gz [Connection timed out after 30000 milliseconds]
Error: Failed to download metadata for repo 'appstream-source': Yum repo downloading error: Downloading error(s): repodata/63ee95c1e2e95ee2e52642230de2591fd2b4c9e5716bc9ad193805ac6e44f49a-primary.xml.gz - Cannot download, all mirrors were already tried without success; repodata/33682491b2a4f3b971da89089eb8be14715588b8ecf045ce7cb4995717284e71-filelists.xml.gz - Cannot download, all mirrors were already tried without success
```
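As a possible stopgap on our side, wrapping the failing fetch in a retry loop could paper over the intermittent timeouts. This is only a sketch: `retry_fetch` is a hypothetical helper (not what the CI currently runs), and the curl speed limits mirror the 1000 bytes/sec over 30 seconds threshold visible in the errors above.

```shell
# Hypothetical helper: retry a slow/flaky download a few times before
# giving up; --speed-limit/--speed-time mirror the threshold from the
# dnf errors above, --max-time caps each attempt.
retry_fetch() {
    local url=$1 attempts=${2:-5} i
    for ((i = 1; i <= attempts; i++)); do
        curl --fail --silent --show-error --location \
             --speed-limit 1000 --speed-time 30 --max-time 120 \
             --output /dev/null "$url" && return 0
        echo "attempt $i/$attempts failed for $url" >&2
        sleep 5
    done
    return 1
}
```

It obviously doesn't fix the underlying network problem, only rides out short glitches.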
There is no maintenance going on, and the nodes behind vault.centos.org (5 nodes, mainly EC2 instances) are all reachable and under monitoring. I'm wondering if we should have (for CI) one internal vault mirror instead of relying on the network link to the outside, which we don't control btw, and which is shared between multiple projects hosted in the same cage, without any QoS.
But can you give us details, please? Is that happening from an OpenShift container? I tried from a node (VM) in the same CI vlan and it works fine:
```
curl --location http://vault.centos.org/centos/8/BaseOS/Source/repodata/repomd.xml
```
I know that @dkirwan needed to investigate a similar timeout issue that happened only in OCP: see for example #159.
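To narrow down where a slow fetch spends its time (DNS, TCP connect, or the transfer itself), the check above can be extended with curl's standard `--write-out` timing variables. `probe_timing` is a hypothetical wrapper, only a sketch:

```shell
# Hypothetical helper: print per-phase curl timings for one URL,
# using curl's standard --write-out variables.
probe_timing() {
    curl --location --silent --output /dev/null \
         --write-out 'dns=%{time_namelookup}s connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s speed=%{speed_download}B/s\n' \
         "$1"
}

# e.g.:
# probe_timing http://vault.centos.org/centos/8/BaseOS/Source/repodata/repomd.xml
```

Comparing `connect` versus `total` from a Duffy node when the problem occurs would show whether it's the handshake or the transfer that stalls.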
Metadata Update from @arrfab: - Issue tagged with: centos-ci-infra
> But can you give us details, please? Is that happening from an OpenShift container?
No, every instance of this issue happened on Duffy nodes. Going through the logs, here is some detailed information on where and when the timeouts happened (times in CET, give or take a couple of minutes); the issue is pretty intermittent:
[2020-12-16]
- n30.crusty: 9:15 AM
- n29.crusty: 9:14 AM
- n56.gusty: 8:13 AM
- n55.crusty: 8:09 AM
- n53.pufty: 8:08 AM
- n38.pufty: 7:05 AM
- n53.pufty: 6:04 AM
- n62.dusty: 6:04 AM

[2020-12-15]
- n40.crusty: 8:34 AM
- n43.crusty: 8:33 AM
- n12.gusty: 7:15 AM
- n10.gusty: 7:13 AM
- n2.gusty: 6:48 AM
- n3.gusty: 6:48 AM
- n4.gusty: 5:23 AM
- n58.dusty: 4:29 AM
- n52.dusty: 4:29 AM
- n39.gusty: 1:12 AM
- n40.gusty: 1:11 AM
- n42.gusty: 12:42 AM
- n47.pufty: 12:39 AM

[2020-12-10]
- n47.pufty: 12:17 PM
- n37.dusty: 10:51 AM
- n41.dusty: 10:49 AM
- n40.dusty: 10:48 AM
- n6.dusty: 10:33 AM
- n13.dusty: 10:33 AM
- n46.crusty: 10:27 AM
- n47.crusty: 10:27 AM
- n40.crusty: 10:25 AM
- n39.crusty: 10:23 AM
- n32.crusty: 10:17 AM
- n37.crusty: 10:17 AM
- n12.crusty: 10:14 AM
- n26.crusty: 10:10 AM
Thanks for the info; so it's not limited to containers running inside OCP/OpenShift. We have 5 nodes behind vault.centos.org, 2 in the US and 3 in the EU, but you should always hit one in the US (we use GeoIP for that). As said, I can't reproduce the issue, so I don't have a clear view of where it happens, as traffic goes through the same gateway and then a router/firewall shared between multiple projects (and we can't see whether there are routing issues). While not denying an issue reaching vault.centos.org, I'm wondering about your use case from within the CI vlan: the same way we have an internal mirror.centos.org, we could work on an internal vault.centos.org to keep that traffic internal to the vlan, which would solve this (and reduce in/out traffic too).

Would you mind elaborating?
> While not denying an issue reaching vault.centos.org, I'm wondering about your use case from within the CI vlan: the same way we have an internal mirror.centos.org, we could work on an internal vault.centos.org to keep that traffic internal to the vlan, which would solve this (and reduce in/out traffic too). Would you mind elaborating?
The only use case in our scenarios is installation of build dependencies (dnf builddep systemd, etc.).
Mirroring the SRPMs internally would most likely help, but I'm not sure the amount of work required would be outweighed by the benefits.
Also, thanks to this issue I came to the realization that SRPM repositories aren't mirrored the same way as the rest of the repositories, which is kind of interesting :-)
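For reference, an internal mirror of the source repos could be kept in sync with `reposync` from dnf-plugins-core. This is only a sketch: `sync_source_repos` is a hypothetical helper, and the repo ids and destination path are assumptions to adjust for the actual setup.

```shell
# Hypothetical helper wrapping reposync (provided by dnf-plugins-core);
# repo ids and the destination path are assumptions, not the actual
# infra configuration.
sync_source_repos() {
    local dest=${1:-/srv/vault-mirror}
    dnf reposync \
        --repoid=baseos-source --repoid=appstream-source --repoid=extras-source \
        --download-metadata --newest-only \
        --download-path="$dest"
}
```

Run periodically (e.g. from a timer), that would keep a consumable copy of the `*-source` repodata and SRPMs inside the vlan.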
Metadata Update from @arrfab: - Issue assigned to arrfab
Metadata Update from @arrfab: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: low-gain, low-trouble
Just to let you know that it's now in progress:

```
* e4bd289 - (HEAD -> master, origin/master, origin/HEAD) Converted old store01 with enough storage as internal vault. #172 (3 minutes ago) <Fabian Arrotin>
```
So Ansible reconfigured that box and the content is now being imported. I'll close this ticket once the A record is pushed internally and a quick test is done.
The internal vault A record was pushed, now that all the content has landed. That should speed up your builds from now on. Closing the ticket.
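For anyone who wants to confirm from a CI node that the name now resolves to the internal address, a quick check along these lines works (`resolve_vault` is a hypothetical helper; it assumes `getent` is available, which it is on any glibc-based node):

```shell
# Hypothetical helper: print the unique IPv4 addresses a hostname
# resolves to, via the system resolver (NSS), not a specific DNS server.
resolve_vault() {
    getent ahostsv4 "${1:-vault.centos.org}" | awk '{print $1}' | sort -u
}
```

If the printed address is in the CI vlan's range rather than a public one, the internal record is in effect.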
Metadata Update from @arrfab: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
Perfect, thanks a lot!