#10375 Problems with fetching epel release metalink from proxy06.fedoraproject.org (140.211.169.206)
Closed: Fixed 2 years ago by samiponkanen. Opened 2 years ago by samiponkanen.

I have been investigating random build failures in our Jenkins build system. The builds fail occasionally - maybe a few times per week - in a CentOS 7 based Docker container build phase because the EPEL repository metalink fetch fails.

I have debugged the issue further and it seems that HTTPS requests to the Fedora mirror proxy at proxy06.fedoraproject.org (140.211.169.206) sometimes either get completely stuck or take an extremely long time to complete.

I have verified this with the following curl command:

# curl -v --resolve "mirrors.fedoraproject.org:443:140.211.169.206"  "https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=x86_64"
* About to connect() to mirrors.fedoraproject.org port 443 (#0)
*   Trying 140.211.169.206...
* Connected to mirrors.fedoraproject.org (140.211.169.206) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*       subject: CN=*.fedoraproject.org,O="Red Hat, Inc.",L=Raleigh,ST=North Carolina,C=US
*       start date: Feb 27 00:00:00 2020 GMT
*       expire date: Mar 02 12:00:00 2022 GMT
*       common name: *.fedoraproject.org
*       issuer: CN=DigiCert SHA2 High Assurance Server CA,OU=www.digicert.com,O=DigiCert Inc,C=US
> GET /metalink?repo=epel-7&arch=x86_64 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: mirrors.fedoraproject.org
> Accept: */*
> 

The request above got stuck after the HTTP request had been sent. Requests do not get stuck on every try, but when running them in a loop it takes only a few minutes to see it happen. As the curl output shows, the stall occurs after TLS negotiation has completed, so the problem is not related to the system's trusted CA configuration.
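
For anyone who wants to reproduce this, the loop mentioned above could look roughly like the sketch below. The IP address and URL are the ones from this report; the function name `probe_proxy`, the attempt count and the 30 second limit are arbitrary assumptions, not values that were used in the original debugging.

```shell
#!/bin/sh
# Sketch only: count how many requests to one pinned proxy IP fail or
# exceed a time limit. CURL can be overridden to point at another binary.
CURL="${CURL:-curl}"

# probe_proxy IP ATTEMPTS LIMIT_SECONDS  (hypothetical helper)
# Prints the number of attempts that failed or timed out.
probe_proxy() {
    ip="$1"; attempts="$2"; limit="$3"
    failures=0
    i=1
    while [ "$i" -le "$attempts" ]; do
        # --resolve pins the hostname to the given IP, as in the command above;
        # --max-time makes a stuck request return an error instead of hanging.
        if ! "$CURL" --silent --show-error --output /dev/null \
            --max-time "$limit" \
            --resolve "mirrors.fedoraproject.org:443:$ip" \
            "https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=x86_64"; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example call (performs real network requests):
#   probe_proxy 140.211.169.206 100 30
```

Running the same call against one of the other addresses gives a baseline to compare the failure count against.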

The reason our builds fail much less often than the pinned curl requests is that mirrors.fedoraproject.org appears to be DNS load balanced, so the name resolver can pick a different proxy IP address on each resolve request:

$ host mirrors.fedoraproject.org
mirrors.fedoraproject.org is an alias for wildcard.fedoraproject.org.
wildcard.fedoraproject.org has address 152.19.134.198
wildcard.fedoraproject.org has address 67.219.144.68
wildcard.fedoraproject.org has address 209.132.190.2
wildcard.fedoraproject.org has address 140.211.169.206
wildcard.fedoraproject.org has address 38.145.60.21
wildcard.fedoraproject.org has address 8.43.85.73
wildcard.fedoraproject.org has address 140.211.169.196
wildcard.fedoraproject.org has address 152.19.134.142
wildcard.fedoraproject.org has address 38.145.60.20
wildcard.fedoraproject.org has IPv6 address 2605:bc80:3010:600:dead:beef:cafe:fed9
wildcard.fedoraproject.org has IPv6 address 2604:1580:fe00:0:dead:beef:cafe:fed1
wildcard.fedoraproject.org has IPv6 address 2600:2701:4000:5211:dead:beef:fe:fed3
wildcard.fedoraproject.org has IPv6 address 2620:52:3:1:dead:beef:cafe:fed6
wildcard.fedoraproject.org has IPv6 address 2605:bc80:3010:600:dead:beef:cafe:feda

Does anyone know if this is a known issue, or if it is caused by some request rate limiting going on at proxy06…?

Are there any known workarounds? I am thinking of something like blacklisting the hostname proxy06… or the IP address 140.211.169.206 so that yum will never attempt to use that proxy.
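
One possible client-side workaround, sketched here as an assumption rather than a recommended fix, is to pin mirrors.fedoraproject.org to an address other than the failing one for the duration of the container build. The helper name `pick_good_ips` is hypothetical; it only filters `host` output. Note that pinning bypasses the DNS load balancing entirely and will break if the chosen address is later retired.

```shell
#!/bin/sh
# Sketch only: read `host` output on stdin and print every IPv4 address
# except the one given as the first argument.
pick_good_ips() {
    # Lines of interest look like: "<name> has address <ip>"
    awk -v bad="$1" '/ has address / && $NF != bad { print $NF }'
}

# Example use (needs network access and Docker):
#   ip=$(host mirrors.fedoraproject.org | pick_good_ips 140.211.169.206 | head -n 1)
#   docker build --add-host "mirrors.fedoraproject.org:$ip" .
```

The `--add-host` option writes an entry into the build container's /etc/hosts, so yum inside the build resolves the name to the pinned address only.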

Br,
Sami


Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

2 years ago

We rebalanced things, so you likely won't be hitting proxy06 or proxy09 anymore.

Can you confirm that things look ok now?

We have been watching our build system for EPEL metalink fetch errors for a few days now and have not seen any. We will continue watching, but so far it looks like the problem has gone away.

I will keep this issue open for a few more days and report back next week.

Thanks,
Sami

We have not seen this issue anymore in our build system. Closing this ticket.

Thanks!
Sami

Metadata Update from @samiponkanen:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

Metadata
Boards 1
ops Status: Backlog