#7301 Copr builds are failing due to mirror errors
Closed: Fixed 5 years ago by kevin. Opened 5 years ago by clime.

  • Describe what you need us to do:

Fix the problem with mirrors. Today a number of Copr builds failed because build dependencies could not be downloaded from mirrors during the build. The original bug report is here:

https://bugzilla.redhat.com/show_bug.cgi?id=1638048

I selected some examples:

https://copr-be.cloud.fedoraproject.org/results/mkyral/plasma-unstable/fedora-28-x86_64/00807905-plasma-user-manager/builder-live.log

https://copr-be.cloud.fedoraproject.org/results/mkyral/plasma-unstable/fedora-28-x86_64/00807908-kgamma/builder-live.log

https://copr-be.cloud.fedoraproject.org/results/mkyral/plasma-unstable/fedora-28-x86_64/00807907-plasma-nm/builder-live.log

with errors like:

[MIRROR] gpgme-1.10.0-4.fc28.x86_64.rpm: Status code: 503 for https://dl.fedoraproject.org/pub/fedora/linux/updates/testing/28/Everything/x86_64/Packages/g/gpgme-1.10.0-4.fc28.x86_64.rpm
[FAILED] gpgme-1.10.0-4.fc28.x86_64.rpm: No more mirrors to try - All mirrors were already tried without success
(27-28/163): libassu 21% [====                ] 2.3 MB/s |  18 MB     00:29 ETA
Error: Error downloading packages:
  Cannot download Packages/g/gpgme-1.10.0-4.fc28.x86_64.rpm: All mirrors were tried

There are actually lots of other similar builds that failed.


Odd. This is on our download servers, but I am at a loss as to why they would return a 503. It's just reading static files from nfs and serving them, there's no app here or anything.

Will try and figure out how this could happen... please let me know if it's persisting.

Metadata Update from @kevin:
- Issue assigned to kevin
- Issue priority set to: Waiting on Assignee (was: Needs Review)

5 years ago

Is there a way to make copr round robin around https://dl01.fedoraproject.org -> dl05.fedoraproject.org like it was a set of mirrors to try? What I am seeing is that the 503's seem to occur when a client gets overloaded for whatever reason. (At ~07:40 it looks like a lot of copr builds tried only dl01.fedoraproject.org, the same for other 503 timeouts on other guests)

There may be an item we need to tune on our end, but it would mostly lower the number of allowed connections per download server, so it may just turn the 503s into a different error code. Either way, the client needs to be able to try a different mirror or retry after 10-30 seconds.
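The client-side behaviour asked for above (rotate through the download hosts, and on a 503 either move on to the next host or wait 10-30 seconds) could be sketched roughly as follows. This is only an illustration, not what Copr, dnf or librepo actually do; `fetch_with_rotation`, the `fetch` callable and the host list passed in are names made up for this example.

```python
import itertools
import random
import time


def fetch_with_rotation(path, hosts, fetch, max_attempts=10,
                        delay_range=(10, 30), sleep=time.sleep):
    """Try each host in turn; after a full pass of 503s, wait and go around again.

    `fetch(url)` is a hypothetical callable returning (status_code, body).
    """
    attempts = 0
    for host in itertools.cycle(hosts):
        if attempts >= max_attempts:
            break
        attempts += 1
        url = f"https://{host}{path}"
        status, body = fetch(url)
        if status == 200:
            return url, body
        if status != 503:
            raise RuntimeError(f"{url}: unexpected status {status}")
        # Every host has answered 503 in this pass: back off before the next pass.
        if attempts % len(hosts) == 0 and attempts < max_attempts:
            sleep(random.uniform(*delay_range))
    raise RuntimeError(f"{path}: all hosts returned 503 after {attempts} attempts")
```

With a host list such as dl01 to dl03, a single overloaded host costs one extra request instead of failing the whole build; the 10-30 second pause only happens once every host has refused.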

Copr more or less mirrors what happens on Fedora end-user machines, as we just use dnf and public mirrors to install packages into buildroots.

This is therefore something that would need to be discussed with the DNF team, to see whether they can do something like cycling through dl01 to dl05. Also, it probably cycles already:

No more mirrors to try - All mirrors were already tried without success
(16-17/161): gcc-c++ 10% [==                  ] 3.2 MB/s |  15 MB     00:38 ETA
Error: Error downloading packages:
  Cannot download Packages/f/fedora-release-28-3.noarch.rpm: All mirrors were tried

Couldn't there be just dl.fedoraproject.org but load-balanced?

When I look at the data from dnf.librepo.log, there is only:

14:12:08 http://dl.fedoraproject.org/pub/fedora/linux/releases/28/Everything/x86_64/os/repodata/repomd.xml
14:12:08 https://dl.fedoraproject.org/pub/fedora/linux/releases/28/Everything/x86_64/os/repodata/repomd.xml

so I suspect dl.fedoraproject.org is already load-balanced across dl01 to dl05. Wasn't the problem just temporary unavailability of all machines except dl01? Or maybe the load balancer for some reason always picked just dl01? Just trying to brainstorm here.

dl.fedoraproject.org is just DNS load-balanced, so you get whichever IP address glibc chose out of the 3 for it. [I forgot that dl04 and dl05 are only seen by tier 1 mirrors.] There is no load-balancing system below that.
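To see what "DNS load balanced" means from the client side, a small sketch like the one below lists every address the resolver returns for a name, rather than only the first one a client would normally connect to. `resolve_all` is a hypothetical helper written for this comment; the `getaddrinfo` parameter exists only so the function can be exercised without network access.

```python
import socket


def resolve_all(hostname, port=443, getaddrinfo=socket.getaddrinfo):
    """Return the distinct IP addresses the resolver reports, in resolver order."""
    seen = []
    for _family, _type, _proto, _canon, sockaddr in getaddrinfo(
            hostname, port, proto=socket.IPPROTO_TCP):
        ip = sockaddr[0]
        if ip not in seen:
            seen.append(ip)
    return seen


# Example (needs network access):
# resolve_all("dl.fedoraproject.org")
```

A client that only ever uses the first entry of that list keeps hitting the same host for as long as the resolver keeps returning the same order.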

Have you considered switching the mock configs used by Copr from metalink to baseurl? baseurl allows specifying multiple URLs per repo, which would allow implementing load balancing and fallback at the mock config level.

Isn't dnf already trying multiple mirrors from the list provided by MirrorManager when a mirror drops out during package download?
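As a rough sketch of the baseurl idea: since mock configs are Python files, the repo section could be generated from a list of hosts. The helper name `repo_stanza`, the repo id and the host names below are assumptions made for illustration and are not taken from Copr's actual configs; whether the individual dlNN hosts should be addressed directly is a separate question.

```python
def repo_stanza(repo_id, name, hosts, path):
    """Build a dnf repo section whose baseurl lists one URL per host.

    dnf treats baseurl as a list, so the URLs are separated by spaces.
    """
    urls = [f"https://{host}{path}" for host in hosts]
    lines = [
        f"[{repo_id}]",
        f"name={name}",
        "baseurl=" + " ".join(urls),
        "enabled=1",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical host list; adjust to whatever is really reachable.
stanza = repo_stanza(
    "fedora",
    "fedora",
    ["dl01.fedoraproject.org", "dl02.fedoraproject.org", "dl03.fedoraproject.org"],
    "/pub/fedora/linux/releases/$releasever/Everything/$basearch/os/",
)
```

The resulting text would go where the metalink-based section sits in the mock config today. The trade-off is that the host list then has to be maintained by hand instead of coming from MirrorManager.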

I would like to point out again that fixing things like this is not optimal, because when this happens on Copr, it probably also happens on user machines somewhere during normal package installation.

I mean, it's better when Copr is working rather than not working. That's why I might still apply such fixes in the end.

It looks like dnf tried 10 times (per the retries default of 10), but it never actually went to another mirror; it just kept trying the same mirror 10 times, which sounds like a dnf bug or a librepo bug or something.

That looks like a librepo bug (the rest of DNF has no idea about how fetching works...)

Any news here? Has this stopped?

I assume this has gone away... please reopen if you are still seeing anything or there is anything further for us to do.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago
