#964 Detect builds that failed due to network-related issues and retry them automatically
Closed: MIGRATED a year ago by nikromen. Opened 4 years ago by iucar.

This is not easy, because it probably requires some alchemy with the logs, but given my (short) experience with Copr, this is very high on my wish list.

The thing is, quite frequently, builds fail due to a sporadic network-related problem: unable to download the sources, unable to query repo metadata from official Fedora repos, unable to download a dependency, some weird error from the HTTP Python stack... whatever. You just resubmit the build and everything works.

So it would be really really great if the Copr backend could detect these issues and retry these builds automatically for us (say, max 3 times?). Note that I said "retry" (with the same build ID), not "submit another build". ;-) It's very cheap (we have to do it anyway), and it would save us from a lot of headaches.
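To make the request concrete, here is a minimal sketch of the retry policy described above: re-run the same build ID up to three times, but only while the failure looks transient. This is not Copr backend code. `run_with_retries`, `run_build` and `is_transient` are hypothetical names introduced only for illustration, and the caller supplies both callbacks.

```python
from typing import Callable, Tuple

MAX_ATTEMPTS = 3  # the "say, max 3 times?" suggested above


def run_with_retries(
    build_id: int,
    run_build: Callable[[int], Tuple[bool, str]],  # hypothetical: returns (ok, log text)
    is_transient: Callable[[str], bool],           # hypothetical: inspects the log text
    max_attempts: int = MAX_ATTEMPTS,
) -> Tuple[bool, int]:
    """Re-run the *same* build ID while it fails for a transient reason.

    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        ok, log = run_build(build_id)
        if ok:
            return True, attempt
        if not is_transient(log):
            # A genuine build failure: retrying would not help.
            return False, attempt
    # Every attempt failed with a transient-looking error.
    return False, max_attempts
```

The important design point is the early exit on non-transient failures, so that real packaging errors are reported immediately instead of being retried.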


> This is not easy, because it probably requires some alchemy with the logs, but given my (short) experience with Copr, this is very high on my wish list.

The problem is that copr suffers from librepo issues:
https://pagure.io/fedora-infrastructure/issue/7987
https://bugzilla.redhat.com/1741931
https://pagure.io/fedora-infrastructure/issue/7301

> The thing is, quite frequently, builds fail due to a sporadic network-related problem: unable to download the sources, unable to query repo metadata from official Fedora repos, unable to download a dependency, some weird error from the HTTP Python stack... whatever. You just resubmit the build and everything works.

Fair list. Can you provide examples of the "unable to download the sources", "unable to download a dependency", and "HTTP Python stack" problems?

> So it would be really really great if the Copr backend could detect these issues and retry these builds automatically for us (say, max 3 times?). Note that I said "retry" (with the same build ID), not "submit another build". ;-) It's very cheap (we have to do it anyway), and it would save us from a lot of headaches.

Hmm, we need to decide this at our team meeting. But it would be much, much better if we got librepo into better shape and fixed the rest of the network-related issues you mention. The retry (really hard to implement) is something that should come as a last resort.

Metadata Update from @praiskup:
- Issue tagged with: RFE

4 years ago

Ok, the next time I see one of these failures, I'll save the log and report it here.

For example, this is a good candidate for a retry: I've just tested locally and got the sources just fine, so it was probably some temporary problem with the remote CRAN mirror, a glitch in the Matrix.

Copying the issue here:

cmd: ['rpkg', 'srpm', '--outdir', '/var/lib/copr-rpmbuild/resultskolhne4b', '--spec', '/tmp/tmpn8e3yyn_/R-CRAN-ClinReport.spec']
cwd: /tmp/tmpn8e3yyn_
rc: 0
stdout: Wrote: /var/lib/copr-rpmbuild/resultskolhne4b/R-CRAN-ClinReport.spec
stderr: warning: Downloading https://cran.r-project.org/package=ClinReport&version=0.9.1.14#/ClinReport_0.9.1.14.tar.gz to /var/lib/copr-rpmbuild/resultskolhne4b/ClinReport_0.9.1.14.tar.gz
curl: (28) Operation timed out after 300823 milliseconds with 0 out of 0 bytes received
error: Couldn't download https://cran.r-project.org/package=ClinReport&version=0.9.1.14#/ClinReport_0.9.1.14.tar.gz
Failed to execute command.

Another different one: 1028093. Copying the relevant lines:

Transaction Summary
================================================================================
Install  150 Packages

Total download size: 75 M
Installed size: 341 M
Downloading Packages:

Error: Error downloading packages:
  Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=fedora-30&arch=x86_64
INFO: chroot_scan: 3 files copied to /var/lib/copr-rpmbuild/results/chroot_scan
INFO: /var/lib/mock/1028093-fedora-30-x86_64-1567609468.560634/root/var/log/dnf.rpm.log
/var/lib/mock/1028093-fedora-30-x86_64-1567609468.560634/root/var/log/dnf.librepo.log
/var/lib/mock/1028093-fedora-30-x86_64-1567609468.560634/root/var/log/dnf.log
ERROR: Exception(/tmp/tmp5fat9p1u/R-CRAN-ncmeta.spec) Config(1028093-fedora-30-x86_64) 1 minutes 24 seconds
INFO: Results and/or logs in: /var/lib/copr-rpmbuild/results
INFO: Cleaning up build root ('cleanup_on_failure=True')
Start: clean chroot
INFO: unmounting tmpfs.
Finish: clean chroot
ERROR: Command failed: 
 # /usr/bin/dnf --installroot /var/lib/mock/1028093-fedora-30-x86_64-1567609468.560634/root/ --releasever 30 --setopt=deltarpm=False --disableplugin=local --disableplugin=spacewalk install @buildsys-build

> Another different one: 1028093.

This is the librepo problem, see https://bugzilla.redhat.com/show_bug.cgi?id=1741931

Is 1028439 the same issue? Relevant lines:

Copr repository                                  38 kB/s | 3.0 kB     00:00    
fedora                                           58 kB/s |  19 kB     00:00    
updates                                         0.0  B/s |   0  B     00:00    
Failed to download metadata for repo 'updates'
Error: Failed to download metadata for repo 'updates'
INFO: chroot_scan: 3 files copied to /var/lib/copr-rpmbuild/results/chroot_scan
INFO: /var/lib/mock/1028439-fedora-30-x86_64-1567666072.561135/root/var/log/dnf.log
/var/lib/mock/1028439-fedora-30-x86_64-1567666072.561135/root/var/log/dnf.librepo.log
/var/lib/mock/1028439-fedora-30-x86_64-1567666072.561135/root/var/log/dnf.rpm.log
ERROR: Exception(/var/lib/copr-rpmbuild/results/R-CRAN-markovchain-0.7.0-1.fc30.src.rpm) Config(1028439-fedora-30-x86_64) 0 minutes 15 seconds
INFO: Results and/or logs in: /var/lib/copr-rpmbuild/results
INFO: Cleaning up build root ('cleanup_on_failure=True')
Start: clean chroot
INFO: unmounting tmpfs.
Finish: clean chroot
ERROR: Command failed: 
 # /usr/bin/dnf builddep --installroot /var/lib/mock/1028439-fedora-30-x86_64-1567666072.561135/root/ --releasever 30 --setopt=deltarpm=False --disableplugin=local --disableplugin=spacewalk --disableplugin=local --disableplugin=spacewalk /var/lib/mock/1028439-fedora-30-x86_64-1567666072.561135/root//builddir/build/SRPMS/R-CRAN-markovchain-0.7.0-1.fc30.src.rpm

Hmmm, this one is different: dnf.log shows "Curl error (16): Error in the HTTP2 framing layer". This is https://bugzilla.redhat.com/show_bug.cgi?id=1690971 (we probably need to do an explicit image update for this).
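As a rough illustration of the "alchemy with the logs" that detection would need, the sketch below matches a build log against the error signatures quoted in this ticket. The pattern list is copied from the excerpts above and is certainly not exhaustive. `TRANSIENT_PATTERNS` and `looks_network_related` are hypothetical names, not part of Copr, mock or dnf.

```python
import re

# Signatures taken from the log excerpts in this ticket (illustrative only).
TRANSIENT_PATTERNS = [
    re.compile(r"curl: \(28\) Operation timed out"),
    re.compile(r"Status code: 5\d\d for https?://"),
    re.compile(r"Failed to download metadata for repo"),
    re.compile(r"Curl error \(16\): Error in the HTTP2 framing layer"),
]


def looks_network_related(log_text: str) -> bool:
    """Return True if any known transient-network signature appears in the log."""
    return any(pattern.search(log_text) for pattern in TRANSIENT_PATTERNS)
```

A pattern list like this is fragile by nature: every new curl, librepo or dnf message needs a new entry, which is part of why fixing the underlying tools is preferable.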

The higher-level "retry", if any, is going to be resolved in mock: https://github.com/rpm-software-management/mock/issues

Metadata Update from @praiskup:
- Issue assigned to praiskup

4 years ago

Metadata Update from @praiskup:
- Issue marked as depending on: #1033

4 years ago

Metadata Update from @praiskup:
- Assignee reset

3 years ago

Metadata Update from @nikromen:
- Issue close_status updated to: MIGRATED
- Issue status updated to: Closed (was: Open)

a year ago
