Learn more about these different git repos.
Other Git URLs
Traceback 1
[2021-04-29 05:16:17,745][ ERROR][PID:350511][copr_prune_results][copr_prune_results.py:run_prunerepo:69] Error pruning chroot omos/drop-selinux-disable/fedora-rawhide-aarch64 Traceback (most recent call last): File "/usr/bin/copr_prune_results.py", line 67, in run_prunerepo clean_copr(chroot_path, prune_days, verbose=True) File "/usr/bin/copr_prune_results.py", line 205, in clean_copr shutil.rmtree(dir_path) File "/usr/lib64/python3.9/shutil.py", line 718, in rmtree _rmtree_safe_fd(fd, path, onerror) File "/usr/lib64/python3.9/shutil.py", line 655, in _rmtree_safe_fd _rmtree_safe_fd(dirfd, fullname, onerror) File "/usr/lib64/python3.9/shutil.py", line 655, in _rmtree_safe_fd _rmtree_safe_fd(dirfd, fullname, onerror) File "/usr/lib64/python3.9/shutil.py", line 655, in _rmtree_safe_fd _rmtree_safe_fd(dirfd, fullname, onerror) [Previous line repeated 3 more times] File "/usr/lib64/python3.9/shutil.py", line 659, in _rmtree_safe_fd onerror(os.rmdir, fullname, sys.exc_info()) File "/usr/lib64/python3.9/shutil.py", line 657, in _rmtree_safe_fd os.rmdir(entry.name, dir_fd=topfd) PermissionError: [Errno 13] Permission denied: 'var'
Traceback 2
[2021-04-29 05:16:17,871][ ERROR][PID:350517][copr_prune_results][copr_prune_results.py:run_prunerepo:69] Error pruning chroot omos/selinux-policy-testing/fedora-34-x86_64 Traceback (most recent call last): File "/usr/bin/copr_prune_results.py", line 62, in run_prunerepo rpms = get_rpms_to_remove(chroot_path, days=prune_days, log=LOG) File "/usr/lib/python3.9/site-packages/prunerepo/helpers.py", line 194, in get_rpms_to_remove if time.time() - get_package_build_time(rpm, log) < days * 24 * 3600: File "/usr/lib/python3.9/site-packages/prunerepo/helpers.py", line 63, in get_package_build_time stdout = run_cmd(query_cmd, log) File "/usr/lib/python3.9/site-packages/prunerepo/helpers.py", line 54, in run_cmd raise PrunerepoException("Command {} failed".format(str_cmd)) prunerepo.helpers.PrunerepoException: Command /usr/bin/rpm -qp --queryformat %{BUILDTIME} /var/lib/copr/public_html/results/omos/selinux-policy-testing/fedora-34-x86_64/02010948-selinux-policy/selinux-policy-minimum-3.14.8-3.20210221_101800.210f301.fc34.noarch.rpm failed
Traceback 3:
[2021-04-29 05:17:27,195][ ERROR][PID:350518][copr_prune_results][copr_prune_results.py:run_prunerepo:69] Error pruning chroot aikidouke/yadm/fedora-rawhide-x86_64 Traceback (most recent call last): File "/usr/bin/copr_prune_results.py", line 62, in run_prunerepo rpms = get_rpms_to_remove(chroot_path, days=prune_days, log=LOG) File "/usr/lib/python3.9/site-packages/prunerepo/helpers.py", line 181, in get_rpms_to_remove latest_rpms = get_rpms(get_latest_packages_cmd, directory, log) File "/usr/lib/python3.9/site-packages/prunerepo/helpers.py", line 71, in get_rpms stdout = run_cmd(repoquery_cmd, log) # returns srpms as well File "/usr/lib/python3.9/site-packages/prunerepo/helpers.py", line 54, in run_cmd raise PrunerepoException("Command {} failed".format(str_cmd)) prunerepo.helpers.PrunerepoException: Command dnf repoquery --repofrompath=prunerepo_query,/var/lib/copr/public_html/results/aikidouke/yadm/fedora-rawhide-x86_64 --repo=prunerepo_query --refresh --queryformat=%{location} --quiet --setopt=skip_if_unavailable=False --latest-limit=1 failed
```
Tracaback 3 is because of what I wrote here https://pagure.io/copr/copr/pull-request/1796, when repoquery fail prunerepo log Error: Failed to download metadata for repo 'prunerepo_query': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried, but it is now just a debug print so in the log you can just see Error pruning chroot aikidouke/yadm/fedora-rawhide-x86_64
Error: Failed to download metadata for repo 'prunerepo_query': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
Error pruning chroot aikidouke/yadm/fedora-rawhide-x86_64
Traceback 2, it looks like it tried to get the build time of the package which wasn't there. That src.rpm was paired with the noarch.rpm file, even if it is no longer in that directory. So wrapping the call get_package_build_time (rpm, log) in try-except could help.
The traceback 3 is caused by entirely missing repodata, I think (some bug). Traceback is caused by removed RPMs, but the repodata still point at them... (some other bug). Traceback 1 is fixed by #1805.
What is a serious problem is the frequency of these issues: $ grep Traceback /var/log/copr-backend/pruner-full-log.log | wc -l 1812
It means that we have 1812 broken repositories :( and we need to find a way to fix them (re-running createrepo there?).
Ok, I think that Traceback 2 is already solved ... it looks like result of copr_prune_repo.py failures in previous releases (or older prunerepo failures). Before we a) removed the RPMs and b) then we run the createrepo_c (and appstream-builder, etc.).
The problem we see in Traceback 2 is successful RPM removal, but non-successful createrepo run (thus the old metadata are kept, pointing to already non-existing RPMs). The new code does this vice-versa.
Examples of traceback 3:
https://download.copr.fedorainfracloud.org/results/jenslody/codeblocks/opensuse-tumbleweed-x86_64/00768203-codeblocks/backend.log.gz (old code, rsync gave up too early) Another instances: https://download.copr.fedorainfracloud.org/results/avsej/zig/fedora-rawhide-i386/00772397-zig/backend.log.gz https://download.copr.fedorainfracloud.org/results/kwenning/pacemaker-2.0/epel-6-x86_64/00740185-pacemaker/backend.log.gz
https://download.copr.fedorainfracloud.org/results/till/cura:pr:8/fedora-33-x86_64/01640706-cura-fdm-materials/backend.log.gz (this is action processing failure) Another instances: https://download.copr.fedorainfracloud.org/results/bowlofeggs/bodhi/fedora-rawhide-aarch64/01903205-bodhi/backend.log.gz https://download.copr.fedorainfracloud.org/results/frostyx/copr-dev/fedora-32-x86_64/01600160-copr-backend/backend.log.gz https://download.copr.fedorainfracloud.org/results/jonny/Monero/fedora-32-x86_64/01696503-monero/backend.log.gz ...
EPEL 7 in https://copr.fedorainfracloud.org/coprs/willo/vagrant/builds/ dunno, but old build as hell, clean_copr() already removed leftovers, likely the first case.
The full list:
# grep "Command dnf repoquery" /var/log/copr-backend/pruner-full-log.log | sed 's/.*prunerepo_query,//' | sed 's/ .*//' | wc -l 385
Ful list of affected directories, with contents, is here: https://copr-be.cloud.fedoraproject.org/stats/issue-1804-traceback-3.log
I think the pattern is clear... the older builds are usually the case 1., the newer builds are 2.
So I think we should merge the related PRs, and release new copr-backend. This will avoid any future instances for TB1. TB2 is resolved now so shouldn't appear again. TB3 is something I haven't seen for some time, but I'm not sure what fixed it (the initial createrepo usually works nowadays), but we would have at least better logging with the recent patches..
Once we have deployed the new backend, we should recursively traverse all the result directories, and 1. fix file permissions causing TB1 2. submit a full createrepo action for all affected projects 3. run full copr_prune_results --no-mtime-optimzations 4. enable the cron job ... and watch residual problems in logs
Can anyone please take a look at the proposal?
Metadata Update from @praiskup: - Issue assigned to praiskup
Commit 9872e8e relates to this ticket
Commit 60a7edf relates to this ticket
Can anyone please take a look at the proposal above?
I see no problem with your proposal :-)
Commit e3ad541 relates to this ticket
2) step blocked, we need to apply another pr #1813.... I'll continue later. In the meantime, I believe the permissions from 1. will be fixed.
2) step currently in progress: https://copr-be.cloud.fedoraproject.org/stats/recreate-repo-2021-05-02.log
2) step continues here: https://copr-be.cloud.fedoraproject.org/stats/recreate-repo-2021-05-03.log
3) step 3 tried, though it ended with:
[2021-05-03 15:57:28,346][ ERROR][PID:2941109][copr_prune_results][copr_prune_results.py:run_prunerepo:69] Error pruning chroot @python/python3.10/fedora-rawhide-x86_64 Traceback (most recent call last): File "/usr/bin/copr_prune_results.py", line 65, in run_prunerepo call_copr_repo(directory=chroot_path, rpms_to_remove=rpms, File "/usr/lib/python3.9/site-packages/copr_backend/helpers.py", line 641, in call_copr_repo result = run_cmd(cmd, timeout=timeout, logger=logger, catch_timeout=True) File "/usr/lib/python3.9/site-packages/copr_backend/helpers.py", line 91, in run_cmd process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=shell, encoding="utf-8") File "/usr/lib64/python3.9/subprocess.py", line 951, in __init__ self._execute_child(args, executable, preexec_fn, close_fds, File "/usr/lib64/python3.9/subprocess.py", line 1821, in _execute_child raise child_exception_type(errno_num, err_msg, err_filename) OSError: [Errno 7] Argument list too long: 'copr-repo'
I've run prunerepo and createrepo manually against that large project, to miminimize the command length. The next prune repo run took:
# date ; time runuser -c "PYTHONUNBUFFERED=1 python3 /usr/bin/copr_prune_results.py --no-mtime-optimization &> /dev/null" - copr Mon May 3 08:29:52 PM UTC 2021 real 399m31.577s user 463m17.717s sys 107m3.633s
Without --no-mtime-optimization it will be even faster.
The next daily runs, at least for now, but I created #1817 that we should resolve soon.
Metadata Update from @praiskup: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
Log in to comment on this ticket.