Learn more about these different git repos.
Other Git URLs
We seem to have hit another repo deletes problem with kojira in 1.24 and 1.24.1. Moving back to the version in 1.23.1 seems to avoid it.
We just noticed the problem today when it was noted that we were again almost out of space on our koji volume. ;( On looking, there was 47TB or so of repos still around in repos/
Logs seemed to show some tracebacks:
May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: Exception in thread rmtreeLoop: May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: Traceback (most recent call last): May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/sbin/kojira", line 600, in rmt reeLoop May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: self.checkQueue() May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/sbin/kojira", line 352, in che ckQueue May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: pid = self._rmtree(path) May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/sbin/kojira", line 375, in _rm tree May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: pid = os.fork() May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/sbin/kojira", line 278, in __e xit__ May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: self.release_locks() May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/sbin/kojira", line 300, in rel ease_locks May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: logging._releaseLock() May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/lib64/python3.9/logging/__init __.py", line 232, in _releaseLock May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: _lock.release() May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: RuntimeError: cannot release un-acquired lo ck May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: During handling of the above exception, ano ther exception occurred: May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: Traceback (most recent call last): May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/lib64/python3.9/threading.py", line 954, in _bootstrap_inner May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: self.run() May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/lib64/python3.9/threading.py", line 892, in run May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: self._target(*self._args, **self._kwarg s) May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/sbin/kojira", line 606, in rmt reeLoop May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: session.logout() May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/lib/python3.9/site-packages/ko ji/__init__.py", line 2560, in logout May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: self._sendCall(handler, headers, reques t) May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/lib/python3.9/site-packages/koji/__init__.py", line 2636, in _sendCall May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: return self._sendOneCall(handler, headers, request) May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/lib/python3.9/site-packages/koji/__init__.py", line 2685, in _sendOneCall May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: ret = self._read_xmlrpc_response(r) May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/lib/python3.9/site-packages/koji/__init__.py", line 2697, in _read_xmlrpc_response May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: result = u.close() May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: File "/usr/lib64/python3.9/xmlrpc/client.py", line 662, in close May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: raise Fault(**self._stack[0]) May 10 21:56:34 koji02.iad2.fedoraproject.org kojira[1314442]: xmlrpc.client.Fault: <Fault 1007: 'session "135545223" has expired'>
and
May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: 2021-05-08 23:59:20,265 [INFO] {805} koji.repo. regen:340 Queuing rmtree job for /mnt/koji/repos/f35-build-side-40484/3472506 May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: 2021-05-08 23:59:20,269 [ERROR] {805} koji:1091 Traceback (most recent call last): May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: File "/usr/lib/python3.9/site-packages/koji/_ _init__.py", line 2721, in _callMethod May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: return self._sendCall(handler, headers, req uest) May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: File "/usr/lib/python3.9/site-packages/koji/_ _init__.py", line 2636, in _sendCall May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: return self._sendOneCall(handler, headers, request) May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: File "/usr/lib/python3.9/site-packages/koji/_ _init__.py", line 2685, in _sendOneCall May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: ret = self._read_xmlrpc_response(r) May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: File "/usr/lib/python3.9/site-packages/koji/_ _init__.py", line 2697, in _read_xmlrpc_response May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: result = u.close() May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: File "/usr/lib64/python3.9/xmlrpc/client.py", line 662, in close May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: raise Fault(**self._stack[0]) May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: xmlrpc.client.Fault: <Fault 1008: '17693841 > 1 7693840 (session 134841685)'> May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: During handling of the above exception, another exception occurred: May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: Traceback (most recent call last): May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: File "/usr/sbin/kojira", line 1062, in main May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: repomgr.updateRepos() May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: File "/usr/sbin/kojira", line 752, in updateR epos May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: self.readCurrentRepos() May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: File "/usr/sbin/kojira", line 421, in readCur rentRepos May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: if not getTag(self.session, repo.tag_id) an d not repo.expired(): May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: File "/usr/sbin/kojira", line 54, in getTag May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: info = session.getTag(tag, event=event) May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: File "/usr/lib/python3.9/site-packages/koji/_ _init__.py", line 2310, in __call__ May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: return self.__func(self.__name, args, opts) May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: File "/usr/lib/python3.9/site-packages/koji/_ _init__.py", line 2740, in _callMethod May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: raise err May 08 23:59:20 koji02.iad2.fedoraproject.org kojira[805]: koji.SequenceError: 17693841 > 17693840 (sessio n 134841685)
It's unclear if these have anything to do with things though.
Another one seems to clearly show:
May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293341]: 2021-05-10 17:55:53,732 [INFO] {1293341} ko ji.repo.regen:353 Started rmtree (pid 1293352) for /mnt/koji/repos/epel7-infra-mailman/3449468 May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: 2021-05-10 17:55:53,733 [ERROR] {1293352} k oji.repo.rmtree:603 Error in delete thread May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: Traceback (most recent call last): May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: File "/usr/sbin/kojira", line 600, in rmt reeLoop May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: self.checkQueue() May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: File "/usr/sbin/kojira", line 352, in che ckQueue May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: pid = self._rmtree(path) May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: File "/usr/sbin/kojira", line 375, in _rm tree May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: pid = os.fork() May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: File "/usr/sbin/kojira", line 278, in __e xit__ May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: self.release_locks() May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: File "/usr/sbin/kojira", line 300, in rel ease_locks May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: logging._releaseLock() May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: File "/usr/lib64/python3.9/logging/__init __.py", line 232, in _releaseLock May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: _lock.release() May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293352]: RuntimeError: cannot release un-acquired lo ck May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: 2021-05-10 17:55:53,743 [ERROR] {1293353} k oji.repo.rmtree:603 Error in delete thread May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: Traceback (most recent call last): May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: File "/usr/sbin/kojira", line 600, in rmt reeLoop May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: self.checkQueue() May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: File "/usr/sbin/kojira", line 352, in che ckQueue May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: pid = self._rmtree(path) May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: File "/usr/sbin/kojira", line 375, in _rm tree May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: pid = os.fork() May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: File "/usr/sbin/kojira", line 278, in __e xit__ May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: self.release_locks() May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: File "/usr/sbin/kojira", line 300, in rel ease_locks May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: logging._releaseLock() May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: File "/usr/lib64/python3.9/logging/__init __.py", line 232, in _releaseLock May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: _lock.release() May 10 17:55:53 koji02.iad2.fedoraproject.org kojira[1293353]: RuntimeError: cannot release un-acquired lo ck
The 1.23.1 kojira sees tons of:
May 10 22:42:18 koji02.iad2.fedoraproject.org kojira[1316276]: 2021-05-10 22:42:18,181 [INFO] {1316276} koji:650 Removing stray repo (state=DELETED): /mnt/koji/repos/module-nginx-mainline-3220210331001209-43bbeeef-build/3452529
So, it was marked deleted, added to rmqueue, but never actually deleted?
Happy to provide more info / kojira --debug, but we need to get the space usage down before we run out, so I figured I'd file this sooner rather than later.
Another one with 1.23.1:
May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: 2021-05-10 23:58:10,263 [INFO] {1316276} ko ji.repo.regen:307 Queuing rmtree job for /mnt/koji/repos/eln-build/3485072 May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: 2021-05-10 23:58:10,263 [ERROR] {1316276} k oji.repo.delete:554 Error in delete thread May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: Traceback (most recent call last): May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: File "/usr/sbin/kojira", line 551, in del eteLoop May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: self.deleteRepos() May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: File "/usr/sbin/kojira", line 961, in del eteRepos May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: del self.repos[repo.repo_id] May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: KeyError: 3485072 May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: Exception in thread deleteLoop: May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: Traceback (most recent call last): May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: File "/usr/lib64/python3.9/threading.py", line 954, in _bootstrap_inner May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: self.run() May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: File "/usr/lib64/python3.9/threading.py", line 892, in run May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: self._target(*self._args, **self._kwarg s) May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: File "/usr/sbin/kojira", line 551, in del eteLoop May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: self.deleteRepos() May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: File "/usr/sbin/kojira", line 961, in del eteRepos May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: del self.repos[repo.repo_id] May 10 23:58:10 koji02.iad2.fedoraproject.org kojira[1316276]: KeyError: 3485072
Metadata Update from @kevin: - Custom field Size adjusted to None
It seems to be two issues: 1) logging locking (side-effect of #2714) - what is your python version? + @mikem https://pagure.io/koji/pull-request/2715#_1__51 maybe this should encompass whole forked code until the end of _rmtree?
_rmtree
2) race with del self.repos - #2853 should fix this
del self.repos
Initial traceback, cleaned up
Traceback (most recent call last): File "/usr/sbin/kojira", line 600, in rmtreeLoop self.checkQueue() File "/usr/sbin/kojira", line 352, in checkQueue pid = self._rmtree(path) File "/usr/sbin/kojira", line 375, in _rmtree pid = os.fork() File "/usr/sbin/kojira", line 278, in __exit__ self.release_locks() File "/usr/sbin/kojira", line 300, in release_locks logging._releaseLock() File "/usr/lib64/python3.9/logging/__init__.py", line 232, in _releaseLock _lock.release() RuntimeError: cannot release un-acquired lock
Hmm, we only try to do this is the earlier logging._acquireLock() succeeds. That's odd
The SequenceError is its own issue. I noticed it myself the other week and I've just filed #2854 for track it.
https://pagure.io/koji/issue/2854
maybe this should encompass whole forked code until the end of _rmtree?
I don't see how this will help. Once we've forked, we're the only thread.
@kevin If you can give me details about the system where kojira is running, I can try to replicate. OS and python version.
Also, were you running stock koji 1.24?
It's Fedora 34, so python-3.9.4.
We were running koji 1.24.1... but I think the issue started with 1.24, we just didn't notice.
The 1.23.1 kojira version has now cleaned things all out, so I can re-upgrade and/or run with debug if you like.
I've a simple reproducer. https://paste.centos.org/view/c5719b07 Child process will always have this problem. (F33/python-3.9.4)
After the fork, child's lock is not acquired. It is just reinitialized. So, all locks in the child process are free. So, release_locks would always work only in parent. We could just wrap all releases with try/except. Anything better (detecting that we're child somehow?)?
release_locks
https://github.com/python/cpython/pull/19195/files#diff-23071700ee893f365a4c117210b9129d4ac6afe5cc6e52bb5cd4cabbc6816e07R498
PR #2856
Metadata Update from @tkopecek: - Issue set to the milestone: 1.25
Ah, the reproducer does not reproduce on f32, but errors on f33 and 34. This may be a fix the underlying problem in newer python that is breaking the workaround.
f32 is the last one with py 3.8 as default. so, _acquireLock will not be called there.
_acquireLock
Metadata Update from @jcupova: - Issue tagged with: testing-ready
Metadata Update from @mfilip: - Issue tagged with: testing-done
Commit 98f81d6 fixes this issue
Commit 3849e07 relates to this ticket
Commit f912799 fixes this issue
Commit 11e71e8 relates to this ticket
Login to comment on this ticket.