Learn more about these different git repos.
Other Git URLs
This could well be a misconfiguration or my misunderstanding how things work. :)
deleted_repo_lifetime is by default 1 week, but it's unclear to me the meaning:
This time (in seconds) is uses to clean up expired repositories. It makes sense, that you don't want only the latest repodata. In case of debugging some older build. Even in case you don't want to do this, it is recommended to set it at least to few hours in case, there is some running build, which is still using these older data.
So, deleted repos (where the tag/target are removed) is removed immediately? Or waits 1 week? seems like it starts deleting immediately, but possibly only week old ones?
For active tag/targets, are all repos kept as long as it's active? Or repos older than 1 week are deleted?
For example, we have in fedora land: f32-build (the target/tag for building for fedora 32).
There are 1638 repos in /mnt/koji/repos/f32-build They currently take up 1.3T space
The 10 oldest repos in there are:
drwxr-xr-x. 9 apache apache 4096 Mar 2 11:26 1392163 drwxr-xr-x. 9 apache apache 4096 Mar 2 09:15 1391945 drwxr-xr-x. 9 apache apache 4096 Mar 2 08:43 1391860 drwxr-xr-x. 9 apache apache 4096 Mar 2 08:38 1391845 drwxr-xr-x. 9 apache apache 4096 Mar 2 08:03 1391745 drwxr-xr-x. 9 apache apache 4096 Mar 2 07:14 1391677 drwxr-xr-x. 9 apache apache 4096 Mar 2 05:17 1391480 drwxr-xr-x. 9 apache apache 4096 Mar 2 01:54 1391332 drwxr-xr-x. 10 apache apache 4096 Jan 19 19:39 1339092 drwxr-xr-x. 10 apache apache 4096 Dec 30 18:06 1326435
So, ideally I would think we would only keep a week of those repos on disk?
Also, we have side tag repos that are long lived and doing the same (although these are due to either bodhi bugs (it was merged and should have been deleted) or user forgetting about the side tag (would be fixed by removing old sitetags).
Repo could be in EXPIRED or DELETED state. DELETED means that it is really missing from disk (final state). EXPIRED means, that it is in queue to be deleted. delete_repo_lifetime says, how long it will wait in this queue.
delete_repo_lifetime
For deleted tag repo is not deleted immediately. kojira periodacally looks to repos dir and got through all of them. If some of them is missing the tag (it was deleted meanwhile) that repo moves to EXPIRED immediately.
To actual repos you've pasted there - they probably should be deleted already. Is there anything in the kojira.log about them? (Just run kojira with --debug grep any of those repo IDs) .
Maybe - if you restart kojira more often than deleted_repo_lifetime you can never hit that threshold. #2139 is related to this as there is for safety reasons this minimal time. It would behave a bit differently in 1.22.
Metadata Update from @tkopecek: - Custom field Size adjusted to None
So:
# journalctl -b -l -u kojira | grep EXPIRED | wc -l 115387
None of those f32-build ones are in the list however.
Is it safe to stop kojira and remove expired repos and restart it? Or will that cause it problems?
Is it safe to stop kojira and remove expired repos and restart it?
The catch is that kojira incorporates a first_seen timestamp into its age calculation. When you restart kojira, you reset all the first_seen timestamps. This is very old behavior to work around some limitations of the data available to kojira.
first_seen
This behavior was very recently changed in #2174. That change should prevent first_seen from extending the lives of repos except in rare cases. If you were to apply that change, then restarting kojira would not be a problem.
Tomas was suggesting that one possible way you could see repos sticking around like this is if you were restarting kojira regularly. So, are you doing that in Fedora infra? If so, then you should either stop or apply #2174.
So, on looking at debug we found a bunch of:
Jun 16 15:30:28 koji02.iad2.fedoraproject.org kojira[3196057]: 2020-06-16 15:30:28,437 [DEBUG] {3196057} koji.repo.regen:574 /mnt/koji/repos/f33-infra-build/1543341 not a directory, skipping
looks like a test was missing a 'not'. See pr https://pagure.io/koji/pull-request/2323
adding that (and the #2139 patch) and restarting and now it's queuing tons of rmtree jobs.
Will keep watching.
not regularly, but somewhat often: datacenter move, upgrading to 1.21.1, etc.
So, a few more questions...
What does kojira mean by:
[DEBUG] {747269} koji.repo.regen:438 Skipped check for repos: {540, 780, 1599744, 1599745, 1599746, 1599747,... <pages and pages more repos>
It looks like the intent is that it will gradually check these over time? But it's not clear to me that it ever does...
Some of the things are done in threads, but the rmtree's are not? It's a bit confusing, perhaps the debug logging could always preface with what thread it's logging from?
Changing the delete_batch_size to something large like 500, seems to cause it to do a number of rmtrees when starting, but then in the loop it only seems to do 2-3 or so at a time, and we still have a ton of old expired repos. ;( Perhaps the deletes could be also done in a thread? or perhaps a stand alone tool to do removals would be possible?
[DEBUG] {747269} koji.repo.regen:287 Tracking 46613 repos, 5 child processes
Our f33-build repo is still 1.3TB... and has repos back 13 days ( but deleted_repo_lifetime is the default, 1 week). Definitely an improvement over the 4TB it used to be. :)
@kevin It means, that there is no point in checking that repo again (in most cases we already know that is expired and we don't need to check its validity again)
Using separate thread for makes sense - created #2336
Could we run a stripped down kojira that just does deletes here to get us back on track?
It doesn't seem to be deleting much/very fast at all:
journalctl -l -u kojira -S 'Wed 2020-06-24 19:18:57 UTC'| grep rmtree | wc -l 88
This might help: https://pagure.io/koji-tools/pull-request/50
Issue #2336 is closed. Is this one still relevant? If so, could we clarify what remains?
If you are still seeing issues, you might check to see if you're hitting #2714
This issue seems fixed now (its processing / keeping up).
I do still see a few tracebacks:
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: 2021-02-22 18:17:20,161 [INFO] {779312} koji.repo.rege n:307 Queuing rmtree job for /mnt/koji/repos/f34-build-side-36867/3024078 Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: 2021-02-22 18:17:20,161 [ERROR] {779312} koji.repo.del ete:554 Error in delete thread Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: Traceback (most recent call last): Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/sbin/kojira", line 551, in deleteLoop Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: self.deleteRepos() Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/sbin/kojira", line 961, in deleteRepos Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: del self.repos[repo.repo_id] Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: KeyError: 3024078 Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: Exception in thread deleteLoop: Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: Traceback (most recent call last): Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/lib64/python3.9/threading.py", line 954, in _bootstrap_inner Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: self.run() Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/lib64/python3.9/threading.py", line 892, in run Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: self._target(*self._args, **self._kwargs) Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/sbin/kojira", line 551, in deleteLoop Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: self.deleteRepos() Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/sbin/kojira", line 961, in deleteRepos Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: del self.repos[repo.repo_id] Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: KeyError: 3024078
Feb 22 18:11:42 koji02.iad2.fedoraproject.org kojira[779312]: 2021-02-22 18:11:42,998 [INFO] {779312} koji.repo.rege n:307 Queuing rmtree job for /mnt/koji/repos/f34-build-side-36717/3024055 Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: 2021-02-22 18:11:43,001 [ERROR] {779312} koji:1057 Tra ceback (most recent call last): Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/lib/python3.9/site-packages/koji/__init__ .py", line 2754, in _callMethod Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: return self._sendCall(handler, headers, request) Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/lib/python3.9/site-packages/koji/__init__ .py", line 2672, in _sendCall Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: return self._sendOneCall(handler, headers, request ) Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/lib/python3.9/site-packages/koji/__init__ .py", line 2718, in _sendOneCall Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: ret = self._read_xmlrpc_response(r) Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/lib/python3.9/site-packages/koji/__init__ .py", line 2730, in _read_xmlrpc_response Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: result = u.close() Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/lib64/python3.9/xmlrpc/client.py", line 6 62, in close Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: raise Fault(**self._stack[0]) Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: xmlrpc.client.Fault: <Fault 1008: '1930653 > 1930652 ( session 128038807)'> Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: During handling of the above exception, another except ion occurred: Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: Traceback (most recent call last): Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/sbin/kojira", line 1028, in main Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: repomgr.updateRepos() Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/sbin/kojira", line 722, in updateRepos Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: self.readCurrentRepos() Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/sbin/kojira", line 386, in readCurrentRep os Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: if not getTag(self.session, repo.tag_id) and not r epo.expired(): Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/sbin/kojira", line 55, in getTag Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: info = session.getTag(tag, event=event) Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/lib/python3.9/site-packages/koji/__init__ .py", line 2346, in __call__ Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: return self.__func(self.__name, args, opts) Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: File "/usr/lib/python3.9/site-packages/koji/__init__ .py", line 2773, in _callMethod Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: raise err Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: koji.SequenceError: 1930653 > 1930652 (session 1280388 07)
but perhaps we should close this now and file those as sperate issues? Happy to do whatever.
Login to comment on this ticket.