#2314 kojira doesn't seem to be deleting old repos
Opened 3 years ago by kevin. Modified 3 years ago

This could well be a misconfiguration or my misunderstanding how things work. :)

deleted_repo_lifetime is by default 1 week, but it's unclear to me the meaning:

    This time (in seconds) is uses to clean up expired repositories.
    It makes sense, that you don't want only the latest repodata. In
    case of debugging some older build. Even in case you don't want to
    do this, it is recommended to set it at least to few hours in
    case, there is some running build, which is still using these
    older data.

So, deleted repos (where the tag/target are removed) is removed immediately? Or waits 1 week? seems like it starts deleting immediately, but possibly only week old ones?

For active tag/targets, are all repos kept as long as it's active? Or repos older than 1 week are deleted?

For example, we have in fedora land: f32-build (the target/tag for building for fedora 32).

There are 1638 repos in /mnt/koji/repos/f32-build
They currently take up 1.3T space

The 10 oldest repos in there are:

drwxr-xr-x.     9 apache apache    4096 Mar  2 11:26 1392163
drwxr-xr-x.     9 apache apache    4096 Mar  2 09:15 1391945
drwxr-xr-x.     9 apache apache    4096 Mar  2 08:43 1391860
drwxr-xr-x.     9 apache apache    4096 Mar  2 08:38 1391845
drwxr-xr-x.     9 apache apache    4096 Mar  2 08:03 1391745
drwxr-xr-x.     9 apache apache    4096 Mar  2 07:14 1391677
drwxr-xr-x.     9 apache apache    4096 Mar  2 05:17 1391480
drwxr-xr-x.     9 apache apache    4096 Mar  2 01:54 1391332
drwxr-xr-x.    10 apache apache    4096 Jan 19 19:39 1339092
drwxr-xr-x.    10 apache apache    4096 Dec 30 18:06 1326435

So, ideally I would think we would only keep a week of those repos on disk?

Also, we have side tag repos that are long lived and doing the same (although these are due to either bodhi bugs (it was merged and should have been deleted) or user forgetting about the side tag (would be fixed by removing old sitetags).


Repo could be in EXPIRED or DELETED state. DELETED means that it is really missing from disk (final state). EXPIRED means, that it is in queue to be deleted. delete_repo_lifetime says, how long it will wait in this queue.

For deleted tag repo is not deleted immediately. kojira periodacally looks to repos dir and got through all of them. If some of them is missing the tag (it was deleted meanwhile) that repo moves to EXPIRED immediately.

To actual repos you've pasted there - they probably should be deleted already. Is there anything in the kojira.log about them? (Just run kojira with --debug grep any of those repo IDs) .

Maybe - if you restart kojira more often than deleted_repo_lifetime you can never hit that threshold. #2139 is related to this as there is for safety reasons this minimal time. It would behave a bit differently in 1.22.

Metadata Update from @tkopecek:
- Custom field Size adjusted to None

3 years ago

So:

# journalctl -b -l -u kojira | grep EXPIRED | wc -l
115387

None of those f32-build ones are in the list however.

Is it safe to stop kojira and remove expired repos and restart it? Or will that cause it problems?

Is it safe to stop kojira and remove expired repos and restart it?

The catch is that kojira incorporates a first_seen timestamp into its age calculation. When you restart kojira, you reset all the first_seen timestamps. This is very old behavior to work around some limitations of the data available to kojira.

This behavior was very recently changed in #2174. That change should prevent first_seen from extending the lives of repos except in rare cases. If you were to apply that change, then restarting kojira would not be a problem.

Tomas was suggesting that one possible way you could see repos sticking around like this is if you were restarting kojira regularly. So, are you doing that in Fedora infra? If so, then you should either stop or apply #2174.

So, on looking at debug we found a bunch of:

Jun 16 15:30:28 koji02.iad2.fedoraproject.org kojira[3196057]: 2020-06-16 15:30:28,437 [DEBUG] {3196057} koji.repo.regen:574 /mnt/koji/repos/f33-infra-build/1543341 not a directory, skipping

looks like a test was missing a 'not'. See pr https://pagure.io/koji/pull-request/2323

adding that (and the #2139 patch) and restarting and now it's queuing tons of rmtree jobs.

Will keep watching.

Tomas was suggesting that one possible way you could see repos sticking around like this is if you were restarting kojira regularly. So, are you doing that in Fedora infra? If so, then you should either stop or apply #2174.

not regularly, but somewhat often: datacenter move, upgrading to 1.21.1, etc.

So, a few more questions...

What does kojira mean by:

[DEBUG] {747269} koji.repo.regen:438 Skipped check for repos: {540, 780, 1599744, 1599745, 1599746, 1599747,... <pages and pages more repos>

It looks like the intent is that it will gradually check these over time? But it's not clear to me that it ever does...

Some of the things are done in threads, but the rmtree's are not? It's a bit confusing, perhaps the debug logging could always preface with what thread it's logging from?

Changing the delete_batch_size to something large like 500, seems to cause it to do a number of rmtrees when starting, but then in the loop it only seems to do 2-3 or so at a time, and we still have a ton of old expired repos. ;(
Perhaps the deletes could be also done in a thread? or perhaps a stand alone tool to do removals would be possible?

[DEBUG] {747269} koji.repo.regen:287 Tracking 46613 repos, 5 child processes

Our f33-build repo is still 1.3TB... and has repos back 13 days ( but deleted_repo_lifetime is the default, 1 week). Definitely an improvement over the 4TB it used to be. :)

@kevin It means, that there is no point in checking that repo again (in most cases we already know that is expired and we don't need to check its validity again)

Using separate thread for makes sense - created #2336

Could we run a stripped down kojira that just does deletes here to get us back on track?

It doesn't seem to be deleting much/very fast at all:

journalctl -l -u kojira -S 'Wed 2020-06-24 19:18:57 UTC'| grep rmtree | wc -l
88

Could we run a stripped down kojira that just does deletes here to get us back on track?

This might help: https://pagure.io/koji-tools/pull-request/50

Issue #2336 is closed. Is this one still relevant? If so, could we clarify what remains?

If you are still seeing issues, you might check to see if you're hitting #2714

This issue seems fixed now (its processing / keeping up).

I do still see a few tracebacks:

Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: 2021-02-22 18:17:20,161 [INFO] {779312} koji.repo.rege
n:307 Queuing rmtree job for /mnt/koji/repos/f34-build-side-36867/3024078
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: 2021-02-22 18:17:20,161 [ERROR] {779312} koji.repo.del
ete:554 Error in delete thread
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: Traceback (most recent call last):
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/sbin/kojira", line 551, in deleteLoop
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:     self.deleteRepos()
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/sbin/kojira", line 961, in deleteRepos
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:     del self.repos[repo.repo_id]
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: KeyError: 3024078
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: Exception in thread deleteLoop:
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: Traceback (most recent call last):
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/lib64/python3.9/threading.py", line 954, in _bootstrap_inner
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:     self.run()
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/lib64/python3.9/threading.py", line 892, in run
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:     self._target(*self._args, **self._kwargs)
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/sbin/kojira", line 551, in deleteLoop
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:     self.deleteRepos()
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/sbin/kojira", line 961, in deleteRepos
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]:     del self.repos[repo.repo_id]
Feb 22 18:17:20 koji02.iad2.fedoraproject.org kojira[779312]: KeyError: 3024078
Feb 22 18:11:42 koji02.iad2.fedoraproject.org kojira[779312]: 2021-02-22 18:11:42,998 [INFO] {779312} koji.repo.rege
n:307 Queuing rmtree job for /mnt/koji/repos/f34-build-side-36717/3024055
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: 2021-02-22 18:11:43,001 [ERROR] {779312} koji:1057 Tra
ceback (most recent call last):
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/lib/python3.9/site-packages/koji/__init__
.py", line 2754, in _callMethod
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:     return self._sendCall(handler, headers, request)
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/lib/python3.9/site-packages/koji/__init__
.py", line 2672, in _sendCall
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:     return self._sendOneCall(handler, headers, request
)
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/lib/python3.9/site-packages/koji/__init__
.py", line 2718, in _sendOneCall
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:     ret = self._read_xmlrpc_response(r)
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/lib/python3.9/site-packages/koji/__init__
.py", line 2730, in _read_xmlrpc_response
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:     result = u.close()
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/lib64/python3.9/xmlrpc/client.py", line 6
62, in close
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:     raise Fault(**self._stack[0])
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: xmlrpc.client.Fault: <Fault 1008: '1930653 > 1930652 (
session 128038807)'>
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: During handling of the above exception, another except
ion occurred:
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: Traceback (most recent call last):
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/sbin/kojira", line 1028, in main
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:     repomgr.updateRepos()
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/sbin/kojira", line 722, in updateRepos
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:     self.readCurrentRepos()
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/sbin/kojira", line 386, in readCurrentRep
os
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:     if not getTag(self.session, repo.tag_id) and not r
epo.expired():
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/sbin/kojira", line 55, in getTag
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:     info = session.getTag(tag, event=event)
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/lib/python3.9/site-packages/koji/__init__
.py", line 2346, in __call__
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:     return self.__func(self.__name, args, opts)
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:   File "/usr/lib/python3.9/site-packages/koji/__init__
.py", line 2773, in _callMethod
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]:     raise err
Feb 22 18:11:43 koji02.iad2.fedoraproject.org kojira[779312]: koji.SequenceError: 1930653 > 1930652 (session 1280388
07)

but perhaps we should close this now and file those as sperate issues? Happy to do whatever.

Login to comment on this ticket.

Metadata