#1707 [RFE] Cleaning of old records in database
Closed: Fixed 4 years ago by tkopecek. Opened 4 years ago by tkopecek.

Koji is designed to never delete anything from its db (files are a different case). Nevertheless, there are some classes of data which don't have much value. E.g. old newRepo tasks don't have much value and take a lot of space in production dbs: 59% of tasks in prod are of this 'maintenance' type (newRepo + createrepo + tagNotification). The task table is usually the second biggest after buildroot_listing.

Cleaning old buildroots under some circumstances (e.g. never used for a real build, only for old scratch builds) could be an option, but I'm more hesitant here.

It could make sense to be able to wipe out these data selectively. Of course, there should be some policy-driven rules about that.
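To make the idea of a policy-driven selective wipe concrete, here is a minimal sketch in Python. It runs against a simplified, hypothetical `task` table in an in-memory SQLite database, not the real Koji schema. The list of methods and the one-year retention period are assumptions chosen for illustration. A real cleanup would also have to respect references between tables, which this sketch ignores.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical policy: which task methods are disposable and how long to keep them.
MAINTENANCE_METHODS = ("newRepo", "createrepo", "tagNotification")
RETENTION = timedelta(days=365)


def delete_old_maintenance_tasks(conn, now):
    """Delete finished maintenance tasks older than the retention period.

    Returns the number of deleted rows.
    """
    cutoff = (now - RETENTION).isoformat()
    placeholders = ",".join("?" for _ in MAINTENANCE_METHODS)
    cur = conn.execute(
        f"DELETE FROM task WHERE method IN ({placeholders}) "
        "AND completion_time IS NOT NULL AND completion_time < ?",
        (*MAINTENANCE_METHODS, cutoff),
    )
    conn.commit()
    return cur.rowcount


def demo():
    # Simplified stand-in table; ISO timestamps compare correctly as text.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, method TEXT, completion_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO task (method, completion_time) VALUES (?, ?)",
        [
            ("newRepo", "2015-01-01T00:00:00"),     # old maintenance task
            ("createrepo", "2015-06-01T00:00:00"),  # old maintenance task
            ("build", "2015-01-01T00:00:00"),       # old, but not a maintenance task
            ("newRepo", "2019-10-01T00:00:00"),     # maintenance task, still recent
        ],
    )
    deleted = delete_old_maintenance_tasks(conn, datetime(2019, 11, 1))
    remaining = [row[0] for row in conn.execute("SELECT method FROM task ORDER BY id")]
    return deleted, remaining
```

Keeping the method list and retention period as plain data is one way to let the policy be changed without touching the deletion logic.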

Would you find it useful, or is it too much against audit requirements?


I don't know that the actual size (i.e. disk space) of the DB has been raised as a concern for Koji. Would cleaning of these old records improve the speed of certain operations?

Metadata Update from @dgregor:
- Custom field Size adjusted to None

4 years ago

It is hard to make an estimate here, but it should bring some speedup due to smaller indices. For tasks, listing with filters is typically very slow (e.g. https://koji.fedoraproject.org/koji/tasks?state=failed&view=tree&method=all&order=-id). That could, of course, be solved on other levels, but yes, even a smaller db would help.

See my suggestion about partitioning here: https://pagure.io/fedora-infrastructure/issue/8292#comment-605825

I am pretty sure this is the cause of us not being able to make full database dumps without impacting performance, so we would really like a solution. ;(

If monthly partitioning were set up, the interface could also assume 'last month only' and add a field to let you specify 'all' or some partition. That would make it a lot faster.
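As a rough sketch of what monthly partitioning could look like, the Python helper below generates PostgreSQL declarative-partitioning DDL for one month. It assumes a parent table that was created with `PARTITION BY RANGE` on a timestamp column. The parent name `task` and the `_yYYYYmMM` naming scheme are hypothetical and not taken from Koji.

```python
from datetime import date


def month_bounds(year, month):
    """Return (first day of the month, first day of the following month)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def partition_name(parent, year, month):
    # Hypothetical naming scheme, e.g. task_y2019m11
    return f"{parent}_y{year}m{month:02d}"


def partition_ddl(parent, year, month):
    """Build the CREATE TABLE ... PARTITION OF statement for one month.

    The upper bound of a range partition is exclusive, so the first day of
    the next month is the correct TO value.
    """
    start, end = month_bounds(year, month)
    name = partition_name(parent, year, month)
    return (
        f"CREATE TABLE {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )
```

For example, `partition_ddl("task", 2019, 11)` yields a statement for a `task_y2019m11` partition covering November 2019. One caveat: PostgreSQL requires primary keys and unique constraints on a partitioned table to include the partition key columns, so an existing table keyed only on `id` could not be converted without schema changes.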

Metadata Update from @dgregor:
- Issue assigned to tkopecek
- Issue priority set to: High (was: Normal)
- Issue set to the milestone: 1.20

4 years ago

@tkopecek please look into this a bit more. In particular, table partitioning (https://severalnines.com/database-blog/guide-partitioning-data-postgresql) may address the speed issues while avoiding potential issues with removing data.

I've created a few cleanup actions in koji-sweep-db. Let's discuss them in #1824.

Metadata Update from @jcupova:
- Issue tagged with: testing-ready

4 years ago

Metadata Update from @jcupova:
- Issue tagged with: testing-done

4 years ago

Metadata
Related Pull Requests
  • #1824 Merged 4 years ago