#4587 pull_request_ready_branch task and huge number of branches
Closed: Fixed 4 years ago by pingou. Opened 4 years ago by msrb.

Pagure creates an asynchronous task (pull_request_ready_branch) every time a user clicks on the "Branches" tab. The task computes the diffs between all fork and upstream branches (or between non-master branches and master for non-fork projects), and the results are then shown to the user in the web UI.

This works fine for normal repos, where "normal" means repos with a reasonable number of branches. However, we have a repo with 7.5k branches (kernel :)), and computing the diffs for all of them takes a very long time.

User experience is one thing, but there is one more problem. Since it takes such a long time to finish the asynchronous task, it will occupy resources (workers) even though the user has already moved on and clicked elsewhere.

Clicking on the "Branches" tab again creates a new asynchronous task while the old one is still running. This could seriously degrade performance, as workers are busy computing the diffs while other asynchronous tasks wait in the queue.


I think one possible solution could be to compute diffs only for branches that were updated recently. Something like git for-each-ref --count=10 --sort=-committerdate refs/heads/ could be used to obtain the last 10 "active" branches.

Something like git for-each-ref --count=10 --sort=-committerdate refs/heads/ could be used to obtain the last 10 "active" branches.

I've been reluctant to shell out to git, but in this case it may be a good idea. We could also run this only when the number of branches is higher than 10, so that the number of branches returned stays consistent.
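
A minimal sketch of that idea in Python, assuming nothing about Pagure's actual code: the function names, the BRANCH_LIMIT constant and the repo_path / all_branches parameters are hypothetical and only illustrate the approach. It shells out to git for-each-ref only when the repo has more branches than the limit, and otherwise returns the branch list unchanged. The --format option is added here so that the output contains just the short branch names.

```python
import subprocess

# Assumed threshold, taken from the "10" discussed in this ticket.
BRANCH_LIMIT = 10


def build_for_each_ref_cmd(limit=BRANCH_LIMIT):
    """Build the git command that lists the most recently committed-to branches."""
    return [
        "git",
        "for-each-ref",
        f"--count={limit}",
        "--sort=-committerdate",
        "--format=%(refname:short)",  # print "foo" instead of "refs/heads/foo"
        "refs/heads/",
    ]


def parse_branch_names(output):
    """Turn the command output into a list of branch names, skipping blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def branches_to_diff(repo_path, all_branches, limit=BRANCH_LIMIT):
    """Return the branches worth diffing (hypothetical helper).

    Small repos keep their full branch list; larger ones are reduced to the
    `limit` branches with the most recent commits.
    """
    if len(all_branches) <= limit:
        return list(all_branches)
    result = subprocess.run(
        build_for_each_ref_cmd(limit),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raise if git exits with an error
    )
    return parse_branch_names(result.stdout)
```

With this shape, a repo with a handful of branches never spawns a git process, while a repo like the kernel one would only have its 10 most recently updated branches diffed.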

Metadata Update from @pingou:
- Issue assigned to pingou

4 years ago

Metadata Update from @pingou:
- Issue set to the milestone: 5.9

4 years ago


Metadata
Related Pull Requests
  • #4718 Merged 4 years ago
  • #4717 Merged 4 years ago