#1386 scale task_avail_delay based on bin rank
Merged 3 years ago by mikem. Opened 3 years ago by mikem.
mikem/koji more-is-better  into  master


Currently task allocation in Koji is decentralized: each builder picks its next task from a list. The selection algorithm the builders run gives preference to builders with more available capacity. For a given task, a builder compares its own available capacity against the other ready builders in the task's channel-arch bin. If the host is below the median, it will not take that task until a waiting period (task_avail_delay) has passed. This delay gives higher-capacity hosts more of a chance to claim the task.
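A minimal sketch of the median rule described above follows. It assumes that `bin_avail` holds the available capacity of every ready host in the bin, including this one, and that `waited` is the number of seconds since this host first saw the task. The function name and its parameters are hypothetical and are not taken from the actual kojid code.

```python
from statistics import median


def should_delay_median(our_avail, bin_avail, waited, task_avail_delay):
    """Hypothetical sketch of the old rule: return True to keep skipping the task.

    our_avail        -- this host's available capacity
    bin_avail        -- available capacity of all ready hosts in the channel-arch bin
    waited           -- seconds since this host first saw the task
    task_avail_delay -- configured delay in seconds
    """
    if not bin_avail:
        return False  # no comparison data: take the task
    if our_avail >= median(bin_avail):
        return False  # at or above the median: claim immediately
    # below the median: wait out the full delay before claiming
    return waited < task_avail_delay
```

Under this rule, every host below the median waits the same amount of time, no matter how far below the median it sits.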

Unfortunately, if the hosts vary widely in capacity, the largest-capacity hosts may be underused, because this algorithm only distinguishes between hosts above and below the median.

This change generalizes the task_avail_delay behavior to scale with the rank of the host within the channel-arch bin. The hosts with the highest capacity will take the task immediately, while hosts further down will have a delay proportional to their rank. We calculate rank as a float between 0.0 (highest available capacity) and 1.0 (lowest) and use it as a multiplier for the delay.
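Below is a sketch of the rank-scaled delay, using the same assumptions as before: `bin_avail` includes this host, and the function names are hypothetical. This sketch defines rank as the host's position in the bin sorted by descending capacity, divided by (number of hosts − 1). It illustrates the idea and does not reproduce the merged code.

```python
def bin_rank(our_avail, bin_avail):
    """Normalized rank: 0.0 = highest available capacity in the bin, 1.0 = lowest."""
    ordered = sorted(bin_avail, reverse=True)
    if len(ordered) < 2:
        return 0.0  # a lone host never waits
    # position of the first host whose capacity we match or exceed;
    # if we are below everyone, treat us as last
    pos = next((i for i, cap in enumerate(ordered) if our_avail >= cap),
               len(ordered) - 1)
    return pos / (len(ordered) - 1)


def scaled_delay(our_avail, bin_avail, task_avail_delay):
    """Delay in seconds, scaled by rank (0 for the top host, full delay for the last)."""
    return task_avail_delay * bin_rank(our_avail, bin_avail)


def should_delay_ranked(our_avail, bin_avail, waited, task_avail_delay):
    """Return True while this host should keep skipping the task."""
    return waited < scaled_delay(our_avail, bin_avail, task_avail_delay)
```

Take five hosts with available capacity `[8.0, 4.0, 2.0, 1.0, 0.5]` and a 300-second delay. Their waits under this sketch are 0, 75, 150, 225 and 300 seconds. The median rule, by contrast, gives only two outcomes: wait 0 seconds or wait 300.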

The end result will be that hosts with higher available capacity will be more likely to claim a task, resulting in better utilization of the highest capacity hosts.

Longer term, we're planning a complete scheduling overhaul. This is just a small fix for a problem we're seeing in our instance.

Metadata Update from @mikem:
- Pull-request tagged with: testing-ready

3 years ago

Hmm, no CI tests ran for this. Any idea why, @tkopecek?

It looks like I need to adjust a unit test here.

2 new commits added

  • fix unit test
  • tweak rank calculation
3 years ago

pretty please pagure-ci rebuild

3 years ago

CI sometimes (in about 5% of cases, I suspect) simply doesn't trigger. Anyway, Pagure now has a "Rerun CI" button, which I use in those cases.

Commit 40040ca fixes this pull-request

Pull-Request has been merged by mikem

3 years ago