When getNextTask calculates the median capacity, Koji assumes that every host in the channel is able to run the task. That is not necessarily true.
The OSBS team hit this when a host (containerbuild) that has a plugin for the buildContainer task was moved to a testing channel to which other hosts had been added. The containerbuild host had its capacity set to 24, the buildContainer task weight is 2.0, and max-jobs on the builder was reset to 13.
Nevertheless, Koji scheduled only 10 of the 13 queued builds, because the median capacity algorithm expected other hosts to take the remaining three. They couldn't, as none of them had the necessary plugin installed.
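To make the failure mode concrete, here is a simplified Python model of the median-capacity check described above. It is not Koji's getNextTask code: it assumes a host takes a task only when its available capacity (capacity minus current task load) is at least the median available capacity of all hosts in the channel. The containerbuild numbers come from this report, assuming all ten scheduled builds are running; the function name and the two idle testing hosts with an available capacity of 10.0 are hypothetical.

```python
from statistics import median
from typing import Sequence


def should_take_task(our_avail: float, channel_avails: Sequence[float]) -> bool:
    """Simplified model of the median-capacity check (hypothetical, not Koji code).

    A host takes a task only if its available capacity is at least the median
    available capacity of every host in the channel. Nothing here asks whether
    those other hosts can actually run the task.
    """
    return our_avail >= median(channel_avails)


# containerbuild: capacity 24, assuming ten buildContainer tasks of weight 2.0 run.
containerbuild_avail = 24 - 10 * 2.0  # 4.0

# Hypothetical idle testing hosts: they lack the buildContainer plugin but
# still count toward the median.
other_avails = [10.0, 10.0]

print(should_take_task(containerbuild_avail, [containerbuild_avail] + other_avails))
# prints False: containerbuild is in the lower half, so the queued task
# waits for hosts that can never run it.
```

The median of `[4.0, 10.0, 10.0]` is 10.0, so the only host able to run buildContainer keeps deferring to hosts that cannot.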
getNextTask should be able to filter out hosts that can't run the task when it calculates the median capacity.
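The requested behavior can be sketched by restricting the median to hosts that can handle the task method. This assumes each host knows which task methods every other host supports; as noted in the reply, Koji does not currently expose that information, so the `supported` mapping, the host names, and the function name are all hypothetical.

```python
from statistics import median
from typing import Dict, Set


def should_take_task_filtered(
    our_host: str,
    method: str,
    avail: Dict[str, float],
    supported: Dict[str, Set[str]],
) -> bool:
    """Median-capacity check over only the hosts that can run `method`.

    `supported` maps each host name to the task methods it can handle.
    This per-host information is an assumption made for illustration.
    """
    if method not in supported.get(our_host, set()):
        return False  # we cannot run this task at all
    capable = [avail[h] for h in avail if method in supported.get(h, set())]
    return avail[our_host] >= median(capable)


avail = {"containerbuild": 4.0, "testhost1": 10.0, "testhost2": 10.0}
supported = {
    "containerbuild": {"buildContainer", "build"},
    "testhost1": {"build"},
    "testhost2": {"build"},
}
print(should_take_task_filtered("containerbuild", "buildContainer", avail, supported))
# prints True: containerbuild is the only capable host, so it is the median
```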
Without a significant refactor, there is no reasonable way for one host to know that another host cannot take a particular task for reasons other than arch or channel.
In the short term, we should make a host in the bottom half wait briefly before taking the task, instead of having it reject the task indefinitely. This would still give hosts in the upper half a chance, but would keep situations like this from stalling out forever.
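A minimal sketch of that short-term idea, assuming the same simplified median check as before: a host in the lower half remembers when it first passed on a task and takes it anyway once a delay has expired. This is not the code from the PR mentioned later in this thread; the class name, the `delay` parameter, its 300-second value, and the per-task timestamps are hypothetical.

```python
import time
from statistics import median
from typing import Dict, Optional, Sequence


class DelayedLowerHalf:
    """Lower-half hosts defer a task for `delay` seconds instead of forever.

    All names here are hypothetical; they are not Koji options or APIs.
    """

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.first_seen: Dict[int, float] = {}  # task id -> time we first skipped it

    def should_take_task(
        self,
        task_id: int,
        our_avail: float,
        channel_avails: Sequence[float],
        now: Optional[float] = None,
    ) -> bool:
        if now is None:
            now = time.time()
        if our_avail >= median(channel_avails):
            # Upper half: take the task right away.
            self.first_seen.pop(task_id, None)
            return True
        # Lower half: record when we first passed on this task.
        first = self.first_seen.setdefault(task_id, now)
        if now - first >= self.delay:
            # No upper-half host took it in time, so take it ourselves.
            self.first_seen.pop(task_id, None)
            return True
        return False


sched = DelayedLowerHalf(delay=300)
avails = [4.0, 10.0, 10.0]
print(sched.should_take_task(42, 4.0, avails, now=0))    # False: wait first
print(sched.should_take_task(42, 4.0, avails, now=120))  # False: still waiting
print(sched.should_take_task(42, 4.0, avails, now=300))  # True: delay expired
```

Upper-half hosts still get the first chance at the task, but a lower-half host that is the only one able to run it stalls for at most `delay` seconds rather than indefinitely.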
Actually, it looks like I have some partial work on this from a few months back.
Metadata Update from @mikem: - Issue assigned to mikem
Here is the partial work that was mentioned: https://github.com/mikem23/koji-playground/commits/bin-lower-half
Still needs unit tests
Metadata Update from @mikem: - Issue set to the milestone: 1.17
I've cleaned up the work above and submitted as PR #1176
Commit d424fb0 relates to this ticket
The short term fix from #1176 will be in 1.17. A proper fix will likely have to wait for 1.18 or later.
Metadata Update from @mikem: - Issue set to the milestone: 1.18 (was: 1.17)
Metadata Update from @dgregor: - Custom field Size adjusted to None - Issue set to the milestone: None (was: 1.18)
Removing from 1.18, as the long-term work will get rolled up into some upcoming scheduler refactoring.
Metadata Update from @tkopecek: - Issue tagged with: scheduler