#1312 Refactor how the concurrent build threshold works
Opened 4 years ago by mprahl. Modified 4 years ago

MBS can be configured with NUM_CONCURRENT_BUILDS, which sets a cap on the number of component builds that can run in parallel. There are a few flaws in its implementation which make it unfair to MBS users.

When MBS starts a new batch (i.e. buildorder), it will submit as many component builds in the batch as it can until the total number of component builds in MBS reaches NUM_CONCURRENT_BUILDS.
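To make the slot arithmetic concrete, here is a minimal sketch of that behaviour. It is not the MBS code: the function name and arguments are hypothetical, and only the cap itself (NUM_CONCURRENT_BUILDS) comes from the description above.

```python
def submit_up_to_threshold(batch_components, currently_building, num_concurrent_builds):
    """Return the components of a batch that fit into the free slots.

    Hypothetical helper for illustration only; MBS does not expose this function.
    currently_building counts component builds across *all* modules in MBS.
    """
    free_slots = max(num_concurrent_builds - currently_building, 0)
    # Components beyond the free slots stay unbuilt until something resumes the batch.
    return batch_components[:free_slots]


# Three builds are already running MBS-wide and the cap is five,
# so only two of the four components in the new batch get submitted.
submitted_now = submit_up_to_threshold(
    ["comp-a", "comp-b", "comp-c", "comp-d"],
    currently_building=3,
    num_concurrent_builds=5,
)
```

The remaining components have to wait for a later call, which is where the fairness problems described below come from.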

When a component build of any module finishes and there are still unbuilt components in that module's current batch, MBS will call continue_batch_build. This will try to submit as many component builds in the batch as it can until the total number of component builds in MBS reaches NUM_CONCURRENT_BUILDS. This means it comes down to luck and good timing whether there are available slots at the moment your component build finishes.

When a component build of any module finishes and the module's current batch is complete, there is nothing that immediately resumes module builds that have been waiting for a freed-up slot. The modules have to wait for the poller to wake up, which happens every 10 minutes. The poller will then find any "paused" modules, which are modules that are in the build state but have no components building and aren't waiting for a repo regeneration. MBS then processes these modules one by one, sorted by ID in ascending order, and calls continue_batch_build on each module. This will try to submit as many component builds in the batch as it can until the total number of component builds in MBS reaches NUM_CONCURRENT_BUILDS. This means that MBS will favor modules submitted earlier, and the other module builds will be left to starve.
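The starvation effect can be illustrated with a small simulation of one poller pass. This is a sketch under assumptions, not the actual poller: the function name, the dictionary layout, and the module IDs are made up for the example.

```python
def poller_pass(paused_modules, free_slots):
    """Simulate one poller wake-up over paused modules.

    paused_modules maps a module ID to the number of unbuilt components
    in that module's current batch (hypothetical data layout).
    Returns a dict of module ID -> number of component builds submitted.
    """
    submitted = {}
    # Modules are processed one by one, lowest ID first.
    for module_id in sorted(paused_modules):
        take = min(paused_modules[module_id], free_slots)
        submitted[module_id] = take
        free_slots -= take
    return submitted


# Three paused modules compete for five free slots.
greedy_result = poller_pass({12: 4, 15: 3, 19: 2}, free_slots=5)
# greedy_result == {12: 4, 15: 1, 19: 0}
```

Module 12 takes four of the five slots simply because it has the lowest ID, and module 19 gets nothing until a later pass.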

Another issue is that MBS will still submit module-build-macros builds when the concurrent component threshold is reached, but those builds still count towards the allowed number of builds.

MBS should either remove this limit, or refactor its algorithm to be more fair.
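As one possible direction for the "more fair" option, free slots could be handed out round-robin across waiting modules instead of draining them into the first module. This is only an illustrative sketch, not a proposal for the actual implementation; the function name and data layout are hypothetical.

```python
def fair_share(waiting_modules, free_slots):
    """Distribute free slots round-robin across waiting modules.

    waiting_modules maps a module ID to the number of unbuilt components
    in its current batch (hypothetical data layout).
    Returns a dict of module ID -> number of slots granted.
    """
    allocation = {module_id: 0 for module_id in waiting_modules}
    remaining = dict(waiting_modules)
    # Keep cycling until the slots run out or no module needs more.
    while free_slots > 0 and any(remaining.values()):
        for module_id in sorted(remaining):
            if free_slots == 0:
                break
            if remaining[module_id] > 0:
                allocation[module_id] += 1
                remaining[module_id] -= 1
                free_slots -= 1
    return allocation


# Three waiting modules share five free slots.
fair_result = fair_share({12: 4, 15: 3, 19: 2}, free_slots=5)
# fair_result == {12: 2, 15: 2, 19: 1}
```

Every waiting module makes some progress on each pass, at the cost of each individual module finishing its batch more slowly.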


+1 for removing this algorithm altogether. Koji already has mechanisms for throttling tasks, let's just rely on those. It might be worth creating a ticket in fedora-infrastructure requesting this limit to be removed. Internally, we're working towards the same.
