#220 mini mass-rebuilds can kill CI
Opened 4 years ago by msrb. Modified 4 years ago

Yesterday there was a mini mass-rebuild in Rawhide. ~500 packages were rebuilt in just a couple of hours, which generated 2000+ tasks in CI (rpminspect+rpmdeplint+installability+dist-git, and some compose-ci and eln tasks * 500).

This is just way too much for such a short period of time. As expected, all 100 executors submitted requests to Testing Farm and blocked. But because the load was too high, the time to deliver test results went up dramatically (to hours). Other tasks stayed queued in Jenkins and piled up.

The long-running tasks, in combination with the slow storage, killed Jenkins (known issue with logging output -- the longer a task runs, the more output it produces).

There are two problems here:

I think for the QoS part we should prioritize what will be tested and when, based on the number of builds that a particular user submitted in the last hour or so. Requests from users who only need a few builds tested would be placed at the beginning of the queue, and requests from users who do their own mini mass-rebuilds would go at the end of the queue.
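
To make the queueing idea above concrete, here is a minimal sketch, not an existing Jenkins or Testing Farm feature. The `FairQueue` class, its method names, and the one-hour `WINDOW_SECONDS` value are all hypothetical; the only thing taken from the proposal is that a request's priority depends on how many builds the same user submitted recently.

```python
import heapq
import itertools
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # assumed value for "the last hour or so"


class FairQueue:
    """Hypothetical queue: users with fewer recent builds are served first."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self._recent = defaultdict(deque)  # user -> timestamps of recent submissions
        self._heap = []
        self._counter = itertools.count()  # tie-breaker, keeps FIFO order within a priority

    def _recent_count(self, user, now):
        # Drop submissions that fell out of the sliding window, then count the rest.
        stamps = self._recent[user]
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        return len(stamps)

    def submit(self, user, build, now):
        # Priority is fixed at submission time: the number of builds this user
        # already submitted within the window (lower value = served earlier).
        priority = self._recent_count(user, now)
        self._recent[user].append(now)
        heapq.heappush(self._heap, (priority, next(self._counter), user, build))

    def pop(self):
        # Return the (user, build) pair with the lowest priority value.
        _, _, user, build = heapq.heappop(self._heap)
        return user, build

    def __len__(self):
        return len(self._heap)
```

With this sketch, if one user submits five builds in a row and another user then submits a single build, the single build is popped right after the first of the five instead of waiting behind all of them. Whether priority should be fixed at submission time or recomputed when an executor frees up is a design choice left open here.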


Metadata Update from @msrb:
- Issue tagged with: UX, discussion, feature, jenkins

4 years ago

@msrb do you want to handle the priority also on our side later on?

I would rather not care about it from our perspective, but it is possible to discuss.

@msrb @mvadkert is this something we could discuss tomorrow during the CI SIG meeting?

@jimbair Wednesdays are the worst days for me; I basically have a full day of meetings and will not be able to join, sorry :( From the Testing Farm side, we would need to scale up a bit for such events, which we cannot do now based on the load. The main problem is the rpminspect tests, which currently put quite some load on the workers, and hitting that many builds just causes big delays. Also, the current Jenkins setup on the Fedora CI side is not able to handle this kind of load, especially when things take longer on our side.

But feel free to discuss with me and we can discuss afterwards what steps to take.
