As mentioned in D604, if buildbot master decides to kill a task (due to client "inactivity", i.e. not producing any log output for X minutes), it kills that process, but our disposable minions stay up, suddenly "unowned". They linger and consume resources forever, until the host machine's resources are depleted and our tasks start crashing with out-of-memory errors.
Let's try to find a good solution for this. Some ideas:

1. If buildbot master does not kill the process outright but tries to terminate it first, we could intercept that signal and perform a disposable minion teardown.
2. If buildbot master performs some additional steps even when a task is killed, one of those steps could tear down the disposable minion relevant to that buildbot client. But how do we find out which VMs are related?
3. We could tear down any existing disposable minions during the //next// task execution, before spawning a new disposable minion. Again, we somehow need to discover which minion (if any) is relevant, so we know what to tear down. Can the changes from this ticket (static VM names per username) be used for this?
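Idea 1 could be sketched roughly like this: register a SIGTERM handler in the task runner so that, if buildbot master terminates us gracefully instead of SIGKILLing, we tear the minion down before exiting. This is only a minimal sketch, assuming a hypothetical `teardown_minion()` cleanup helper; it does not help if the master sends SIGKILL, which cannot be caught.

```python
import signal
import sys


def teardown_minion():
    # Hypothetical helper: stands in for whatever VM teardown
    # logic the task runner already performs on normal exit.
    print("tearing down disposable minion")


def handle_sigterm(signum, frame):
    # Called when buildbot master sends SIGTERM; clean up the
    # disposable minion before the process dies. Note: SIGKILL
    # cannot be intercepted, so this only covers graceful kills.
    teardown_minion()
    sys.exit(1)


# Install the handler early, before the task spawns its minion.
signal.signal(signal.SIGTERM, handle_sigterm)
```

A caveat with this approach: if the runner is blocked in an uninterruptible call when SIGTERM arrives, the handler may be delayed, so the teardown step in idea 2 or 3 would still be useful as a backstop.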
This ticket had the following Differential requests assigned: D604
Metadata Update from @kparal: - Issue tagged with: infrastructure
This is now fixed: we remove the minion in a buildbot step that runs after task execution completes.
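For illustration, the fix could look something like the following in the buildbot master config. This is a hedged sketch, not our exact configuration: the `testcloud` teardown command and the VM-name convention are assumptions, but `ShellCommand` with `alwaysRun=True` is standard buildbot and ensures the step runs even when earlier steps fail or the task is interrupted.

```python
# Sketch of a master.cfg fragment (buildbot 0.8-era API assumed).
from buildbot.process.factory import BuildFactory
from buildbot.steps.shell import ShellCommand

factory = BuildFactory()

# ... the steps that run the actual task go here ...

# Teardown step: remove the disposable minion once the task is
# done. alwaysRun=True makes buildbot execute it even if previous
# steps failed, so the VM does not linger and eat host resources.
factory.addStep(ShellCommand(
    name='teardown minion',
    # Hypothetical command; the real teardown depends on how the
    # minion was provisioned (e.g. testcloud, libvirt, ...).
    command=['testcloud', 'instance', 'remove', '-f',
             'taskotron-%(prop:slavename)s'],
    alwaysRun=True,
    haltOnFailure=False,
))
```

This covers the "killed for inactivity" case too, because buildbot still runs `alwaysRun` steps after interrupting a build.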
Metadata Update from @kparal: - Issue close_status updated to: Fixed