#2885 tasks/builds are not killed after reaching a timeout
Opened 2 years ago by sharkcz. Modified 4 months ago

We are observing a situation in the Fedora koji instance where tasks are not killed after reaching the defined timeout, but are instead restarted indefinitely. As a consequence, they consume precious resources (e.g. s390x).

https://koji.fedoraproject.org/koji/taskinfo?taskID=68274948 is a recent example (I have manually cancelled it). It was in progress for more than 1 week; see the individual buildArch tasks that were restarted.

cc @kevin , @mohanboddu


I'm not sure if koji is to blame here, I think it just sets the timeout in mock? But I could be misremembering...

In any case I agree it would be good to get this working again.

Metadata Update from @kevin:
- Custom field Size adjusted to None

2 years ago

Timeouts and restarts happen in different cases. A restart means that something bigger is going on: the build had not reached the set timeout, but failed in some non-standard way (something that was not noticed by rpmbuild/mock), such as a failing builder, kernel issues, etc. Are we able to get kojid/journal logs from around the time of the restart? Anything suspicious there?

So, this one is various things:

I was reinstalling builders yesterday, so I freed it several times to move it off a builder I was re-installing.

I can't tell much about the later ones. No OOMs on that builder, no kernel issues that I can see. No indication of why it restarted. ;(

I was briefly watching this one, and when the gcc test-suite was run, it got restarted around noon CET. And it seems it got restarted a short while ago too ...

a new candidate to investigate = https://koji.fedoraproject.org/koji/taskinfo?taskID=69702088
(and there are more right now)

This is a list of stalled tasks I have cancelled today; the oldest had been "running" for ~2 weeks:
69701974 70093944 70106858 70228793 70247868 70347362 70473614 70489935 70573749 70524504 70683005 70706039 70612859 70615113 70618949
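
For anyone who wants to spot such tasks without watching the web UI, here is a minimal sketch of the check, not anything koji itself does. It assumes you already have task records as plain dicts; the field names `id` and `created` and the function `find_stuck_tasks` are hypothetical and not koji's API schema. The idea is to measure age from when the task was first created, so a task that keeps getting restarted still shows up as old.

```python
from datetime import datetime, timedelta, timezone


def find_stuck_tasks(tasks, now, max_age):
    """Return the IDs of tasks that were created more than max_age ago.

    `tasks` is a list of dicts with hypothetical keys:
      "id"      - task ID (int)
      "created" - timezone-aware datetime of the original task creation
    """
    stuck = []
    for task in tasks:
        # Age is counted from creation, not from the latest restart,
        # so repeatedly restarted tasks are still reported.
        if now - task["created"] > max_age:
            stuck.append(task["id"])
    return stuck


if __name__ == "__main__":
    # Made-up example records; the IDs are placeholders, not real koji tasks.
    now = datetime(2021, 7, 1, tzinfo=timezone.utc)
    example = [
        {"id": 1, "created": now - timedelta(days=14)},
        {"id": 2, "created": now - timedelta(hours=3)},
    ]
    print(find_stuck_tasks(example, now, max_age=timedelta(days=2)))
```

The threshold is passed in explicitly because a sensible value depends on the package and architecture; how the records are fetched from the hub is left out on purpose.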
