#2083 Another indefinitely running job
Closed: MIGRATED 2 years ago by nikromen. Opened 2 years ago by praiskup.

Seems like the machine died, but for some reason the worker process still runs..?

https://copr.fedorainfracloud.org/coprs/networkmanager/NetworkManager-main-debug/build/3292780/

One of the reasons why this happens is:
https://github.com/praiskup/resalloc/blob/380a68d02e12368983c55639209ab848dcc202b5/resallocserver/logic.py#L137

But we need to debug what happened on the backend side; afaik, the ssh connection should be terminated ... and even the timeout already took place (INT sent):

PASS: src/core/platform/tests/test-tc-linux 4 /link/qdisc/tbf
 !! Copr timeout => sending INT

8 days

Can anyone take a look, please? Please don't clean up the worker until we know
100% how to reproduce this.

I assume
- the machine is still running, therefore the old ssh connection works
- the build is in some deadlock state
- a new connection cannot be started (perhaps because of ENOMEM/ENOSPC)?
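
To illustrate the deadlock hypothesis: if the build ignores the INT, the worker would wait forever unless something escalates. Below is a minimal sketch of such an escalation. It is not the actual Copr backend or resalloc code, and I have not checked whether the backend already does something similar; `stop_with_escalation` and `grace` are hypothetical names, and the sketch assumes a local POSIX child process rather than a remote ssh session.

```python
import signal
import subprocess


def stop_with_escalation(proc, grace=5.0):
    """Try SIGINT, then SIGTERM, then SIGKILL on a subprocess.Popen object.

    Waits up to `grace` seconds after each signal. Returns the name of the
    signal after which the process exited, "already-exited" if it was gone
    before any signal was sent, or "still-running" if even SIGKILL did not
    end it within the grace period (e.g. a task stuck in uninterruptible I/O).
    """
    if proc.poll() is not None:
        return "already-exited"

    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGKILL):
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace)
            return sig.name
        except subprocess.TimeoutExpired:
            # The process ignored or could not handle this signal; escalate.
            pass

    return "still-running"
```

A caller would pass the `Popen` handle of the stuck process, e.g. `stop_with_escalation(proc, grace=30)`, and log the returned stage. A "still-running" result would point at a process that cannot be killed at all, which is a different problem from a build that merely ignores INT.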

WDYT?

Metadata Update from @praiskup:
- Issue assigned to praiskup

2 years ago

Metadata Update from @nikromen:
- Issue close_status updated to: MIGRATED
- Issue status updated to: Closed (was: Open)

2 years ago
