#1897 koji.fedoraproject.org builders unable to retry or taking tasks and never checking in again
Closed: Invalid 4 years ago by kevin. Opened 4 years ago by kevin.

We have been hitting issues for the last week or two with koji.fedoraproject.org.

See: https://pagure.io/fedora-infrastructure/issue/8477

The symptoms are:

  • Builds sometimes error with: RetryError: unable to retry call 11 (method host.initBuild) for session 92593309
  • Builds sometimes error with: GenericError: Build already in progress
  • Sometimes builders log Attempting to take task xxxxxxx and then stop checking in entirely.
  • Signing bridge often hangs after signing things, while waiting for koji to ack the signatures.
  • Sometimes things like tagBuild will get taken by a builder, but then just hang.
    (strace on the kojid process just shows: restart_syscall(<... resuming interrupted restart_syscall ...>))

This could well be a problem in layers above koji, but I have not been able to isolate it there.
Ideas from koji developers on where or what could be causing this are welcome. :(


buildhw-04:

2020-01-01 19:47:10,028 [INFO] koji.TaskManager: Attempting to take task 4003150

stops checking in:

Wed Jan  1 19:57:02 UTC 2020
buildhw-04.phx2.fedoraproject.org        Y   Y    0.0/4.0  i386,x86_64      2020-01-01 19:47:10
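
To put a number on "stops checking in", here is a minimal sketch that computes the gap between the `date` output and the last-update column of the host line above. The function name is hypothetical, and it assumes both timestamps are UTC and that the day and month names are in English.

```python
from datetime import datetime


def seconds_since_checkin(observed_at, host_line):
    """Return (hostname, seconds between observation and last check-in).

    Assumes both timestamps are in the same timezone (UTC here).
    """
    # `date`-style timestamp, e.g. "Wed Jan  1 19:57:02 UTC 2020".
    now = datetime.strptime(observed_at, "%a %b %d %H:%M:%S UTC %Y")
    # The last two whitespace-separated fields of the host line are
    # the date and time of the last update.
    fields = host_line.split()
    last = datetime.strptime(" ".join(fields[-2:]), "%Y-%m-%d %H:%M:%S")
    return fields[0], (now - last).total_seconds()


host, gap = seconds_since_checkin(
    "Wed Jan  1 19:57:02 UTC 2020",
    "buildhw-04.phx2.fedoraproject.org        Y   Y    0.0/4.0  "
    "i386,x86_64      2020-01-01 19:47:10",
)
print(host, gap)
```

For the values above the gap is 592 seconds, i.e. the builder had been silent for just under ten minutes, and its last update matches the "Attempting to take task" log line to the second.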

hubs:

koji01:
[Wed Jan 01 19:47:10.055234 2020] [:error] [pid 6217] 2020-01-01 19:47:10,055 [INFO] m=host.openTask u=buildhw-04.phx2.fedoraproject.org p=6217 r=10.5.126.9:45084 koji.xmlrpc: Handling method host.openTask for session 92641090 (#1908)'
[Wed Jan 01 19:47:10.068121 2020] [:error] [pid 6217] 2020-01-01 19:47:10,068 [INFO] m=host.openTask u=buildhw-04.phx2.fedoraproject.org p=6217 r=10.5.126.9:45084 koji.xmlrpc: Completed method host.openTask for session 92641090 (#1908): 0.012716 seconds, rss 36192, stime 0.493776'

koji02:

[Wed Jan 01 19:47:10.039216 2020] [:error] [pid 14739] 2020-01-01 19:47:10,039 [INFO] m=getTaskInfo u=buildhw-04.phx2.fedoraproject.org p=14739 r=10.5.126.9:51886 koji.xmlrpc: Handling method getTaskInfo for session 92641090 (#1907)'
[Wed Jan 01 19:47:10.041166 2020] [:error] [pid 14739] 2020-01-01 19:47:10,041 [INFO] m=getTaskInfo u=buildhw-04.phx2.fedoraproject.org p=14739 r=10.5.126.9:51886 koji.xmlrpc: Completed method getTaskInfo for session 92641090 (#1907): 0.001756 seconds, rss 37340, stime 1.248160'
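
Since the calls for one session are spread across both hubs, it helps to merge the logs and check whether any call was started but never completed. The following is a rough sketch, not part of koji: the function name and regex are my own, the regex is based only on the lines quoted above, and it assumes each log entry has been joined onto a single line.

```python
import re

# Matches the koji.xmlrpc "Handling"/"Completed" entries in the form
# seen in the hub log excerpts (an assumption, not a koji-defined format).
LINE_RE = re.compile(
    r"m=(?P<method>\S+) u=(?P<user>\S+) .*?"
    r"koji\.xmlrpc: (?P<state>Handling|Completed) method \S+ "
    r"for session (?P<session>\d+) \(#(?P<callnum>\d+)\)"
)


def unfinished_calls(lines):
    """Return sorted (session, callnum, method) for calls with no 'Completed' entry.

    Lines may come from several hubs in any order.
    """
    handled = {}
    completed = set()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a koji.xmlrpc Handling/Completed entry
        key = (int(m["session"]), int(m["callnum"]))
        if m["state"] == "Handling":
            handled[key] = m["method"]
        else:
            completed.add(key)
    return sorted(
        (session, callnum, method)
        for (session, callnum), method in handled.items()
        if (session, callnum) not in completed
    )
```

Fed the four entries above, this returns an empty list: both #1907 (getTaskInfo) and #1908 (host.openTask) completed on the hub side, and nothing later shows up for session 92641090. That fits the builder going quiet right after taking the task rather than the hub stalling on a call.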

Metadata Update from @kevin:
- Custom field Size adjusted to None

4 years ago

Turns out this was very likely due to a plugin, the fedora-messaging one to be exact. ;(

Sorry for the false alarm.

Metadata Update from @kevin:
- Issue close_status updated to: Invalid
- Issue status updated to: Closed (was: Open)

4 years ago
