#1413 Frozen succeeded builds after backend error
Closed: Fixed 3 years ago by msuchy. Opened 3 years ago by iucar.

Several builds for iucar/cran have been stuck for 12+ hours, with this error message in backend.log:

[2020-06-15 09:31:02,819][  INFO][PID:993367] Worker succeeded build, took 569.1663694381714
[2020-06-15 09:33:12,501][WARNING][PID:993367] Retry request #1 on update: Requests error on https://copr.fedorainfracloud.org/backend/update/: HTTPSConnectionPool(host='copr.fedorainfracloud.org', port=443): Max retries exceeded with url: /backend/update/ (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fdcca003c90>: Failed to establish a new connection: [Errno 110] Connection timed out'))
[2020-06-15 09:33:17,507][ ERROR][PID:993367] unexpected failure Attempt to talk to frontend timeouted (we gave it 2 attempts) (in /usr/lib/python3.7/site-packages/copr_backend/frontend.py:95)

Thank you for the report. I noticed this, and https://pagure.io/copr/copr/pull-request/1412 should fix the problem (already hot-fixed in production).

Metadata Update from @praiskup:
- Issue tagged with: bug

To clarify: your build queue currently contains several non-background
jobs (which are prioritized over background jobs) and many source builds
(which are prioritized even higher).

Because of the bug fixed by #1412, the workers handling the "stuck"
builds exited prematurely. That left the builds in the "running" state
even though they are not actually running at the moment (they are
effectively pending until the backend takes them again).

The backend would already be processing them by now, but they are
background jobs, so the other jobs you submitted later keep overtaking
the older ones. In other words, those builds will eventually be
processed.
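
The overtaking effect can be illustrated with a small priority queue. This is only a sketch of the ordering described above, not the actual Copr backend scheduler: the `Job` class, the `processing_order` helper, the numeric priority values, and the job names are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority classes; a lower number is processed first.
SOURCE, NORMAL, BACKGROUND = 0, 1, 2

# Monotonic counter that records submission order (used as a tie-breaker).
_seq = count()


@dataclass(order=True)
class Job:
    priority: int
    seq: int = field(default_factory=lambda: next(_seq))
    name: str = field(compare=False, default="")


def processing_order(jobs):
    """Return job names in the order a priority queue would hand them out."""
    heap = list(jobs)
    heapq.heapify(heap)
    return [heapq.heappop(heap).name for _ in range(len(heap))]


# The background build is the oldest job, yet it is taken last because
# both jobs submitted later belong to a higher priority class.
jobs = [
    Job(BACKGROUND, name="old-background-build"),
    Job(NORMAL, name="newer-regular-build"),
    Job(SOURCE, name="newest-source-build"),
]
order = processing_order(jobs)
print(order)
```

As long as higher-priority jobs keep arriving, the background build stays at the end of the queue. It is picked up once the higher-priority work drains.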

Metadata
Related Pull Requests
  • #1412 Merged 3 years ago