In the staging OpenShift (os.stg.fedoraproject.org), in the "docsbuilding" project, I have a stuck build (#21) that has been running for 9 hours now.
https://os.stg.fedoraproject.org/console/project/docsbuilding/browse/builds/builder-build/builder-build-21?tab=details
Can someone please kill it?
(I have already resubmitted the build, but the new one is waiting for the old, stuck one, to finish.)
Thanks!
Canceled (unfortunately I canceled yours as well... so I started another new build)
FYI, this is using the fedora 29 container, we may want to move it to 30 or 31 soon. :)
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
So, unfortunately, the new one is stuck again. :-(
I don't believe there's an error in the Dockerfile — it's the very same one that was succeeding before, and it builds fine locally. Can that be something with OpenShift?
PS: Thanks for noticing it's an f29, I'll fix that when I'm back from vacation.
Metadata Update from @asamalik: - Issue status updated to: Open (was: Closed)
We have a Bodhi build stuck too. I have cancelled it and retriggered the build a couple of times, but it always gets stuck :-(.
I could not see anything useful in the Events or logs.
IIUC we should be able to cancel our own builds now. Read https://pagure.io/fedora-infrastructure/issue/8005#comment-602921 and the following comments.
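For anyone who wants to spot long-running builds before cancelling them, here is a minimal sketch. It assumes the JSON shape produced by `oc get builds -o json` (`status.phase`, `status.startTimestamp`, `metadata.name`); the one-hour threshold, the sample data, and the timestamps are hypothetical, and the script only prints `oc cancel-build` commands rather than running them.

```python
from datetime import datetime, timedelta, timezone


def stuck_builds(builds, now, max_age=timedelta(hours=1)):
    """Return names of builds still in the Running phase after max_age."""
    stuck = []
    for item in builds.get("items", []):
        status = item.get("status", {})
        started = status.get("startTimestamp")
        # Builds that are not running, or have no start time yet, are skipped.
        if status.get("phase") != "Running" or not started:
            continue
        # Timestamps are assumed to look like 2019-08-01T00:00:00Z (UTC).
        start = datetime.strptime(started, "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        )
        if now - start > max_age:
            stuck.append(item["metadata"]["name"])
    return stuck


# Hypothetical sample shaped like `oc get builds -o json` output.
sample = {
    "items": [
        {
            "metadata": {"name": "builder-build-21"},
            "status": {
                "phase": "Running",
                "startTimestamp": "2019-08-01T00:00:00Z",
            },
        },
        {"metadata": {"name": "builder-build-22"}, "status": {"phase": "New"}},
    ]
}

# Hypothetical "current time", nine hours after the first build started.
now = datetime(2019, 8, 1, 9, 0, tzinfo=timezone.utc)

for name in stuck_builds(sample, now):
    print(f"oc cancel-build {name} -n docsbuilding")
```

In real use the `sample` dictionary would be replaced by the parsed output of `oc get builds -n docsbuilding -o json`, and `now` by `datetime.now(timezone.utc)`.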
I fear this may be a cluster-wide issue. On the staging cluster, I tried five times building the same coreos-cincinnati config and 100% of them got stuck.
The Dockerfile is unchanged and looks like this:
    FROM fedora:30
    RUN dnf -y install g++ openssl-devel
    RUN dnf -y install rust cargo
    ...
The build job never reaches the second RUN and just hangs after the first dnf is completed.
Yes, same here for the bodhi project. I'll try to check the cluster and see if this is a resource problem.
I have restarted the docker service on os-node04.stg.phx2.fedoraproject.org and it seems to have fixed the problem. At least for bodhi.
@lucab and @asamalik could you try to start a new build?
@cverna confirmed, I triggered another build (which ended up on node os-node04.stg.phx2) and it went fine. I can't see on which nodes the previous failing builds were scheduled, though.
I have checked all the nodes and only os-node04 had some stuck build there.
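To see which node each running build pod landed on, something like the sketch below could help. It assumes pod JSON as returned by `oc get pods --all-namespaces -o json`, and that build pods carry the `openshift.io/build.name` label; the pod names in the sample are hypothetical.

```python
from collections import defaultdict


def running_build_pods_by_node(pods):
    """Group the names of running build pods by the node they run on."""
    by_node = defaultdict(list)
    for pod in pods.get("items", []):
        labels = pod["metadata"].get("labels", {})
        # Assumption: build pods are identified by this label.
        if "openshift.io/build.name" not in labels:
            continue
        if pod.get("status", {}).get("phase") != "Running":
            continue
        node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
        by_node[node].append(pod["metadata"]["name"])
    return dict(by_node)


# Hypothetical sample shaped like `oc get pods -o json` output.
sample_pods = {
    "items": [
        {
            "metadata": {
                "name": "builder-build-22-build",
                "labels": {"openshift.io/build.name": "builder-build-22"},
            },
            "spec": {"nodeName": "os-node04.stg.phx2.fedoraproject.org"},
            "status": {"phase": "Running"},
        },
        {
            "metadata": {"name": "some-app-1-abcde", "labels": {}},
            "spec": {"nodeName": "os-node01.stg.phx2.fedoraproject.org"},
            "status": {"phase": "Running"},
        },
    ]
}

for node, names in running_build_pods_by_node(sample_pods).items():
    print(node, names)
```

A node that keeps showing the same build pods for hours would be a candidate for the docker restart described above.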
Let's close this for now, feel free to reopen again if you experience the same issue.
Sorry for lagging with my response, I was on vacation the whole week.
Thanks for all the help! As Dusty suggests, I can now kill pods myself which is super helpful.
However, the builds are still getting stuck. I killed the first one after 15 minutes, and the second one has now been hanging for about 40 minutes. I left it running in case it helps with debugging. Both are on os-node01, if that helps.
ok, looks like docker was confused. The build finished, but it never exited right.
I restarted docker on all the staging nodes and fired a new build.
It seems to have completed ok.
Side note: you are using Fedora 29 there, please update to 31 before 29 goes end of life. ;)