I'm trying to trigger new container builds of the openshift-apps/coreos-cincinnati role, but they reliably fail to complete on prod.
Both the playbook and related variables are (currently) the same on prod and stg. On stg, it correctly built on the first try. On prod, builds are scheduled and start executing, but they stop in the middle of the run, right after the first dnf install completes.
So far I've observed 5 hangs out of 5 attempts, each eventually timing out. All hanging builds were assigned to os-node02.phx2.fedoraproject.org by the scheduler.
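For anyone triaging something similar, here is a minimal sketch of how one might tally unfinished builds per node to confirm that the hangs cluster on a single host. The record layout (name, node, phase), the build names, and the second node name are all hypothetical and only for illustration; in practice the records would have to be collected from the cluster by whatever means is available.

```python
from collections import Counter


def unfinished_builds_by_node(builds):
    """Count builds per node whose phase is not "Complete".

    `builds` is an iterable of (build_name, node, phase) tuples.
    This tuple layout is an assumption made for this sketch.
    """
    return Counter(node for _name, node, phase in builds if phase != "Complete")


# Hypothetical sample records, not real cluster output.
sample = [
    ("build-a", "os-node02.phx2.fedoraproject.org", "Failed"),
    ("build-b", "os-node02.phx2.fedoraproject.org", "Failed"),
    ("build-c", "other-node.example", "Complete"),
]

counts = unfinished_builds_by_node(sample)
for node, count in counts.most_common():
    print(node, count)
```

If every unfinished build lands on the same node, as in the sample, that points at the node rather than at the playbook or its variables.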
Here are the logs for a good build (stg) and for a hanging one (prod):
* stg - https://os.stg.fedoraproject.org/console/project/coreos-cincinnati/browse/builds/coreos-cincinnati-stub/coreos-cincinnati-stub-96?tab=logs
* prod - https://os.fedoraproject.org/console/project/coreos-cincinnati/browse/builds/coreos-cincinnati-stub/coreos-cincinnati-stub-6?tab=details
Metadata Update from @smooge: - Issue priority set to: None (was: Needs Review) - Issue tagged with: OpenShift
I was able to get a build to go through: https://os.fedoraproject.org/console/project/coreos-cincinnati/browse/builds/coreos-cincinnati-stub/coreos-cincinnati-stub-9
I restarted docker on the openshift nodes.
I don't know why this happens, but after a few weeks of uptime, docker just goes into lala land and stops processing builds. ;(
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
@kevin thanks for gently slapping docker with a trout, then ;)