#8836 prod/openshift-apps: builds start but get stuck midway
Closed: Fixed 3 years ago by kevin. Opened 3 years ago by lucab.

I'm trying to trigger new container builds of the openshift-apps/coreos-cincinnati role, but they reliably fail to complete on prod.

Both the playbook and related variables are (currently) the same on prod and stg. On stg, it correctly built on the first try. On prod, builds are scheduled and start executing, but they stop in the middle of the run, right after the first dnf install completes.

So far I've observed 5 hangs (out of 5 attempts), each ending in a timeout. All hanging builds have been assigned to os-node02.phx2.fedoraproject.org by the scheduler.

Below are the logs for a good build (stg) and for a hanging one (prod):
* stg - https://os.stg.fedoraproject.org/console/project/coreos-cincinnati/browse/builds/coreos-cincinnati-stub/coreos-cincinnati-stub-96?tab=logs
* prod - https://os.fedoraproject.org/console/project/coreos-cincinnati/browse/builds/coreos-cincinnati-stub/coreos-cincinnati-stub-6?tab=details


Metadata Update from @smooge:
- Issue priority set to: None (was: Needs Review)
- Issue tagged with: OpenShift

3 years ago

I restarted docker on the openshift nodes.

I don't know why this happens, but after a few weeks of uptime, docker just goes into lala land and stops processing builds. ;(

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago

@kevin thanks for gently slapping docker with a trout, then ;)
