$ oc logs -f bodhi-web-4-nhgnw
Error from server: Get https://os-node02.phx2.fedoraproject.org:10250/containerLogs/bodhi/bodhi-web-4-nhgnw/bodhi-web?follow=true: net/http: TLS handshake timeout
After a minute or two of that, the container status for those pods started to show "Unknown" instead of "Running", though they still showed 0 restarts. After a while, OpenShift started up two more bodhi-web-4 pods (with new hashes) and the "Unknown" ones disappeared. Bodhi now seems to be served, but one of the new pods has been showing as "ContainerCreating" for 11 minutes now:
$ oc get pods
NAME                READY     STATUS              RESTARTS   AGE
bodhi-web-3-build   0/1       Completed           0          21h
bodhi-web-4-q6dcv   1/1       Running             0          12m
bodhi-web-4-rwmsv   0/1       ContainerCreating   0          11m
Normally Bodhi would be served with two pods. I don't know how to see logs for what is happening right now, but the other pod does seem to be getting by for now.
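For anyone who hits the same situation: a pod in "ContainerCreating" has no container yet, so oc logs has nothing to show, and the pod's events are usually the place to look. A minimal sketch follows, assuming a logged-in oc session with the bodhi project selected. The helper name stuck_pods is made up for illustration and is not part of oc.

```shell
# Hypothetical helper (not part of oc): read `oc get pods` output on stdin
# and print the name and status of every pod that is neither Running nor
# Completed. Assumes the default column order NAME READY STATUS RESTARTS AGE.
stuck_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Assumed usage:
#   oc get pods | stuck_pods
#   oc describe pod bodhi-web-4-rwmsv   # Events section lists scheduling and image pulls
```

The Events section of oc describe pod is what shows scheduling and image pull progress for a pod that has not started yet.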
Are there more logs available than I have access to? If so, do they reveal anything useful about what might have happened here?
When do you need this? (YYYY/MM/DD)
N/A
When is this no longer needed or useful? (YYYY/MM/DD)
If it gets back to two pods somehow, or if you don't want to investigate what might have happened.
If we cannot complete your request, what is the impact?
We may not learn what happened, and Bodhi may continue to only be served by one pod.
I swear - immediately when this page loaded after I submitted this ticket, the container showed as "Running". Weird timing.
Anyway, I'll leave this ticket open in case anyone wants to investigate the issue further; otherwise, feel free to close it.
I just noticed that I had forgotten to adjust the hostname when I ran oc login, and I did run that on batcave01. I think this explains the oc logs issue I was having.
I now know why it took so long for the container to come up - it took 14 minutes to pull from the registry (so just slow performance):
$ oc describe pod bodhi-web-4-rwmsv
<snip>
Events:
  FirstSeen  LastSeen  Count  From  SubObjectPath  Type  Reason  Message
  ---------  --------  -----  ----  -------------  ----  ------  -------
  29m  29m  1  default-scheduler    Normal  Scheduled  Successfully assigned bodhi-web-4-rwmsv to os-node01.phx2.fedoraproject.org
  28m  28m  1  kubelet, os-node01.phx2.fedoraproject.org  spec.containers{bodhi-web}  Normal  Pulling  pulling image "docker-registry.default.svc:5000/bodhi/bodhi-web@sha256:8386c2b654561b984938373934908c31ed3ce74b880820a2988492881f62799b"
  14m  14m  1  kubelet, os-node01.phx2.fedoraproject.org  spec.containers{bodhi-web}  Normal  Pulled  Successfully pulled image "docker-registry.default.svc:5000/bodhi/bodhi-web@sha256:8386c2b654561b984938373934908c31ed3ce74b880820a2988492881f62799b"
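The 14 minutes is the gap between the Pulling event (28m ago) and the Pulled event (14m ago). As a rough sketch, the same subtraction can be scripted. The function name pull_minutes is hypothetical, and it assumes the older event table layout shown above, where the first column is a FirstSeen age expressed in whole minutes such as "28m".

```shell
# Hypothetical helper: read `oc describe pod` output on stdin and print the
# number of minutes between the Pulling and Pulled events.
# Assumes FirstSeen is the first field and is given in minutes ("28m").
pull_minutes() {
    awk '
        / Pulling / { start = $1 }   # age of the "pull started" event
        / Pulled /  { end = $1 }     # age of the "pull finished" event
        END {
            sub(/m$/, "", start)
            sub(/m$/, "", end)
            print start - end
        }'
}

# Assumed usage:
#   oc describe pod bodhi-web-4-rwmsv | pull_minutes
```

Ages reported in seconds or hours would need extra handling that this sketch does not attempt.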
So the only remaining mystery is what happened to the old pods.
Yeah, I see where they were dropped, it looks like some kind of network hiccup...
Aug 28 14:42:42 os-node02 atomic-openshift-node: W0828 14:42:42.469025 99291 prober.go:103] No ref for container "cri-o://0bb8d8a410d862b2563f918ada95fe92958fb87affb673ec3abd3180ca12d428" (bodhi-web-4-nhgnw_bodhi(3891c932-aa1f-11e8-83f4-52540068650e):bodhi-web)
Not sure we are going to find out much more...
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)