We got an email notification from AWS about an underlying issue on one of the compute/worker nodes in the new OCP cluster. We need to cordon and drain the node to evacuate its pods, follow the AWS documentation to have the instance stopped and restarted on other infrastructure, and then, after verification, add it back (uncordon) to the OCP cluster.
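The sequence above can be sketched as an ordered list of CLI calls. This is a minimal sketch, not the exact procedure used here: it assumes the `oc` and `aws` CLIs are installed and authenticated, that the worker is an EBS-backed EC2 instance (a stop/start, unlike a reboot, lets EC2 place it on a different host), and that you know the instance ID behind the node. The node name and instance ID below are hypothetical. Older `oc` releases spell `--delete-emptydir-data` as `--delete-local-data`.

```python
import shlex


def maintenance_plan(node_name: str, instance_id: str) -> list[list[str]]:
    """Return the ordered CLI commands for moving a worker node to new EC2 hardware."""
    return [
        # Mark the node unschedulable so no new pods are placed on it.
        ["oc", "adm", "cordon", node_name],
        # Evict running pods; DaemonSet pods are skipped and emptyDir data is discarded.
        ["oc", "adm", "drain", node_name, "--ignore-daemonsets", "--delete-emptydir-data"],
        # Stop, then start: for EBS-backed instances this can land on a different host.
        ["aws", "ec2", "stop-instances", "--instance-ids", instance_id],
        ["aws", "ec2", "wait", "instance-stopped", "--instance-ids", instance_id],
        ["aws", "ec2", "start-instances", "--instance-ids", instance_id],
        ["aws", "ec2", "wait", "instance-running", "--instance-ids", instance_id],
        # Run this last, and only after verifying the node reports Ready again.
        ["oc", "adm", "uncordon", node_name],
    ]


if __name__ == "__main__":
    # Hypothetical node name and instance ID, for illustration only.
    for cmd in maintenance_plan("worker-0.example.internal", "i-0123456789abcdef0"):
        print(shlex.join(cmd))
```

Printing the plan rather than executing it keeps a human in the loop between the drain and the uncordon, which is where the verification step belongs.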
Metadata Update from @arrfab: - Issue assigned to arrfab
Metadata Update from @arrfab: - Issue tagged with: centos-ci-infra, high-gain, medium-trouble
The node was temporarily removed, so all OpenShift pods were rescheduled/migrated to the remaining workers. When following the procedure, the instance doesn't come back because of an InsufficientInstanceCapacity error. I'll look into that and add the node back to the OCP cluster once it can be provisioned/restarted.
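Since InsufficientInstanceCapacity usually means the availability zone has no free hardware of that instance type at the moment, one option is simply to retry the start later. The sketch below is hypothetical: it retries `aws ec2 start-instances` only when the CLI's error output mentions that error code, and the attempt count and delay are arbitrary assumptions, not values used on this ticket. The `runner` and `sleep` parameters exist so the logic can be exercised without touching AWS.

```python
import subprocess
import time
from typing import Callable, Sequence

# A runner takes an argv list and returns (exit code, stderr text).
Runner = Callable[[Sequence[str]], tuple[int, str]]


def run_cli(argv: Sequence[str]) -> tuple[int, str]:
    """Run a real CLI command and capture its exit code and stderr."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stderr


def start_with_retry(instance_id: str, attempts: int = 5, delay_s: float = 300.0,
                     runner: Runner = run_cli,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry start-instances on capacity errors; return the attempt that succeeded."""
    argv = ["aws", "ec2", "start-instances", "--instance-ids", instance_id]
    for attempt in range(1, attempts + 1):
        code, stderr = runner(argv)
        if code == 0:
            return attempt
        if "InsufficientInstanceCapacity" not in stderr:
            # Not a capacity problem: fail immediately instead of retrying.
            raise RuntimeError(f"start-instances failed: {stderr.strip()}")
        if attempt < attempts:
            sleep(delay_s)  # give AWS time to free capacity in the AZ
    raise TimeoutError(f"{instance_id}: no capacity after {attempts} attempts")
```

Failing fast on any other error avoids masking permission or configuration problems behind a long retry loop.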
It seems there is now enough capacity in the region/availability zone to get the instance back up and running. The node has been added back to OpenShift, so the ocp.cloud.ci.centos.org OCP cluster is now running normally.
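Before uncordoning, the "verification" step can at minimum confirm that the node's `Ready` condition is back to `"True"`. This is a minimal sketch assuming `oc get node <name> -o json` returns a standard Kubernetes Node object; any further checks done on this cluster are not described in the ticket.

```python
import json
import subprocess


def node_is_ready(node: dict) -> bool:
    """Return True if the Node object's Ready condition has status "True"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition reported yet


def fetch_node(name: str) -> dict:
    """Fetch the Node object as JSON via the oc CLI."""
    out = subprocess.run(["oc", "get", "node", name, "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def uncordon_if_ready(name: str) -> bool:
    """Uncordon the node only when it reports Ready; return whether it was uncordoned."""
    if not node_is_ready(fetch_node(name)):
        return False
    subprocess.run(["oc", "adm", "uncordon", name], check=True)
    return True
```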
Metadata Update from @arrfab: - Issue close_status updated to: Fixed with Explanation - Issue status updated to: Closed (was: Open)