The CoreOS CI pod is currently down because of:
Warning FailedScheduling 117m default-scheduler 0/11 nodes are available: 1 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 7 node(s) had volume node affinity conflict.
I.e., I think it's saying that the node backing the local PVC we're using (jenkins-local) is unschedulable. That PVC IIRC is associated with kempty-n12. Can someone take a look at that node to see what's wrong with it?
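As a rough sketch of how to confirm which node a local PV is pinned to (the PVC name jenkins-local and node kempty-n12 are from this ticket; the namespace and exact PV name depend on the cluster, so treat these as illustrative commands, not the exact ones we ran):

```shell
# Find the PV bound to the jenkins-local PVC.
kubectl get pvc jenkins-local -o jsonpath='{.spec.volumeName}{"\n"}'

# Local PVs carry a required nodeAffinity naming the node the disk lives on;
# $PV here is a placeholder for the volume name printed above.
kubectl get pv "$PV" -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms}{"\n"}'

# Check whether that node is cordoned or NotReady, and inspect its taints.
kubectl get node kempty-n12
kubectl describe node kempty-n12 | grep -i -e taint -e unschedulable
```

If the node shows SchedulingDisabled or NotReady, any pod bound to that local PV will sit in Pending with exactly the "volume node affinity conflict" event above.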
This is fixed now. For reference, @arrfab said:
10:26:03 < arrfab> jlebon: just called our adhoc-reset-ipmi.yml ansible ad-hoc task :)
10:26:09 < arrfab> to reset the node
10:26:35 < arrfab> but if that's happening on the same node again and again that can be sign of bad underlying ssd disk on that node
Agree we should dig into that node since it's not the first time it's giving us trouble (https://pagure.io/centos-infra/issue/423).
Metadata Update from @jlebon:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)