#507 CoreOS CI down due to unschedulable node (kempty-n12)
Closed: Fixed 2 years ago by jlebon. Opened 2 years ago by jlebon.

The CoreOS CI pod went down because of:

Warning  FailedScheduling  117m   default-scheduler  0/11 nodes are available: 1 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 7 node(s) had volume node affinity conflict.

I.e., I think it's saying that the node the local PVC we're using (jenkins-local) is bound to is unschedulable. IIRC, that PVC is associated with kempty-n12. Can someone take a look at that node to see what's wrong with it?
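To spell out that reading of the event: every one of the 11 nodes is rejected for some reason, and the three counts add up to 11 (1 + 3 + 7). The sketch below is a simplified, hypothetical model in Python, not the real kube-scheduler code. It assumes each node is counted under the first check it fails, that the local PV is pinned to kempty-n12, and that the pod tolerates no taints. The node names other than kempty-n12 (`master-*`, `worker-*`) and the helper names are made up for illustration, and the taint wording is shortened compared to the real message.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

MASTER_TAINT = "node-role.kubernetes.io/master"


@dataclass
class Node:
    name: str
    unschedulable: bool = False  # True when the node is cordoned
    taints: list = field(default_factory=list)


def first_failure(node: Node, tolerations: set, pv_node: str) -> Optional[str]:
    """Return the first (simplified) check this node fails, or None if the pod fits."""
    if node.unschedulable:
        return "were unschedulable"
    if any(t not in tolerations for t in node.taints):
        return "had taint that the pod didn't tolerate"
    if node.name != pv_node:
        # Assumption: the local PV's nodeAffinity only matches a single node.
        return "had volume node affinity conflict"
    return None


def summarize(nodes: list, tolerations: set, pv_node: str) -> str:
    """Build a FailedScheduling-style summary from per-node failure reasons."""
    reasons = Counter()
    fits = 0
    for node in nodes:
        reason = first_failure(node, tolerations, pv_node)
        if reason is None:
            fits += 1
        else:
            reasons[reason] += 1
    # Counter keeps insertion order, so reasons appear in node-list order.
    detail = ", ".join(f"{count} node(s) {reason}" for reason, count in reasons.items())
    return f"{fits}/{len(nodes)} nodes are available: {detail}."


# Hypothetical cluster layout: the PV's node is cordoned, 3 tainted masters, 7 workers.
nodes = (
    [Node("kempty-n12", unschedulable=True)]
    + [Node(f"master-{i}", taints=[MASTER_TAINT]) for i in range(3)]
    + [Node(f"worker-{i}") for i in range(7)]
)
print(summarize(nodes, tolerations=set(), pv_node="kempty-n12"))
```

The point of the model is that the 7 healthy workers can never take the pod, because the local PV ties it to one node. So once kempty-n12 itself is unschedulable, there is nowhere left to run it, and bringing that node back is the only fix short of moving the data.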


This is fixed now. For reference, @arrfab said:

10:26:03 < arrfab> jlebon: just called our adhoc-reset-ipmi.yml ansible ad-hoc task :)
10:26:09 < arrfab> to reset the node
10:26:35 < arrfab> but if that's happening on the same node again and again that can be sign of bad underlying ssd disk on that node

Agreed, we should dig into that node, since it's not the first time it has given us trouble (https://pagure.io/centos-infra/issue/423).

Metadata Update from @jlebon:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 years ago

