The CoreOS CI pod is currently down because of:
Warning FailedScheduling 117m default-scheduler 0/11 nodes are available: 1 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 7 node(s) had volume node affinity conflict.
I.e., I think it's saying that the node backing the local PVC we're using (jenkins-local) is unschedulable. That PVC IIRC is associated with kempty-n12. Can someone take a look at that node to see what's wrong with it?
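As a rough sketch of how to confirm which node a local PV is pinned to (the PVC name jenkins-local and node kempty-n12 are from this ticket; the namespace and exact PV name depend on the cluster, so treat these as illustrative commands, not the exact ones we ran):

```shell
# Find the PV bound to the jenkins-local PVC.
kubectl get pvc jenkins-local -o jsonpath='{.spec.volumeName}{"\n"}'

# Local PVs carry a required nodeAffinity naming the node the disk lives on;
# $PV here is a placeholder for the volume name printed above.
kubectl get pv "$PV" -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms}{"\n"}'

# Check whether that node is cordoned or NotReady, and inspect its taints.
kubectl get node kempty-n12
kubectl describe node kempty-n12 | grep -i -e taint -e unschedulable
```

If the node shows SchedulingDisabled or NotReady, any pod bound to that local PV will sit in Pending with exactly the "volume node affinity conflict" event above.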
This is fixed now. For reference, @arrfab said:
10:26:03 < arrfab> jlebon: just called our adhoc-reset-ipmi.yml ansible ad-hoc task :)
10:26:09 < arrfab> to reset the node
10:26:35 < arrfab> but if that's happening on the same node again and again that can be sign of bad underlying ssd disk on that node
Agree we should dig into that node since it's not the first time it's giving us trouble (https://pagure.io/centos-infra/issue/423).
Metadata Update from @jlebon:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)