Since about half an hour ago I basically can't load anything from the CentOS CI OCP cluster (https://console-openshift-console.apps.ocp.ci.centos.org/), as it takes a ridiculous amount of time. mtr shows that there's something wrong with the network near the cluster:
From localhost:
Host Loss% Snt Last Avg Best Wrst StDev
...
8. viawest-svc073699-ic361683.ip.twelve99-cust.net 0.0% 525 99.6 99.8 99.5 110.7 0.6
9. be23.bbrt01.iad01.flexential.net 0.0% 525 107.7 107.8 107.5 110.2 0.2
10. be185.bbrt02.ral01.flexential.net 0.0% 525 108.0 108.1 107.7 119.9 0.5
11. be32.crrt02.ral01.flexential.net 0.0% 525 110.0 110.1 109.9 117.3 0.5
12. 128.136.224.140 0.2% 525 108.2 112.1 107.9 148.2 8.2
13. 8.43.84.1 0.0% 525 133.7 152.2 122.6 320.6 27.8
14. 8.43.84.3 8.0% 525 110.7 112.1 108.2 170.4 8.4
15. 8.43.84.4 0.0% 525 127.5 133.4 112.5 293.7 18.1
16. 8.43.84.254 3.4% 525 737.2 589.3 110.6 2369. 548.0
17. 8.43.84.248 73.5% 525 110.1 115.3 110.0 152.0 10.0
From one of the test machines provided by Fedora:
Host Loss% Snt Last Avg Best Wrst StDev
...
29. be-133-pe01.seattle.wa.ibone.comcast.net 0.0% 163 9.7 9.8 9.5 10.9 0.2
30. 23.30.206.10 0.0% 163 9.7 9.7 9.4 10.5 0.2
31. be21.bbrt01.sea01.flexential.net 0.0% 163 81.4 81.5 81.2 83.0 0.2
32. be101.bbrt01.pdx01.flexential.net 0.0% 163 81.7 81.6 81.3 82.0 0.2
33. be10.bbrt02.pdx01.flexential.net 0.0% 163 81.4 81.2 80.9 81.7 0.1
34. be198.bbrt02.msp01.flexential.net 0.0% 163 81.5 81.5 81.2 81.9 0.1
35. be173.bbrt02.msp10.flexential.net 0.0% 163 81.2 81.3 81.0 81.7 0.1
36. be10.bbrt01.msp10.flexential.net 0.0% 163 81.3 81.5 81.3 82.6 0.2
37. be188.bbrt02.chi01.flexential.net 0.0% 163 81.3 81.5 81.2 81.9 0.1
38. be10.bbrt01.chi10.flexential.net 0.0% 163 81.6 81.5 81.3 81.9 0.1
39. be204.bbrt02.cin01.flexential.net 0.0% 163 81.8 81.5 81.2 81.9 0.1
40. be192.bbrt02.iad01.flexential.net 0.0% 162 81.4 81.4 81.1 82.1 0.2
41. be10.bbrt01.iad01.flexential.net 0.0% 162 81.7 81.4 81.3 81.9 0.1
42. be185.bbrt02.ral01.flexential.net 0.0% 162 81.4 81.4 81.2 82.3 0.2
43. be32.crrt02.ral01.flexential.net 0.0% 162 81.5 81.6 81.4 83.6 0.2
44. 128.136.224.140 0.0% 162 83.0 87.9 81.5 130.6 10.9
45. 8.43.84.1 0.0% 162 103.4 121.8 96.1 223.7 16.9
46. 8.43.84.3 20.4% 162 81.8 88.9 81.7 121.4 11.2
47. 8.43.84.4 0.0% 162 95.0 107.6 86.1 212.6 20.1
48. 8.43.84.254 1.9% 162 84.0 603.3 82.4 1942. 528.3
49. 8.43.84.248 77.0% 162 82.1 88.0 82.0 117.4 10.1
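To make the pattern in the two traces easier to spot, here is a minimal Python sketch that parses mtr report rows and flags hops with packet loss or very high jitter. It assumes the usual mtr column order (Loss%, Snt, Last, Avg, Best, Wrst, StDev); the function names and the thresholds are hypothetical, picked only for illustration.

```python
import re

# One mtr report row: hop number, host, then Loss%, Snt, Last, Avg, Best, Wrst, StDev.
# Assumption: the default mtr report column order.
ROW = re.compile(
    r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_row(line):
    """Parse one mtr row into a dict, or return None if the line is not a hop row."""
    m = ROW.match(line)
    if m is None:
        return None
    hop, host, loss, snt, last, avg, best, wrst, stdev = m.groups()
    return {
        "hop": int(hop),
        "host": host,
        "loss": float(loss),
        "snt": int(snt),
        "last": float(last),
        "avg": float(avg),
        "best": float(best),
        "wrst": float(wrst),  # truncated values such as "2369." still parse
        "stdev": float(stdev),
    }


def suspicious_hops(lines, loss_threshold=1.0, stdev_threshold=100.0):
    """Return hop numbers whose loss or jitter reaches the (hypothetical) thresholds."""
    flagged = []
    for line in lines:
        row = parse_row(line)
        if row is None:
            continue  # skip "..." and any header line
        if row["loss"] >= loss_threshold or row["stdev"] >= stdev_threshold:
            flagged.append(row["hop"])
    return flagged
```

Fed the rows from the localhost trace, this flags hops 14, 16 and 17. Loss on an intermediate hop can simply be ICMP rate limiting on that router, so the loss on the final hop is the more meaningful number.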
Metadata Update from @arrfab: - Issue assigned to arrfab
Metadata Update from @arrfab: - Issue priority set to: 🔥 Urgent 🔥 (was: Needs Review) - Issue tagged with: centos-ci-infra, high-gain, medium-trouble
Commenting on this only today (Saturday), but the issue was resolved right away on Friday evening. It's all fixed, and that was confirmed on the #centos-ci channel on libera.chat.
Explanation: a switch port reconfiguration was done to move part of the CI infra, but it seems to have confused the upstream switch for a few seconds. That part was fixed right away (basically, two interfaces of a host acting as hypervisor were in the same bridge+VLAN), but it had also confused other parts of the network. Everything recovered except the two gateways, which use keepalived and had suddenly ended up in "split brain" mode, so simply rebooting these nodes (one after the other) solved the issue.
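For readers unfamiliar with the term: with keepalived (VRRP), normally only one node is MASTER and holds the virtual IP at any time; in a split brain, both nodes believe they are MASTER and both hold it. The ticket doesn't say how this was diagnosed, so the following is only a sketch of one possible check. It assumes you have already collected the configured addresses from each gateway; the node names and the documentation-range IPs are hypothetical.

```python
def vip_holders(vip, addresses_by_node):
    """Return the sorted names of nodes that currently have the virtual IP configured."""
    return sorted(
        node for node, addresses in addresses_by_node.items() if vip in addresses
    )


def is_split_brain(vip, addresses_by_node):
    """More than one node holding the same virtual IP indicates a split brain."""
    return len(vip_holders(vip, addresses_by_node)) > 1


# Hypothetical data: both gateways hold the VIP 192.0.2.1 at the same time.
observed = {
    "gw1": ["192.0.2.2", "192.0.2.1"],
    "gw2": ["192.0.2.3", "192.0.2.1"],
}
print(vip_holders("192.0.2.1", observed))
print(is_split_brain("192.0.2.1", observed))
```

In a healthy pair, only one of the two lists would contain the virtual IP and the check would return False.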
Metadata Update from @arrfab: - Issue close_status updated to: Fixed with Explanation - Issue status updated to: Closed (was: Open)