Issue #1373: major hardware issue impacting crucial services - centos-infra - Pagure.io


#1373 major hardware issue impacting crucial services

Closed: Fixed 3 months ago by arrfab. Opened 3 months ago by arrfab.

This Sunday, we got Zabbix/monitoring alerts about some services being down.
All affected services/virtual machines were running on the same hypervisor.
We'll need to contact the hardware vendor: although all storage is configured as a RAID 6 array on a hardware RAID controller with read/write cache, the firmware/BIOS message on reboot confirms some corruption and data loss.

Impacted services:

https://cbs.centos.org (kojihub)
https://git.centos.org (pagure)

arrfab commented 3 months ago

Mail thread: https://lists.centos.org/pipermail/centos-devel/2024-March/165559.html

arrfab commented 3 months ago

Everything should be restored now, but I'm keeping this ticket open in case someone finds a related issue now that the services are back.

arrfab commented 3 months ago

INC2889880 (for hardware issue follow-up)

arrfab commented 3 months ago

Status update: still waiting for the hardware vendor to fix the issue.

arrfab commented 3 months ago

The hardware controller has been replaced; the server is now being reprovisioned and added back in Ansible.

Metadata Update from @arrfab:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)



Metadata


Priority

🔥 Urgent 🔥
