#11934 Replace Failed Drive on `bvmhost-x86-01.stg.iad2.fedoraproject.org`
Closed: Fixed 22 days ago by dkirwan. Opened a month ago by jnsamyak.

Problem Statement:


The server bvmhost-x86-01.stg.iad2.fedoraproject.org has a failed drive that needs to be replaced. The process involves contacting Dell support, providing a hardware report, and coordinating with IT for drive replacement.

Steps to Resolve:

  1. Contact Dell Support:
    - Call Dell support to report the failed drive.
    - Provide the necessary hardware report exported from the server.

  2. Generate the Hardware Report:
    - Access the server’s management interface at bvmhost-x86-01-stg.mgmt.iad2.fedoraproject.org (IP: 10.3.160.191).
    - Log in using the <CHECK_EMAIL> password.
    - Retrieve the serial number and confirm the drive failure.
    - Generate and send the hardware report to Dell.

  3. Notify IT:
    - Inform the IT team that a new drive will be shipped.
    - Ensure IT is prepared to accept the delivery and replace the drive.
    - Notify pcole@redhat.com about the incoming drive for the IAD2 datacenter.

Additional Notes:

  • The old Dell support report can be ignored.
  • The machine should be under warranty.

Metadata Update from @phsmoura:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

a month ago

Update: This is in progress. We reached Dell and opened a support ticket; they confirmed that the drive has indeed failed and sent us a form for the replacement. Shipping is handled through the internal Red Hat team, and the form has been filled in and sent back to Dell.

An engineer is scheduled to visit the datacenter today, Wed 29th May, between 8am and 6pm.

Will update once this work is completed.

Metadata Update from @dkirwan:
- Issue assigned to jnsamyak

22 days ago

HD has been replaced successfully.

Metadata Update from @dkirwan:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

22 days ago

The final step is re-adding the drive to the RAID arrays:

First, go to the host and look at dmesg to find the new drive:
[Wed May 29 17:57:42 2024] sd 0:2:4:0: [sdj] 1170997248 512-byte logical blocks: (600 GB/558 GiB)
So the new drive is sdj.
Next, look at the partition layout of another drive, say sdi: 'fdisk -l /dev/sdi'
Now copy that exact partition setup to sdj.
(There's a parted way to just copy it, but I never remember it.)
Now just re-add it to all the RAID arrays:
mdadm /dev/md0 --add /dev/sdj1
(and likewise md1 and md2 with sdj2 and sdj3)
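
For future replacements, the steps above can be sketched as a dry run that only prints the commands. This is a rough sketch, not what was actually run on this host: the `sfdisk -d` dump-and-restore is one common way to copy a partition table (it is not the parted method mentioned above), the function name `print_readd_plan` is made up for illustration, and the md0/md1/md2 to partition 1/2/3 mapping is assumed from the comment above. Check the layout with `fdisk -l` and `cat /proc/mdstat` before running anything for real.

```shell
#!/bin/sh
# Dry run: print the commands for re-adding a replaced disk; nothing is executed.
# Assumptions (not confirmed by the ticket): sfdisk is used for the
# partition copy, and md0/md1/md2 are backed by partitions 1/2/3.
print_readd_plan() {
    src="$1"   # healthy RAID member to copy the layout from, e.g. /dev/sdi
    new="$2"   # freshly installed replacement disk, e.g. /dev/sdj

    # Dump the source partition table and write it to the new disk.
    echo "sfdisk -d $src | sfdisk $new"

    # Add each new partition to its array: md0 <- 1, md1 <- 2, md2 <- 3.
    n=0
    while [ "$n" -le 2 ]; do
        echo "mdadm /dev/md$n --add $new$((n + 1))"
        n=$((n + 1))
    done

    # Rebuild progress can then be followed here.
    echo "cat /proc/mdstat"
}

print_readd_plan /dev/sdi /dev/sdj
```

Printing instead of executing keeps the source and target disks visible for review, since swapping them in the `sfdisk` step would overwrite the healthy drive's partition table.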
And it's rebuilding away: [>....................] recovery = 0.4% (2812416/583852032) finish=72.3min speed=133924K/sec
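
As a sanity check on that progress line, the finish estimate can be reproduced from the other figures: remaining blocks divided by speed. A small sketch, assuming the block counts and the speed are in the same K units; the helper name `rebuild_eta_minutes` is made up for illustration.

```shell
#!/bin/sh
# Estimate remaining rebuild time in minutes from /proc/mdstat-style figures.
# Assumes the block counts and the speed share the same unit (K and K/sec).
rebuild_eta_minutes() {
    synced="$1"   # blocks already recovered
    total="$2"    # total blocks to recover
    speed="$3"    # current speed in K/sec
    awk -v s="$synced" -v t="$total" -v v="$speed" \
        'BEGIN { printf "%.1f\n", (t - s) / v / 60 }'
}

rebuild_eta_minutes 2812416 583852032 133924   # prints 72.3
```

That matches the reported finish=72.3min. The estimate moves as the speed changes, so it is only a snapshot.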

