| |
@@ -0,0 +1,88 @@
|
| |
+
|
| |
+ = Replacing a Failed Hard Drive
|
| |
+ :page-description: Steps for replacing a failed drive on a Fedora infrastructure server.
|
| |
+ :page-aliases: replacing-failed-drive.adoc
|
| |
+
|
| |
+ == Overview
|
| |
+
|
| |
+ This document provides a step-by-step procedure for replacing a failed hard drive on a Fedora infrastructure server. It includes access requirements, necessary tools, and the process for initiating and completing the drive replacement.
|
| |
+
|
| |
+ == Contact Information
|
| |
+
|
| |
+ Owner::
|
| |
+ Fedora Infrastructure Team
|
| |
+ Contact::
|
| |
+ #fedora-admin, sysadmin-main
|
| |
+ Purpose::
|
| |
+ Describe how to get a failed hard drive replaced on a Fedora infrastructure server
|
| |
+
|
| |
+ == Access Level
|
| |
+
|
| |
+ This procedure may require sysadmin-main access. In the future, access details might be shared with a dedicated assignee or stored in a smaller vault. For now, ask the sysadmin-main team for the information you need.
|
| |
+
|
| |
+ == Requirements
|
| |
+
|
| |
+ * Red Hat VPN Access - Needed for SSH access to the machine.
|
| |
+ * Bitwarden Vault Access - Access to the vault is under discussion. For now, consult the sysadmin-main team for the login credentials.
|
| |
+
|
| |
+ == Process
|
| |
+
|
| |
+ .Access the management console
|
| |
+ . Ensure you are connected to the official Red Hat VPN.
|
| |
+ . Identify the server in question. For this SOP, we will use `bvmhost-x86-01.stg.iad2.fedoraproject.org` as an example.
|
| |
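+ . Derive the management console hostname: `.mgmt` follows the short hostname, and the `.stg` environment label is folded in as `-stg`. For the example server this gives `bvmhost-x86-01-stg.mgmt.iad2.fedoraproject.org`.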
|
| |
+ . Obtain the IP address by pinging the server from `batcave01`:
|
| |
+ +
|
| |
+ [source,bash]
|
| |
+ ----
|
| |
+ ssh batcave01.iad2.fedoraproject.org
|
| |
+ ping bvmhost-x86-01-stg.mgmt.iad2.fedoraproject.org
|
| |
+ ----
|
| |
+
|
| |
+ . Visit the IP address in a web browser. The management console uses HTTPS, so accept the self-signed certificate:
|
| |
+ +
|
| |
+ [source]
|
| |
+ ----
|
| |
+ https://<IP_ADDRESS>
|
| |
+ ----
|
| |
+
|
| |
+ . Log in using the credentials found in the `admin-stg` entry in Bitwarden.
|
| |
+
|
| |
+ . Navigate to the overview page and note the serial number (service tag) of the machine.
|
| |
+
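The hostname and lookup steps above can be sketched as a few shell helpers. This is a minimal sketch, not part of the official tooling: the function names `mgmt_hostname`, `resolve_ip`, and `console_status` are hypothetical, and the naming rule is inferred only from the single staging example in this SOP, so other hosts may follow a different pattern.

```bash
#!/usr/bin/env bash
# Sketch only: the naming rule below is an assumption inferred from the
# one example in this SOP (bvmhost-x86-01.stg.iad2.fedoraproject.org).

# Derive the management hostname from a staging server hostname.
mgmt_hostname() {
  local host="$1"
  case "$host" in
    *.stg.*)
      local short="${host%%.stg.*}"   # e.g. bvmhost-x86-01
      local domain="${host#*.stg.}"   # e.g. iad2.fedoraproject.org
      printf '%s-stg.mgmt.%s\n' "$short" "$domain"
      ;;
    *)
      echo "unrecognized hostname pattern: $host" >&2
      return 1
      ;;
  esac
}

# Resolve a hostname to its first listed IP address without pinging it.
# Intended to be run on batcave01, as in the ping step above.
resolve_ip() {
  getent hosts "$1" | awk '{ print $1; exit }'
}

# Print the HTTP status code returned by the console over HTTPS.
# -k skips certificate verification because the certificate is self-signed.
console_status() {
  curl -k -s -o /dev/null -w '%{http_code}\n' --max-time 10 "https://$1"
}

# Example (on batcave01):
#   ip="$(resolve_ip "$(mgmt_hostname bvmhost-x86-01.stg.iad2.fedoraproject.org)")"
#   console_status "$ip"
```

The helpers only define functions and run nothing on their own. `mgmt_hostname` refuses any hostname that does not contain `.stg.` rather than guessing at a pattern this SOP does not document.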
|
| |
+ === Identify the Failed Drive
|
| |
+
|
| |
+ . Navigate to the storage menu to identify the failed drive. Warnings about failing/failed drives will be indicated here.
|
| |
+ . Note the failed drive's details (e.g., drive 4).
|
| |
+ . Create a failed-drive report by exporting the failed drive's information.
|
| |
+
|
| |
+ === Create a Support Ticket
|
| |
+
|
| |
+ . In the management console, click on the support link in the top right corner.
|
| |
+ . Follow these steps to contact technical support:
|
| |
+ .. Go to the top left search bar and select "Support > Contact Technical Support".
|
| |
+ .. Search for the device using the service tag from the overview page.
|
| |
+ .. Select "HardDrive and RAID Controller" from the drop-down menu.
|
| |
+ .. Choose one of the support options:
|
| |
+ ... Call: 24/7
|
| |
+ ... Live Chat: 7 am - 9 pm CDT, Monday - Friday
|
| |
+ ... Social Connect
|
| |
+
|
| |
+ . In the live chat, provide the failed drive report. Once support verifies and confirms the failure, they will send an email with the replacement details.
|
| |
+ . If live chat is unsuccessful, call support at 1-866-362-5350 (available 24/7).
|
| |
+
|
| |
+ === Follow-Up with the Support Ticket
|
| |
+
|
| |
+ . Once the support ticket is created, the assignee will receive a form via email.
|
| |
+ . Forward this form to Patrick Cole (pcole@redhat.com) along with the machine's serial number and location.
|
| |
+ +
|
| |
+ [NOTE]
|
| |
+ ====
|
| |
+ At this point, Patrick Cole will handle the coordination with Dell for the drive replacement. This avoids adding unnecessary intermediaries.
|
| |
+ ====
|
| |
+
|
| |
+ This coordination includes arranging access for the technician if needed.
|
| |
+
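To keep the forwarded email complete, the details mentioned above can be collected into one short text block. The following is a minimal sketch with a hypothetical helper name, `replacement_summary`. Every value is passed in by hand, nothing is looked up automatically, and the sample values in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Sketch only: formats the details that accompany the forwarded form.
# Arguments: server hostname, service tag, physical location, failed drive.
replacement_summary() {
  local host="$1" service_tag="$2" location="$3" drive="$4"
  cat <<EOF
Server:       ${host}
Service tag:  ${service_tag}
Location:     ${location}
Failed drive: ${drive}
EOF
}

# Example:
#   replacement_summary bvmhost-x86-01.stg.iad2.fedoraproject.org \
#     "<SERVICE_TAG>" "<LOCATION>" "drive 4"
```

Paste the output into the email next to the form so the serial number and location are not left out.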
|
| |
+ == Conclusion
|
| |
+
|
| |
+ Following this SOP ensures a systematic approach to replacing failed drives, minimizing downtime and maintaining system integrity. Always reach out to the sysadmin-main team for any clarifications or additional support.
|
| |
\ No newline at end of file
|
| |
Infra Ticket: https://pagure.io/fedora-infrastructure/issue/11947
Signed-off-by: Samyak Jain samyak.jn11@gmail.com