#11935 Update DRAC Firmware on All Dell Servers
Closed: Fixed 2 months ago by t0xic0der. Opened 8 months ago by jnsamyak.

Problem Statement:


The DRAC firmware on all Dell servers needs to be updated to the latest version. This task reboots only the management controller, not the server itself.

Steps to Resolve:

  1. Login to Management Interface:
    - Access the management web interface of each server.

  2. Download DRAC Firmware:
    - Retrieve the service tag for each server.
    - Visit https://dell.com/support/ and search for the DRAC firmware corresponding to each service tag.
    - Download the firmware file.

  3. Upload Firmware:
    - Upload the downloaded firmware file to each server’s management interface.

  4. Schedule Downtime in Nagios:
    - Schedule appropriate downtime to prevent alert notifications during the update.

  5. Update Firmware:
    - Initiate the firmware update, allowing the management controller to reboot and apply the update.
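
For step 4, one possible way to script the downtime is Nagios's external command interface. The sketch below only formats a SCHEDULE_HOST_DOWNTIME line; the host name, author, comment, and 30-minute window are illustrative assumptions, and how the line gets submitted (command file location, permissions) depends on the local Nagios setup and is not shown.

```python
import time


def schedule_host_downtime(host, duration_s, author, comment, now=None):
    """Format a Nagios SCHEDULE_HOST_DOWNTIME external command line.

    Field order after the command name:
    host;start;end;fixed;trigger_id;duration;author;comment
    """
    start = int(time.time()) if now is None else int(now)
    end = start + duration_s
    # fixed=1: downtime runs exactly from start to end; trigger_id=0: not triggered
    return (
        f"[{start}] SCHEDULE_HOST_DOWNTIME;"
        f"{host};{start};{end};1;0;{duration_s};{author};{comment}"
    )


# Hypothetical values, for illustration only.
line = schedule_host_downtime(
    "vmhost-x86-01.stg.iad2.fedoraproject.org",
    1800,
    "admin",
    "DRAC firmware update",
    now=1700000000,
)
print(line)
```

The `now` parameter exists only so the output is reproducible; in normal use it would be left out so the current time is used.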

Long-Term Improvement (to consider in the future):

  1. Centralize Firmware Files:
    - Store all required firmware files on the central server (batcave) under /srv/web/infra/fw/dell/.

  2. Automate with Ansible:
    - Explore using the Ansible collection for Dell OpenManage to automate the firmware updates and other DRAC configurations.


Metadata Update from @zlopez:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: high-gain, medium-trouble, ops

8 months ago

Here are the hostnames of all the Dell servers we currently have (hopefully I didn't forget any):

  • autosign02.iad2.fedoraproject.org
  • backup01.iad2.fedoraproject.org
  • bkernel01.iad2.fedoraproject.org
  • bkernel02.iad2.fedoraproject.org
  • buildhw-x86-01.iad2.fedoraproject.org
  • buildhw-x86-02.iad2.fedoraproject.org
  • buildhw-x86-03.iad2.fedoraproject.org
  • buildhw-x86-04.iad2.fedoraproject.org
  • buildhw-x86-05.iad2.fedoraproject.org
  • buildhw-x86-06.iad2.fedoraproject.org
  • buildhw-x86-07.iad2.fedoraproject.org
  • buildhw-x86-08.iad2.fedoraproject.org
  • buildhw-x86-09.iad2.fedoraproject.org
  • buildhw-x86-10.iad2.fedoraproject.org
  • buildhw-x86-11.iad2.fedoraproject.org
  • buildhw-x86-12.iad2.fedoraproject.org
  • buildhw-x86-13.iad2.fedoraproject.org
  • buildhw-x86-14.iad2.fedoraproject.org
  • buildhw-x86-15.iad2.fedoraproject.org
  • buildhw-x86-16.iad2.fedoraproject.org
  • bvmhost-x86-01.iad2.fedoraproject.org
  • bvmhost-x86-01.stg.iad2.fedoraproject.org
  • bvmhost-x86-02.iad2.fedoraproject.org
  • bvmhost-x86-02.stg.iad2.fedoraproject.org
  • bvmhost-x86-03.iad2.fedoraproject.org
  • bvmhost-x86-03.stg.iad2.fedoraproject.org
  • bvmhost-x86-04.iad2.fedoraproject.org
  • bvmhost-x86-05.iad2.fedoraproject.org
  • bvmhost-x86-05.stg.iad2.fedoraproject.org
  • bvmhost-x86-06.iad2.fedoraproject.org
  • bvmhost-x86-07.iad2.fedoraproject.org
  • bvmhost-x86-08.iad2.fedoraproject.org
  • ibiblio02.fedoraproject.org
  • ibiblio05.fedoraproject.org
  • kernel01.iad2.fedoraproject.org
  • kernel02.iad2.fedoraproject.org
  • openqa-x86-worker01.iad2.fedoraproject.org
  • openqa-x86-worker02.iad2.fedoraproject.org
  • openqa-x86-worker03.iad2.fedoraproject.org
  • openqa-x86-worker04.iad2.fedoraproject.org
  • openqa-x86-worker05.iad2.fedoraproject.org
  • openqa-x86-worker06.iad2.fedoraproject.org
  • osuosl02.fedoraproject.org
  • qvmhost-x86-01.iad2.fedoraproject.org
  • qvmhost-x86-02.iad2.fedoraproject.org
  • sign-vault01.iad2.fedoraproject.org
  • sign-vault02.iad2.fedoraproject.org
  • virthost-cc-rdu02.fedoraproject.org
  • vmhost-x86-01.iad2.fedoraproject.org
  • vmhost-x86-01.stg.iad2.fedoraproject.org
  • vmhost-x86-02.iad2.fedoraproject.org
  • vmhost-x86-02.stg.iad2.fedoraproject.org
  • vmhost-x86-03.iad2.fedoraproject.org
  • vmhost-x86-04.iad2.fedoraproject.org
  • vmhost-x86-05.iad2.fedoraproject.org
  • vmhost-x86-05.stg.iad2.fedoraproject.org
  • vmhost-x86-06.iad2.fedoraproject.org
  • vmhost-x86-06.stg.iad2.fedoraproject.org
  • vmhost-x86-07.iad2.fedoraproject.org
  • vmhost-x86-07.stg.iad2.fedoraproject.org
  • vmhost-x86-08.iad2.fedoraproject.org
  • vmhost-x86-08.stg.iad2.fedoraproject.org
  • vmhost-x86-09.stg.iad2.fedoraproject.org
  • vmhost-x86-11.stg.iad2.fedoraproject.org
  • vmhost-x86-12.stg.iad2.fedoraproject.org
  • vmhost-x86-cc01.rdu-cc.fedoraproject.org
  • vmhost-x86-cc01.rdu-cc.fedoraproject.org
  • vmhost-x86-cc03.rdu-cc.fedoraproject.org
  • vmhost-x86-cc05.rdu-cc.fedoraproject.org
  • vmhost-x86-cc06.rdu-cc.fedoraproject.org
  • worker02.ocp.iad2.fedoraproject.org
  • worker04.iad2.fedoraproject.org
  • worker04-stg.ocp.iad2.fedoraproject.org
  • worker04.ocp.iad2.fedoraproject.org
  • worker05.iad2.fedoraproject.org
  • worker05.ocp.iad2.fedoraproject.org
  • worker06.ocp.iad2.fedoraproject.org

There are also a few machines for COPR, and I'm not sure whether we should update them or just let them know about this. There are also a few machines that don't have a hostname, so I'm not sure if they are actually used anywhere.

P.S.: I didn't know that there is a web interface for managing those machines.

Additionally we have a firmware update for the BMC in the older emag arm devices that we should apply.

I am thinking of taking this up. I suspect these are all on the intranet and would require a VPN.

Metadata Update from @t0xic0der:
- Issue assigned to t0xic0der

4 months ago

So, we don't have a howto/SOP for this, but there are mentions in:

modules/sysadmin_guide/pages/failedharddrive.adoc
modules/howtos/pages/restart_datacenter_server.adoc
modules/sysadmin_guide/pages/hardware_troubleshooting_power.adoc

Basically, in IAD2 all our management interfaces are in a mgmt.iad2.fedoraproject.org network (10.3.160.0/24). They can be reached directly from the RH VPN, or via noc01/batcave01 if you want to tunnel via those.

In RDU they are in an internal mgmt network that's only reachable via noc-cc01.fedoraproject.org. You have to tunnel via this VM to get to them, and they are all on 172.21.2.0/24. There is no DNS zone for this, but we should probably standardize them and set one up someday.
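
As a quick sanity check when working from a list of management IPs, something like the following could tell you which of the two networks an address belongs to, and therefore how to reach it. This is a minimal sketch: the two CIDR ranges and the access notes come from the description above, while the site labels, function name, and sample addresses are made up for illustration.

```python
import ipaddress

# Management networks as described in this ticket.
MGMT_NETWORKS = {
    "iad2": ipaddress.ip_network("10.3.160.0/24"),
    "rdu": ipaddress.ip_network("172.21.2.0/24"),
}

# How each network is reached, per the description above.
ACCESS = {
    "iad2": "directly from the RH VPN, or tunnel via noc01/batcave01",
    "rdu": "tunnel via noc-cc01.fedoraproject.org",
}


def mgmt_site(ip):
    """Return the site label whose mgmt network contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for site, network in MGMT_NETWORKS.items():
        if addr in network:
            return site
    return None


# Sample addresses are placeholders, not real DRAC IPs.
for ip in ("10.3.160.12", "172.21.2.7", "192.0.2.1"):
    site = mgmt_site(ip)
    print(ip, "->", ACCESS.get(site, "not in a known mgmt network"))
```

An address that falls outside both ranges is worth double-checking against the docs before assuming the device is down.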

Once you can reach a device, you need an admin password to log in to it. We have separate ones for staging and production.

Once you log in, the process is basically:

  • Get service tag
  • Enter service tag on support.dell.com site
  • Go to downloads, find the latest DRAC firmware, download it.
  • Go to firmware updates page, upload
  • set downtime in nagios so there's no alert from the next step
  • drac will reboot while upgrading, wait for it to come back.
  • confirm it's the latest version
  • profit
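
The "confirm it's the latest version" step could also be checked without the web UI on DRACs that expose the Redfish API. The sketch below is hedged accordingly: it assumes the controller serves the standard Redfish Manager resource at the usual iDRAC path and that versions are dotted numeric strings; the sample JSON and version numbers are invented, and the HTTP request itself (credentials, TLS handling) is deliberately left out.

```python
import json

# Usual Redfish Manager resource path on iDRAC (assumed to be available).
MANAGER_PATH = "/redfish/v1/Managers/iDRAC.Embedded.1"


def manager_url(mgmt_host):
    """Build the URL to fetch with an authenticated HTTPS GET."""
    return f"https://{mgmt_host}{MANAGER_PATH}"


def firmware_version(manager_json):
    """Extract FirmwareVersion from a Redfish Manager JSON document."""
    return json.loads(manager_json)["FirmwareVersion"]


def version_tuple(version):
    """Turn a dotted numeric version such as '6.10.30.00' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_current(installed, latest):
    """True if the installed version is at least the latest known version."""
    return version_tuple(installed) >= version_tuple(latest)


# Invented sample response and version numbers, for illustration only.
sample = '{"Id": "iDRAC.Embedded.1", "FirmwareVersion": "6.10.30.00"}'
installed = firmware_version(sample)
print(manager_url("openqa-x86-worker01.mgmt.iad2.fedoraproject.org"))
print(installed, is_current(installed, "7.00.00.00"))
```

The latest version still has to be looked up per service tag on the Dell support site; this only compares two strings once you have them.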

I would suggest starting with staging hosts to get the hang of things first.

Happy to walk through one as an example.

I tried to update the rest of the servers, but encountered some issues. Here is the list:

  • buildhw-x86-01-16 (those are really old and with @kevin we decided to skip those and wait for replacement)
  • ibiblio02.fedoraproject.org - Unreachable
  • ibiblio05.fedoraproject.org - Unreachable
  • openqa-x86-worker02.mgmt.iad2.fedoraproject.org - RED008 - Unable to extract payloads from Update Package. I will try this one again later
  • osuosl02.fedoraproject.org - Unreachable
  • virthost-cc-rdu02.fedoraproject.org - Unreachable
  • vmhost-x86-07.mgmt.iad2.fedoraproject.org - Unreachable
  • vmhost-x86-cc01.rdu-cc.fedoraproject.org - Unreachable
  • vmhost-x86-cc02.rdu-cc.fedoraproject.org - Unreachable
  • vmhost-x86-cc03.rdu-cc.fedoraproject.org - Unreachable
  • vmhost-x86-cc05.rdu-cc.fedoraproject.org - Unreachable
  • vmhost-x86-cc06.rdu-cc.fedoraproject.org - Unreachable
  • worker02.ocp.mgmt.iad2.fedoraproject.org - Unreachable
  • worker04-stg.ocp.mgmt.iad2.fedoraproject.org - Unreachable
  • worker04.ocp.mgmt.iad2.fedoraproject.org - Unreachable
  • worker05.ocp.mgmt.iad2.fedoraproject.org - Unreachable
  • worker06.ocp.mgmt.iad2.fedoraproject.org - Unreachable

Other machines on the list are up to date :-)

So, many of these are not in iad2, so accessing them is different. Some are not correctly named on the list. Some are really not reachable.

ibiblio02.fedoraproject.org - This is ibiblio02-mgmt.fedoraproject.org in DNS. You cannot reach it normally; you have to tunnel https through ibiblio05 (below).

ibiblio05.fedoraproject.org - This one really is unreachable. I have been working with folks there to try and fix it without much luck.

openqa-x86-worker02.mgmt.iad2.fedoraproject.org - This one is so old it's using the old SHA1 firmware and cannot read the new SHA256 ones. Probably there is one in the middle somewhere you can upgrade to that will let you then upgrade to the latest. However, I don't think it's worth it.

osuosl02.fedoraproject.org - This one requires access to the mgmt network there at osuosl, which requires an OpenVPN connection. I could try and add you, or I can just do the update there?

virthost-cc-rdu02.fedoraproject.org - This is the old name of vmhost-x86-cc02... it no longer exists under this name.

vmhost-x86-07.mgmt.iad2.fedoraproject.org - This host was retired/no longer exists.

vmhost-x86-cc01.rdu-cc.fedoraproject.org - Unreachable
vmhost-x86-cc02.rdu-cc.fedoraproject.org - Unreachable
vmhost-x86-cc03.rdu-cc.fedoraproject.org - Unreachable
vmhost-x86-cc05.rdu-cc.fedoraproject.org - Unreachable
vmhost-x86-cc06.rdu-cc.fedoraproject.org - Unreachable

All of these can be reached by tunneling via noc-cc01. There is a docs/rdu-networks.txt file that lists the 172.21.x.x IPs for each.
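
One way to do that tunneling is an SSH local port forward through noc-cc01. The sketch below only builds the command line; the management IP is a placeholder (the real ones are in docs/rdu-networks.txt), and the local port 8443 and remote port 443 are assumptions that may need adjusting for a given DRAC.

```python
import shlex


def tunnel_command(mgmt_ip, jump_host="noc-cc01.fedoraproject.org",
                   local_port=8443, remote_port=443):
    """Build an ssh command that forwards a local port to a DRAC via a jump host."""
    # -N: do not run a remote command, only forward ports
    # -L local_port:target:remote_port: listen locally, forward to the target
    return ["ssh", "-N", "-L", f"{local_port}:{mgmt_ip}:{remote_port}", jump_host]


# 172.21.2.10 is a placeholder address, not a real DRAC.
cmd = tunnel_command("172.21.2.10")
print(shlex.join(cmd))
```

With the tunnel up, the management interface would be reachable in a browser at https://localhost:8443/ (expect a certificate warning, since the name will not match).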

The following don't have '.ocp' in their management names. They should just be 'worker02.mgmt.iad2.fedoraproject.org' and 'worker04-stg.mgmt.iad2.fedoraproject.org', etc.:

worker02.ocp.mgmt.iad2.fedoraproject.org - Unreachable 
worker04-stg.ocp.mgmt.iad2.fedoraproject.org - Unreachable
worker04.ocp.mgmt.iad2.fedoraproject.org - Unreachable
worker05.ocp.mgmt.iad2.fedoraproject.org - Unreachable
worker06.ocp.mgmt.iad2.fedoraproject.org - Unreachable
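
To avoid this kind of naming mismatch when working from the host list, the management name could be derived mechanically. This is a sketch of the pattern visible in this ticket only (drop the 'ocp' label, put 'mgmt' in front of the site label); the function name and the set of site labels are assumptions, and names with a '.stg' label or hosts outside iad2/rdu-cc are not covered by anything stated here, so check those against DNS.

```python
# Site labels seen in this ticket; an assumption, not a complete list.
SITE_LABELS = {"iad2", "rdu-cc"}


def mgmt_hostname(hostname):
    """Derive the management interface name from a host name."""
    # The management names do not carry the 'ocp' label.
    labels = [label for label in hostname.split(".") if label != "ocp"]
    if "mgmt" in labels:
        return ".".join(labels)
    for index, label in enumerate(labels):
        if label in SITE_LABELS:
            # Insert 'mgmt' directly in front of the site label.
            labels.insert(index, "mgmt")
            return ".".join(labels)
    raise ValueError(f"no known site label in {hostname!r}")


for name in (
    "worker02.ocp.iad2.fedoraproject.org",
    "worker04-stg.ocp.mgmt.iad2.fedoraproject.org",
    "vmhost-x86-cc05.rdu-cc.fedoraproject.org",
):
    print(name, "->", mgmt_hostname(name))
```

Hosts such as ibiblio02 follow a different scheme (ibiblio02-mgmt.fedoraproject.org) and raise ValueError here rather than producing a guessed name.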

I will leave the osuosl02 to you.

I will try to update the workers and rdu-cc machines. Will see how that will go.

I finished another round of updates. I was able to update all the worker machines and a few of the rdu-cc machines. I still have issues with two of them:

  • vmhost-x86-cc05.mgmt.rdu-cc.fedoraproject.org - I wasn't able to even connect to this one. Nothing is listening on port 80 at the IP listed in docs/rdu-networks.txt. My assumption is that the IP is not correct, as it seems off from the other machines.
  • vmhost-x86-cc06.mgmt.rdu-cc.fedoraproject.org - I wasn't able to log in to this one; it didn't accept the password for either production or staging.

osuosl02 is up to date (it's a new machine).

The vmhost-x86-cc05/06 machines are going to be replaced next year, so I'm not sure how much we should worry about them.

I fixed the password on 06, but it ran into the 'so old it's using sha1' problem.

So, I think we can close this now?

Sounds good.

We can always reopen this one now that we have an SOP and a spreadsheet associated with this.

Metadata Update from @t0xic0der:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

2 months ago

