#259 koji builder SOP updates
Merged 7 months ago by humaton. Opened 7 months ago by kevin.
kevin/infra-docs-fpo builder-reinstalls  into  master

@@ -6,22 +6,31 @@ 

  ** <<_staging_environment>>

  ** <<_production_environment>>

  

- == Upgrades Builders

+ == Upgrading Builders

  

  It is recommended to upgrade or reinstall all builders after each release. Usually the builders are upgraded and their virtual machines reinstalled. They should be upgraded in the staging environment first, to make sure everything is working properly before starting on production.

  

+ Before starting, confirm whether there is a koji package update. If there is, you may need to

+ upgrade the hub and database first with the `playbooks/manual/upgrade/koji.yml` playbook.

+ 

  === Staging Environment

  

  Follow this process to upgrade builders in the staging environment:

  

+ . Check with others that no testing is ongoing in staging before starting.

  . Update the virt-install path on the builders in ansible repository, like in this https://pagure.io/fedora-infra/ansible/c/af94db8ab88588aa8dc2c20893b2487afb7f7777?branch=main[example]

- . Each builder virtual machine is removed, either via `virt-inst-dest` or manually, and the `buildvm.yml` playbook is run to deploy them. 

- . The hub is re-installed or upgraded with `dnf --refresh --releasever <release> distro-sync`.

+ . Each builder virtual machine is removed, either via `virt-inst-dest` or manually.

+ . Upgrade and reboot the bvmhost that contains those builders.

+ . Then the `buildvm.yml` playbook is run to reinstall them.

+ . The hub is re-installed or upgraded with `dnf --refresh --releasever <release> distro-sync` (see the note about the koji/database upgrade above).

  . Test builds are done to make sure everything is working.

  

  === Production Environment

  

  For production, a similar process is followed, but to avoid outages, builders are disabled, reinstalled and re-added in small groups at a time. _buildvm-s390x-01_ through _buildvm-s390x-14_ are **zvm** instances; they have to be updated with `dnf --releasever NAME distro-sync`.

+ Likewise, all the _buildhw_ instances need to be upgraded as above or reinstalled with PXE

+ (which is out of the scope of this document).

+ 

  In production, builders are usually done one bvmhost at a time:

  

  . To do this process, the user should be a koji admin
@@ -37,6 +46,18 @@ 

  ....

  . In a loop, check for running builds on them with `koji list-tasks --host <vmname>`

  . Either wait for all of them to finish, or optionally run `koji free-task <task_id>` to free a task so that another builder picks it up. This second option restarts the build, so for a long-running build it could disturb maintainers.

- . Once they are all empty, destroy them as above and reinstall them with the `buildvm` playbook. Optionally, this is a good time to update the bvmhost and reboot it after destroying, but before installing builders

+ . Once they are all empty, destroy them as above.

+ . Upgrade and reboot the bvmhost they are on.

+ . Reinstall them with the `buildvm` playbook.

  . Once installed, check that each builder is checking in with `koji list-hosts | grep <vmname>`

  . In a loop, re-enable them with `koji enable-host <vmname>`

+ . Move on to the next group.

+ 

+ For buildhw builders:

+ 

+ . Disable the builder with `koji disable-host <buildhwname>`

+ . In a loop, check for running builds on it with `koji list-tasks --host <buildhwname>`

+ . Either wait for all of them to finish, or optionally run `koji free-task <task_id>` to free a task so that another builder picks it up. This second option restarts the build, so for a long-running build it could disturb maintainers.

+ . Update with `dnf --releasever NAME distro-sync`

+ . Reboot.

+ . Confirm the buildhw is back up and checking in to koji with `koji list-hosts | grep <buildhwname>`

+ . Re-enable it with `koji enable-host <buildhwname>`
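The disable, drain, upgrade and re-enable steps above could be scripted roughly as follows. This is only a sketch and is not part of the SOP. It assumes:

* a working koji admin session;
* root ssh access to the buildhw hosts;
* `koji list-tasks --host <name>` prints `(no tasks)` when a host is idle.

The function names and the `POLL_SECONDS` interval are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the disable / drain / upgrade / re-enable steps for koji builders.
# Assumptions (not part of the SOP): a working koji admin session, root ssh
# access to buildhw hosts, and that `koji list-tasks --host NAME` prints
# "(no tasks)" when NAME has nothing running.

POLL_SECONDS="${POLL_SECONDS:-60}"   # hypothetical poll interval in seconds

# Succeeds when koji reports no running tasks for the given host.
host_is_idle() {
    koji list-tasks --host "$1" | grep -qF '(no tasks)'
}

# Disable every host given as an argument, then block until all are idle.
drain_hosts() {
    local h
    for h in "$@"; do
        koji disable-host "$h" || return 1
    done
    for h in "$@"; do
        until host_is_idle "$h"; do
            echo "waiting for running tasks on $h" >&2
            sleep "$POLL_SECONDS"
        done
    done
}

# Upgrade a buildhw host in place with distro-sync, then reboot it.
upgrade_buildhw() {
    local host="$1" release="$2"
    ssh "root@$host" dnf -y --refresh --releasever "$release" distro-sync || return 1
    # The ssh session usually drops while the host reboots, so ignore its status.
    ssh "root@$host" systemctl reboot || true
}

# Re-enable hosts; run this only after `koji list-hosts` shows them checking in.
enable_hosts() {
    local h
    for h in "$@"; do
        koji enable-host "$h" || return 1
    done
}
```

For a bvmhost group, usage could look like this:

1. Run `drain_hosts buildvm-a buildvm-b`.
2. Destroy the VMs, upgrade and reboot the bvmhost, and reinstall with the `buildvm` playbook.
3. Once the VMs show up in `koji list-hosts`, run `enable_hosts buildvm-a buildvm-b`.

The host names are placeholders. The sketch waits rather than calling `koji free-task`, so long-running builds are not restarted.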

Added some updates here.

There's also one bug in the buildvm playbook that I haven't been able to
fix: if you do an install, everything works until it gets to the NFS
mounts, where it fails. If you do a 'nmcli c up eth0' in the VM and re-run
the playbook, it works. So something isn't updating systemd-resolved, or
something similar. Just something to be aware of when doing this.
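The workaround above could be wrapped in a small helper. This is a sketch only. It assumes root ssh access to the freshly installed VM. `PLAYBOOK` and `ANSIBLE_PLAYBOOK` are hypothetical placeholders for wherever `buildvm.yml` lives in your checkout and for however you normally run playbooks.

```shell
#!/usr/bin/env bash
# Sketch of the NFS-mount workaround: bring eth0 up inside the new VM, then
# re-run the playbook for that VM only.
# Assumptions: root ssh access to the VM; PLAYBOOK and ANSIBLE_PLAYBOOK are
# hypothetical placeholders to adjust to your own setup.

PLAYBOOK="${PLAYBOOK:-buildvm.yml}"
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

retry_buildvm_install() {
    local vm="$1"
    # Re-activate the eth0 connection profile with NetworkManager.
    ssh "root@$vm" nmcli c up eth0 || return 1
    # Re-run the playbook, limited to just this VM.
    "$ANSIBLE_PLAYBOOK" "$PLAYBOOK" -l "$vm"
}
```

Keeping the playbook runner in a variable lets you substitute whatever wrapper you normally use without editing the function.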

Signed-off-by: Kevin Fenzi <kevin@scrye.com>

rebased onto 6d7c76c

7 months ago

Pull-Request has been merged by humaton

7 months ago