#8 SOP for installation of OCP4
Merged 8 months ago by dkirwan. Opened 8 months ago by dkirwan.
dkirwan/infra-docs-fpo master  into  master

@@ -0,0 +1,40 @@ 

+ == Configure Baremetal PXE-UEFI Boot

+ A high level overview of how a baremetal node in the Fedora Infra gets booted via UEFI is as follows.

+ 

+ - Server powered on

+ - Gets an IP address via DHCP

+ - The DHCP server uses the `next-server` option to direct the server to the tftpboot server, from which it retrieves `grub.cfg`

+ - tftpboot serves `grub.cfg`

+ - Sysadmin manually chooses the correct UEFI menu to boot

+ - tftpboot serves kernel and initramfs to the server

+ - Server boots with kernel and initramfs, and retrieves the ignition file from `os-control01`

+ 

+ === Resources

+ 

+ - [1] https://pagure.io/fedora-infra/ansible/blob/main/f/roles/dhcp_server[Ansible Role DHCP Server]

+ - [2] https://pagure.io/fedora-infra/ansible/blob/main/f/roles/tftp_server[Ansible Role tftpboot server]

+ 

+ === UEFI Configuration

+ The configuration for UEFI booting lives in `grub.cfg`, which is not currently under source control. It is located on `batcave01` at `/srv/web/infra/bigfiles/tftpboot2/uefi/grub.cfg`.

+ 

+ The following is a sample configuration to install a baremetal OCP4 worker in the Staging cluster.

+ 

+ ----

+ menuentry 'RHCOS 4.8 worker staging' {

+   linuxefi images/RHCOS/4.8/x86_64/rhcos-4.8.2-x86_64-live-kernel-x86_64 ip=dhcp nameserver=10.3.163.33 coreos.inst.install_dev=/dev/sda coreos.live.rootfs_url=http://10.3.166.50/rhcos/rhcos-4.8.2-x86_64-live-rootfs.x86_64.img coreos.inst.ignition_url=http://10.3.166.50/rhcos/worker.ign

+   initrdefi images/RHCOS/4.8/x86_64/rhcos-4.8.2-x86_64-live-initramfs.x86_64.img

+ }

+ ----

+ 

+ Any new changes must be made here. Writing to this file requires membership of the `sysadmin-main` group, so it is usually best to open a ticket in the Fedora Infra issue tracker with a patch request instead. See the following https://pagure.io/fedora-infrastructure/issue/10213[ticket] for inspiration.
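+ When preparing a patch request, the menu entry format shown above can be parameterised. The following is a minimal sketch, assuming a hypothetical helper named `render_menuentry` that is not part of the Fedora Infra tooling. The nameserver and install device are copied from the sample entry and may differ per environment.

```shell
# Hypothetical helper: print a grub.cfg menuentry for an RHCOS live install.
# Arguments: RHCOS version, role (selects <role>.ign), environment label,
# and the host that serves the rootfs image and ignition files.
render_menuentry() (
  version="$1"            # e.g. 4.8.2
  role="$2"               # e.g. worker
  envname="$3"            # e.g. staging
  host="$4"               # e.g. 10.3.166.50
  minor="${version%.*}"   # 4.8.2 -> 4.8
  # nameserver and install_dev are copied from the sample entry (assumption).
  cat <<EOF
menuentry 'RHCOS ${minor} ${role} ${envname}' {
  linuxefi images/RHCOS/${minor}/x86_64/rhcos-${version}-x86_64-live-kernel-x86_64 ip=dhcp nameserver=10.3.163.33 coreos.inst.install_dev=/dev/sda coreos.live.rootfs_url=http://${host}/rhcos/rhcos-${version}-x86_64-live-rootfs.x86_64.img coreos.inst.ignition_url=http://${host}/rhcos/${role}.ign
  initrdefi images/RHCOS/${minor}/x86_64/rhcos-${version}-x86_64-live-initramfs.x86_64.img
}
EOF
)
```

+ For example, `render_menuentry 4.8.2 worker staging 10.3.166.50` prints an entry equivalent to the staging sample.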

+ 

+ === Pushing new changes out to the tftpboot server

+ To push out changes made to `grub.cfg`, run the following playbook, which requires `sysadmin-noc` group permissions:

+ 

+ ----

+ sudo rbac-playbook groups/noc.yml -t 'tftp_server,dhcp_server'

+ ----

+ 

+ On the `noc01` instance, the `grub.cfg` file is located at `/var/lib/tftpboot/uefi/grub.cfg`.

+ 

+ If other changes are required, for example to OS images, make them directly on the `noc01` instance at `/var/lib/tftpboot/images/`. This requires membership of the `sysadmin-noc` group.

@@ -0,0 +1,59 @@ 

+ == SOP Configure the Image Registry Operator

+ 

+ === Resources

+ - [1] https://docs.openshift.com/container-platform/4.8/registry/configuring_registry_storage/configuring-registry-storage-baremetal.html#configuring-registry-storage-baremetal[Configuring Registry Storage Baremetal]

+ 

+ 

+ === Enable the image registry operator

+ For detailed instructions please refer to the official documentation for the particular version of Openshift [1].

+ 

+ From the `os-control01` node we can enable the Image Registry Operator by setting it to the `Managed` state via the CLI:

+ 

+ ----

+ oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed"}}'

+ ----
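+ To confirm that the patch took effect, the operator's state can be read back. This is a minimal sketch: the function name `registry_management_state` is hypothetical, while the `oc get ... -o jsonpath` call is ordinary client usage.

```shell
# Hypothetical helper: print the Image Registry Operator's managementState.
# After the patch above it is expected to print "Managed".
registry_management_state() {
  oc get configs.imageregistry.operator.openshift.io cluster \
    -o jsonpath='{.spec.managementState}'
}
```

+ Run `registry_management_state` from `os-control01` with a logged-in `oc` session.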

+ 

+ Next edit the configuration for the Image Registry operator like so:

+ 

+ ----

+ oc edit configs.imageregistry.operator.openshift.io

+ ----

+ 

+ Replace the `storage: {}` entry with the following:

+ 

+ ----

+ ...

+ storage:

+   pvc:

+     claim:

+ ...

+ ----

+ 

+ Save the config.

+ 

+ The Image Registry will automatically claim a 100G PV if one is available. It is best to open a ticket with Fedora Infra and have a 100G NFS share created.

+ 

+ Use the following template for inspiration, populating the values to match the newly created NFS share.

+ 

+ ----

+ kind: PersistentVolume

+ apiVersion: v1

+ metadata:

+   name: ocp-image-registry-volume

+ spec:

+   capacity:

+     storage: 100Gi

+   nfs:

+     server: 10.3.162.11

+     path: /ocp_prod_registry

+   accessModes:

+     - ReadWriteMany

+   persistentVolumeReclaimPolicy: Retain

+   volumeMode: Filesystem

+ ----

+ 

+ To create this new PV, create a persistent volume template file like the one above and apply it using the Openshift client tool like so:

+ 

+ ----

+ oc apply -f image-registry-pv.yaml

+ ----
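+ Once applied, the PV's phase shows whether the registry has claimed it. The following is a small sketch, assuming a hypothetical helper named `pv_phase`. A new PV normally reports `Available`, and `Bound` once a claim has attached to it.

```shell
# Hypothetical helper: print the phase of a PersistentVolume.
# $1 = PV name, e.g. ocp-image-registry-volume from the template above.
pv_phase() {
  oc get pv "$1" -o jsonpath='{.status.phase}'
}
```

+ For example: `pv_phase ocp-image-registry-volume`.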

@@ -0,0 +1,27 @@ 

+ == Configure the Local Storage Operator

+ 

+ === Resources

+ - [1] https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.7/html/deploying_openshift_container_storage_using_bare_metal_infrastructure/deploy-using-local-storage-devices-bm

+ - [2] https://github.com/centosci/ocp4-docs/blob/master/sops/localstorage/installation.md

+ 

+ 

+ === Installation

+ For installation instructions visit the official docs at: [1]. The CentOS CI SOP at [2] also has more context but it is now slightly dated.

+ 

+ - From the webconsole, click on the `Operators` option, then `OperatorHub`

+ - Search for `Local Storage`

+ - Click install

+ - Make sure the `Update Channel` matches the major.minor version of your OCP4 install

+ - Choose `A specific namespace on this cluster`

+ - Choose `Operator recommended namespace`

+ - Update approval set to automatic

+ - Click install

+ 

+ === Configuration

+ All volumes on the nodes must already be formatted and available before this step. This can be done via a machineconfig/ignition file at installation time, or alternatively by connecting to the boxes over SSH and manually creating and formatting the volumes.

+ 

+ - Create a `LocalVolumeDiscovery` and configure it to target the disks on all nodes

+ - When that process is complete, it creates `LocalVolumeDiscoveryResult` objects. Search for that type, then examine the objects to see whether the correct disks were found and whether they show as available.

+ - Create a `LocalVolumeSet` with name `local-block`, storage class `local-block`, type all, device types disk and part, filtered to the selected nodes worker01-03, and volume mode block. Click Create.

+ - After a period of time, check the YAML definition of the newly created LocalVolumeSet `local-block` object. It should show the correct number of volumes in the `totalProvisionedDeviceCount` field.
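+ The last check can also be done from the CLI. The following is a sketch under two assumptions that should be verified on the cluster: that the operator was installed into the recommended `openshift-local-storage` namespace, and that the count is exposed under `.status`. The function name is hypothetical.

```shell
# Hypothetical helper: print totalProvisionedDeviceCount for a LocalVolumeSet.
# $1 = LocalVolumeSet name, $2 = namespace (assumed default shown below).
provisioned_device_count() {
  oc get localvolumeset "$1" -n "${2:-openshift-local-storage}" \
    -o jsonpath='{.status.totalProvisionedDeviceCount}'
}
```

+ For example: `provisioned_device_count local-block`.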

+ 

@@ -0,0 +1,48 @@ 

+ == SOP Configure oauth Authentication via IPA/Noggin

+ 

+ 

+ === Resources

+ 

+ - [1] https://pagure.io/fedora-infra/ansible/blob/main/f/files/communishift/objects[Example Config from Communishift]

+ 

+ 

+ === OIDC Setup

+ The first step is to request that a secret be created for this environment; please open a ticket with Fedora Infra. Once the secret has been made available, we can add it to an Openshift Secret in the cluster like so:

+ 

+ ----

+ oc create secret generic fedoraidp-clientsecret --from-literal=clientSecret=<client-secret> -n openshift-config

+ ----

+ 

+ Next we can update the oauth configuration on the cluster and add the config for ipa/noggin/ipsilon. See the following snippet for inspiration:

+ 

+ ----

+ apiVersion: config.openshift.io/v1

+ kind: OAuth

+ metadata:

+   name: cluster

+ spec:

+   identityProviders:

+ ...

+   - name: fedoraidp

+     login: true

+     challenge: false

+     mappingMethod: claim

+     type: OpenID

+     openID:

+       clientID: ocp

+       clientSecret:

+         name: fedoraidp-clientsecret

+       extraScopes:

+       - email

+       - profile

+       claims:

+         preferredUsername:

+         - nickname

+         name:

+         - name

+         email:

+         - email

+       issuer: https://id.fedoraproject.org

+ ----

+ 

+ This config already exists in the cluster, so you need to edit or patch it; you can't just `oc apply -f template.yaml`.
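+ One way to patch rather than apply is a JSON Patch that appends the provider to the existing list. This is a minimal sketch, not the procedure used on the Fedora clusters: the function names are hypothetical, the values mirror the snippet above, and the `/spec/identityProviders/-` path assumes that `spec.identityProviders` already exists on the `OAuth` object.

```shell
# Hypothetical helper: emit a JSON Patch that appends the fedoraidp provider.
# Field values are copied from the YAML snippet in this SOP.
fedoraidp_patch() {
  cat <<'EOF'
[{"op": "add", "path": "/spec/identityProviders/-", "value": {
  "name": "fedoraidp", "login": true, "challenge": false,
  "mappingMethod": "claim", "type": "OpenID",
  "openID": {
    "clientID": "ocp",
    "clientSecret": {"name": "fedoraidp-clientsecret"},
    "extraScopes": ["email", "profile"],
    "claims": {"preferredUsername": ["nickname"], "name": ["name"], "email": ["email"]},
    "issuer": "https://id.fedoraproject.org"
  }
}}]
EOF
}

# Hypothetical helper: apply the patch to the cluster-wide OAuth object.
add_fedoraidp() {
  oc patch oauth cluster --type=json -p "$(fedoraidp_patch)"
}
```

+ Alternatively, `oc edit oauth cluster` opens the same object in an editor.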

@@ -0,0 +1,37 @@ 

+ == Configure the Openshift Container Storage Operator

+ 

+ 

+ === Resources

+ 

+ - [1] https://docs.openshift.com/container-platform/4.8/storage/persistent_storage/persistent-storage-ocs.html[Official Docs]

+ - [2] https://github.com/red-hat-storage/ocs-operator[Github]

+ 

+ === Installation

+ Important: before following this SOP, please ensure that you have already followed the SOP to install the Local Storage Operator first, as this is a requirement for the OCS operator.

+ 

+ For full detailed instructions please refer to the official docs at: [1]. For general instructions see below:

+ 

+ - In the webconsole, click the Operators menu

+ - Click the OperatorHub menu

+ - Search for `OpenShift Container Storage`

+ - Click install

+ - Choose the update channel to match the major.minor version of the cluster itself.

+ - Installation mode, A specified namespace on the cluster

+ - Installed namespace, Operator Recommended

+ - Update approval, automatic

+ - Click install

+ 

+ 

+ === Configuration

+ When the operator has finished installing, we can continue. Please ensure that a minimum of 3 nodes are available.

+ 

+ - A `StorageCluster` is required to complete this installation; click `Create StorageCluster`.

+ - At the top, choose the `internal - attached devices` mode.

+ - In the storageclass choose the `local-block` from the list.

+ - The compute/worker nodes with available storage appear in the list

+ - It automatically calculates the possible storage amount

+ - Click next

+ - On the `Security and Network` section just click next.

+ - Click create.

+ 

+ 

@@ -0,0 +1,56 @@ 

+ == Installation of the Openshift Virtualisation Operator

+ 

+ === Resources

+ - [1] https://alt.fedoraproject.org/cloud/[Fedora Images]

+ - [2] https://github.com/kubevirt/kubevirt/blob/main/docs/container-register-disks.md[Kubevirt Importing Containers of VMI Images]

+ 

+ 

+ === Installation

+ From the web console, choose the `Operators` menu, and choose `OperatorHub`.

+ 

+ Search for `Openshift Virtualization`

+ 

+ Click install.

+ 

+ When the installation of the Operator is complete, create a `HyperConverged` object and follow the wizard. The default options should be fine; click next through the menus.

+ 

+ Next, create a `HostPathProvisioner` object. The default options should be fine; click next through the menus.

+ 

+ 

+ === Verification

+ To verify that the installation of the Operator is successful, we can attempt to create a VM.

+ 

+ From [1], download the Fedora 34 `Cloud Base image for Openstack` in `qcow2` format and save it locally as `fedora34.qcow2`, the file name the `Dockerfile` below expects.

+ 

+ Create a `Dockerfile` with the following contents:

+ 

+ ----

+ FROM scratch

+ ADD fedora34.qcow2 /disk/

+ ----

+ 

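+ Build the container, tagging it for your quay.io namespace so that the push in the next step can find it:

+ 

+ ----

+ podman build -t quay.io/<USER>/fedora34:latest .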

+ ----

+ 

+ Push the container to your username at quay.io.

+ 

+ ----

+ podman push quay.io/<USER>/fedora34:latest

+ ----

+ 

+ In the web console, visit the Workloads, then Virtualization menu.

+ 

+ Create a VirtualMachine with Wizard

+ 

+ Choose Fedora and click next

+ 

+ From the boot source dropdown menu, select import via Registry

+ 

+ In the container image field, add the image prepared earlier, e.g. `quay.io/dkirwan/fedora34`

+ 

+ Click `Advanced Storage settings`, change the storageclass to `ocs-storagecluster-ceph-rbd`, then click next and done.

+ 

+ Once the VM is created and booted, the console is available from the top right drop down menu.

@@ -0,0 +1,95 @@ 

+ == Enable User Workload Monitoring Stack

+ 

+ === Resources

+ - [1] https://docs.openshift.com/container-platform/4.8/monitoring/enabling-monitoring-for-user-defined-projects.html[Official Docs]

+ - [2] https://docs.openshift.com/container-platform/4.8/monitoring/enabling-monitoring-for-user-defined-projects.html#granting-users-permission-to-monitor-user-defined-projects_enabling-monitoring-for-user-defined-projects[Providing Access to the UWMS features]

+ - [3] https://docs.openshift.com/container-platform/4.8/monitoring/enabling-monitoring-for-user-defined-projects.html#granting-user-permissions-using-the-web-console_enabling-monitoring-for-user-defined-projects[Providing Access to the UWMS dashboard]

+ - [4] https://docs.openshift.com/container-platform/4.8/monitoring/configuring-the-monitoring-stack.html#configuring-persistent-storage[Configure Monitoring Stack]

+ 

+ === Configuration

+ To enable the stack, edit the `cluster-monitoring-config` ConfigMap like so:

+ 

+ ----

+ oc -n openshift-monitoring edit configmap cluster-monitoring-config

+ ----

+ 

+ Set the `enableUserWorkload` to `true` like so:

+ 

+ ----

+ apiVersion: v1

+ kind: ConfigMap

+ metadata:

+   name: cluster-monitoring-config

+   namespace: openshift-monitoring

+ data:

+   config.yaml: |

+     enableUserWorkload: true

+     prometheusK8s:

+       retention: 30d

+       volumeClaimTemplate:

+         spec:

+           storageClassName: ocs-storagecluster-ceph-rbd

+           resources:

+             requests:

+               storage: 100Gi

+     alertmanagerMain:

+       volumeClaimTemplate:

+         spec:

+           storageClassName: ocs-storagecluster-ceph-rbd

+           resources:

+             requests:

+               storage: 50Gi

+ ----

+ 

+ Save the configmap changes. Monitor the rollout progress of the User Workload Monitoring Stack with the following:

+ 

+ ----

+ oc -n openshift-user-workload-monitoring get pod

+ NAME                                   READY   STATUS        RESTARTS   AGE

+ prometheus-operator-6f7b748d5b-t7nbg   2/2     Running       0          3h

+ prometheus-user-workload-0             4/4     Running       1          3h

+ prometheus-user-workload-1             4/4     Running       1          3h

+ thanos-ruler-user-workload-0           3/3     Running       0          3h

+ thanos-ruler-user-workload-1           3/3     Running       0          3h

+ ----

+ 

+ At this point we can create a `ConfigMap` to configure the User Workload Monitoring stack in the `openshift-user-workload-monitoring` namespace.

+ 

+ ----

+ oc create configmap user-workload-monitoring-config -n openshift-user-workload-monitoring

+ ----

+ 

+ Then edit this ConfigMap:

+ 

+ ----

+ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config

+ ----

+ 

+ Save the following configuration

+ 

+ ----

+ apiVersion: v1

+ kind: ConfigMap

+ metadata:

+   name: user-workload-monitoring-config

+   namespace: openshift-user-workload-monitoring

+ data:

+   config.yaml: |

+     prometheus:

+       retention: 30d

+       volumeClaimTemplate:

+         spec:

+           storageClassName: ocs-storagecluster-ceph-rbd

+           resources:

+             requests:

+               storage: 100Gi

+     thanosRuler:

+       volumeClaimTemplate:

+         spec:

+           storageClassName: ocs-storagecluster-ceph-rbd

+           resources:

+             requests:

+               storage: 50Gi

+ ----

+ 

+ To allow users to create `PrometheusRule`, `ServiceMonitor` and `PodMonitor` objects, see [2]. To allow access to the User Workload Monitoring Stack dashboard, see [3].
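+ As described in [2], access is granted per project by binding one of the monitoring roles to a user. The following is a small sketch using the `monitoring-edit` role; the function name, user and namespace are placeholders.

```shell
# Hypothetical helper: let a user manage PrometheusRule, ServiceMonitor and
# PodMonitor objects in one project by granting the monitoring-edit role.
# $1 = user name, $2 = project namespace (both placeholders).
grant_monitoring_edit() {
  oc policy add-role-to-user monitoring-edit "$1" -n "$2"
}
```

+ For example: `grant_monitoring_edit <user> <namespace>`.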

@@ -0,0 +1,56 @@ 

+ == Cordoning Nodes and Draining Pods

+ This SOP should be followed in the following scenarios:

+ 

+ - If maintenance is scheduled to be carried out on an Openshift node.

+ 

+ 

+ === Steps

+ 

+ 1. Connect to the `os-control01` host associated with this ENV. Become root `sudo su -`.

+ 

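+ 2. Mark the nodes as unschedulable. The loop below cordons every node in the cluster; to cordon a single node, run `oc adm cordon <node>` instead: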

+ 

+ ----

+ nodes=$(oc get nodes -o name  | sed -E "s/node\///")

+ echo $nodes

+ 

+ for node in ${nodes[@]}; do oc adm cordon $node; done

+ node/<node> cordoned

+ ----

+ 

+ 3. Check that the node status is `NotReady,SchedulingDisabled`

+ 

+ ----

+ oc get node <node1>

+ NAME        STATUS                        ROLES     AGE       VERSION

+ <node1>     NotReady,SchedulingDisabled   worker    1d        v1.18.3

+ ----

+ 

+ Note: It might not switch to `NotReady` immediately, as there may be many pods still running.

+ 

+ 

+ 4. Evacuate the Pods from **worker nodes**

+ This will drain node `<node1>`, delete any local data, ignore daemonsets, and give pods a grace period of 15 seconds to terminate.

+ 

+ ----

+ oc adm drain <node1> --delete-emptydir-data=true --ignore-daemonsets=true --grace-period=15

+ ----
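+ After the drain completes, it can be useful to see what is still scheduled on the node. The following is a minimal sketch, assuming a hypothetical helper named `pods_on_node`. Because the drain ignores daemonsets, DaemonSet-managed pods are expected to remain in the list.

```shell
# Hypothetical helper: list the pods still scheduled on a node.
# $1 = node name as shown by `oc get nodes`.
pods_on_node() {
  oc get pods --all-namespaces --field-selector "spec.nodeName=$1" -o wide
}
```

+ For example: `pods_on_node <node1>`.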

+ 

+ 5. Perform the scheduled maintenance on the node

+ Do whatever is required in the scheduled maintenance window.

+ 

+ 

+ 6. Once the node is ready to be added back into the cluster

+ We must uncordon the node. This allows it to be marked schedulable once more.

+ 

+ ----

+ nodes=$(oc get nodes -o name  | sed -E "s/node\///")

+ echo $nodes

+ 

+ for node in ${nodes[@]}; do oc adm uncordon $node; done

+ ----

+ 

+ 

+ ===  Resources

+ 

+ - [1] https://docs.openshift.com/container-platform/4.8/nodes/nodes/nodes-nodes-working.html[Nodes - working with nodes]

@@ -0,0 +1,38 @@ 

+ == Create MachineConfigs to Configure RHCOS

+ 

+ === Resources

+ 

+ - [1] https://coreos.github.io/butane/getting-started/[Butane Getting Started]

+ - [2] https://docs.openshift.com/container-platform/4.8/post_installation_configuration/machine-configuration-tasks.html#installation-special-config-chrony_post-install-machine-configuration-tasks[OCP4 Post Installation Configuration]

+ 

+ === Butane

+ "Butane (formerly the Fedora CoreOS Config Transpiler) is a tool that consumes a Butane Config and produces an Ignition Config, which is a JSON document that can be given to a Fedora CoreOS machine when it first boots." [1]

+ 

+ Butane is available as a container image. We can pull the latest version locally and run it like so:

+ 

+ ----

+ # Pull the latest release

+ podman pull quay.io/coreos/butane:release

+ 

+ # Run butane using standard in and standard out

+ podman run -i --rm quay.io/coreos/butane:release --pretty --strict < your_config.bu > transpiled_config.ign

+ 

+ # Run butane using files.

+ podman run --rm -v /path/to/your_config.bu:/config.bu:z quay.io/coreos/butane:release --pretty --strict /config.bu > transpiled_config.ign

+ ----

+ 

+ We can create a CLI alias to make running the Butane container much easier like so:

+ 

+ ----

+ alias butane='podman run --rm --tty --interactive \

+               --security-opt label=disable        \

+               --volume ${PWD}:/pwd --workdir /pwd \

+               quay.io/coreos/butane:release'

+ ----

+ 

+ For more detailed information on how to structure your Butane file see [1]. Once it is created, you can convert the Butane config to an ignition file like so:

+ 

+ ----

+ butane master_chrony_machineconfig.bu -o master_chrony_machineconfig.yaml

+ butane worker_chrony_machineconfig.bu -o worker_chrony_machineconfig.yaml

+ ----
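+ For orientation, the following sketch writes a small Butane config for the worker role, loosely following the chrony example in [2]. Everything in it is illustrative: the function name is hypothetical, the MachineConfig name `99-worker-chrony` is an assumption, and `ntp.example.org` is a placeholder rather than a Fedora Infra NTP server.

```shell
# Hypothetical helper: write an example Butane config for worker chrony
# settings to the path given as $1. The NTP pool is a placeholder.
write_chrony_butane() {
  cat > "$1" <<'EOF'
variant: openshift
version: 4.8.0
metadata:
  name: 99-worker-chrony
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
    - path: /etc/chrony.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          # Placeholder NTP source; replace with the real servers.
          pool ntp.example.org iburst
          driftfile /var/lib/chrony/drift
          makestep 1.0 3
          rtcsync
          logdir /var/log/chrony
EOF
}
```

+ For example, `write_chrony_butane worker_chrony_machineconfig.bu` creates a file that the `butane` commands above can consume.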

@@ -0,0 +1,70 @@ 

+ == SOP Disable `self-provisioners` Role

+ 

+ === Resources

+ 

+ - [1] https://docs.openshift.com/container-platform/4.4/applications/projects/configuring-project-creation.html#disabling-project-self-provisioning_configuring-project-creation

+ 

+ 

+ === Disabling self-provisioners role

+ By default, when a user authenticates with Openshift via OAuth, they are placed in the `system:authenticated:oauth` group, which is bound to the `self-provisioner` cluster role through the `self-provisioners` cluster role binding. This role provides the ability to create new projects. On CentOS CI we do not want users to be able to create their own projects, as we have a system in place where we create a project and control the administrators of that project.

+ 

+ To disable the self-provisioner role, follow the steps below, as outlined in the documentation [1]. First inspect the current binding:

+ 

+ ----

+ oc describe clusterrolebinding.rbac self-provisioners

+ 

+ Name:		self-provisioners

+ Labels:		<none>

+ Annotations:	rbac.authorization.kubernetes.io/autoupdate=true

+ Role:

+   Kind:	ClusterRole

+   Name:	self-provisioner

+ Subjects:

+   Kind	Name				Namespace

+   ----	----				---------

+   Group	system:authenticated:oauth

+ ----

+ 

+ Remove the subjects that the self-provisioners role applies to.

+ 

+ ----

+ oc patch clusterrolebinding.rbac self-provisioners -p '{"subjects": null}'

+ ----

+ 

+ Verify the change occurred successfully

+ 

+ ----

+ oc describe clusterrolebinding.rbac self-provisioners

+ Name:         self-provisioners

+ Labels:       <none>

+ Annotations:  rbac.authorization.kubernetes.io/autoupdate: true

+ Role:

+   Kind:  ClusterRole

+   Name:  self-provisioner

+ Subjects:

+   Kind  Name  Namespace

+   ----  ----  ---------

+ ----

+ 

+ When the cluster is updated to a new version, unless we mark the role appropriately, the permissions will be restored after the update is complete.

+ 

+ Verify that the value is currently set to be restored after an update:

+ 

+ ----

+ oc get clusterrolebinding.rbac self-provisioners -o yaml

+ ----

+ 

+ ----

+ apiVersion: authorization.openshift.io/v1

+ kind: ClusterRoleBinding

+ metadata:

+   annotations:

+     rbac.authorization.kubernetes.io/autoupdate: "true"

+   ...

+ ----

+ 

+ We wish to set `rbac.authorization.kubernetes.io/autoupdate` to `false`. To patch this, do the following:

+ 

+ ----

+ oc patch clusterrolebinding.rbac self-provisioners -p '{ "metadata": { "annotations": { "rbac.authorization.kubernetes.io/autoupdate": "false" } } }'

+ ----
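+ The annotation can be read back to confirm the patch. This is a minimal sketch with a hypothetical function name; the dots inside the annotation key are escaped for jsonpath.

```shell
# Hypothetical helper: print the autoupdate annotation on the
# self-provisioners binding. After the patch it is expected to print "false".
self_provisioners_autoupdate() {
  oc get clusterrolebinding.rbac self-provisioners \
    -o jsonpath='{.metadata.annotations.rbac\.authorization\.kubernetes\.io/autoupdate}'
}
```

+ Run `self_provisioners_autoupdate` before and after a cluster update to check that the setting persisted.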

@@ -0,0 +1,50 @@ 

+ == Create etcd backup

+ This SOP should be followed in the following scenarios:

+ 

+ - When the need exists to create an etcd backup.

+ - When shutting a cluster down gracefully.

+ 

+ === Resources

+ 

+ - [1] https://docs.openshift.com/container-platform/4.8/backup_and_restore/backing-up-etcd.html[Creating an etcd backup]

+ 

+ === Take etcd backup

+ 

+ 1. Connect to the `os-control01` node associated with the ENV.

+ 

+ 2. Use the `oc` tool to make a debug connection to a controlplane node

+ 

+ ----

+ oc debug node/<node_name>

+ ----

+ 

+ 3. Chroot to the /host directory on the container's filesystem

+ 

+ ----

+ sh-4.2# chroot /host

+ ----

+ 

+ 4. Run the cluster-backup.sh script and pass in the location to save the backup to

+ 

+ ----

+ sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/assets/backup

+ ----

+ 

+ 5. Chown the backup files to be owned by user `core` and group `core`

+ 

+ ----

+ chown -R core:core /home/core/assets/backup

+ ----

+ 

+ 6. From the admin machine (see inventory group `ocp-ci-management`), become the Openshift service account. See the inventory hostvars for the host identified in the previous step and note the `ocp_service_account` variable.

+ 

+ ----

+ ssh <host>

+ sudo su - <ocp_service_account>

+ ----

+ 

+ 7. Copy the files down to the `os-control01` machine.

+ 

+ ----

+ scp -i <ssh_key> core@<node_name>:/home/core/assets/backup/* ocp_backups/

+ ----
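+ According to [1], `cluster-backup.sh` writes two files: an etcd snapshot named `snapshot_<timestamp>.db` and an archive named `static_kuberesources_<timestamp>.tar.gz`. The following sketch checks that both arrived after the copy; the function name is hypothetical.

```shell
# Hypothetical helper: succeed only if the directory given as $1 holds both
# artifacts that cluster-backup.sh is documented to produce.
backup_complete() {
  ls "$1"/snapshot_*.db >/dev/null 2>&1 &&
    ls "$1"/static_kuberesources_*.tar.gz >/dev/null 2>&1
}
```

+ For example: `backup_complete ocp_backups && echo "backup looks complete"`.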

@@ -0,0 +1,30 @@ 

+ == Graceful Shutdown of an Openshift 4 Cluster

+ This SOP should be followed in the following scenarios:

+ 

+ - Graceful full shut down of the Openshift 4 cluster is required.

+ 

+ === Steps

+ 

+ Prerequisite steps:

+ - Follow the SOP for cordoning and draining the nodes.

+ - Follow the SOP for creating an `etcd` backup.

+ 

+ 

+ 1. Connect to the `os-control01` host associated with this ENV. Become root `sudo su -`.

+ 

+ 2. Get a list of the nodes

+ 

+ ----

+ nodes=$(oc get nodes -o name  | sed -E "s/node\///")

+ ----

+ 

+ 3. Shut down the nodes from the administration box associated with the cluster `ENV`, e.g. production/staging.

+ 

+ ----

+ for node in ${nodes[@]}; do ssh -i /root/ocp4/ocp-<ENV>/ssh/id_rsa core@$node sudo shutdown -h now; done

+ ----

+ 

+ 

+ === Resources

+ 

+ - [1] https://docs.openshift.com/container-platform/4.5/backup_and_restore/graceful-cluster-shutdown.html[Graceful Cluster Shutdown]

@@ -0,0 +1,88 @@ 

+ == Graceful Startup of an Openshift 4 Cluster

+ This SOP should be followed in the following scenarios:

+ 

+ - Graceful start up of an Openshift 4 cluster.

+ 

+ === Steps

+ Prerequisite steps:

+ 

+ 

+ ==== Start the VM Control Plane instances

+ Ensure that the control plane instances start first.

+ 

+ ----

+ # Virsh command to start the VMs

+ ----

+ 

+ 

+ ==== Start the physical nodes

+ To connect to `idrac`, you must be connected to the Red Hat VPN. Next find the management IP associated with each node.

+ 

+ On the `batcave01` instance, in the dns configuration, the following bare metal machines make up the production/staging OCP4 worker nodes.

+ 

+ ----

+ oshift-dell01             IN        A     10.3.160.180  # worker01 prod

+ oshift-dell02             IN        A     10.3.160.181  # worker02 prod

+ oshift-dell03             IN        A     10.3.160.182  # worker03 prod

+ oshift-dell04             IN        A     10.3.160.183  # worker01 staging

+ oshift-dell05             IN        A     10.3.160.184  # worker02 staging

+ oshift-dell06             IN        A     10.3.160.185  # worker03 staging

+ ----

+ 

+ Log in to the `idrac` interface that corresponds to each worker, one at a time. Ensure the node is set to boot from the hard drive, then power it on.

+ 

+ ==== Once the nodes have been started they must be uncordoned if appropriate

+ 

+ ----

+ oc get nodes

+ NAME                       STATUS                     ROLES    AGE    VERSION

+ dumpty-n1.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8

+ dumpty-n2.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8

+ dumpty-n3.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8

+ dumpty-n4.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8

+ dumpty-n5.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8

+ kempty-n10.ci.centos.org   Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8

+ kempty-n11.ci.centos.org   Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8

+ kempty-n12.ci.centos.org   Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8

+ kempty-n6.ci.centos.org    Ready,SchedulingDisabled   master   106d   v1.18.3+6c42de8

+ kempty-n7.ci.centos.org    Ready,SchedulingDisabled   master   106d   v1.18.3+6c42de8

+ kempty-n8.ci.centos.org    Ready,SchedulingDisabled   master   106d   v1.18.3+6c42de8

+ kempty-n9.ci.centos.org    Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8

+ 

+ nodes=$(oc get nodes -o name  | sed -E "s/node\///")

+ 

+ for node in ${nodes[@]}; do oc adm uncordon $node; done

+ node/dumpty-n1.ci.centos.org uncordoned

+ node/dumpty-n2.ci.centos.org uncordoned

+ node/dumpty-n3.ci.centos.org uncordoned

+ node/dumpty-n4.ci.centos.org uncordoned

+ node/dumpty-n5.ci.centos.org uncordoned

+ node/kempty-n10.ci.centos.org uncordoned

+ node/kempty-n11.ci.centos.org uncordoned

+ node/kempty-n12.ci.centos.org uncordoned

+ node/kempty-n6.ci.centos.org uncordoned

+ node/kempty-n7.ci.centos.org uncordoned

+ node/kempty-n8.ci.centos.org uncordoned

+ node/kempty-n9.ci.centos.org uncordoned

+ 

+ oc get nodes

+ NAME                       STATUS   ROLES    AGE    VERSION

+ dumpty-n1.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8

+ dumpty-n2.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8

+ dumpty-n3.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8

+ dumpty-n4.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8

+ dumpty-n5.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8

+ kempty-n10.ci.centos.org   Ready    worker   106d   v1.18.3+6c42de8

+ kempty-n11.ci.centos.org   Ready    worker   106d   v1.18.3+6c42de8

+ kempty-n12.ci.centos.org   Ready    worker   106d   v1.18.3+6c42de8

+ kempty-n6.ci.centos.org    Ready    master   106d   v1.18.3+6c42de8

+ kempty-n7.ci.centos.org    Ready    master   106d   v1.18.3+6c42de8

+ kempty-n8.ci.centos.org    Ready    master   106d   v1.18.3+6c42de8

+ kempty-n9.ci.centos.org    Ready    worker   106d   v1.18.3+6c42de8

+ ----
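+ On a larger cluster it can help to list only the nodes that still need attention. The following is a minimal sketch, assuming a hypothetical helper that filters the `STATUS` column of `oc get nodes`.

```shell
# Hypothetical helper: print the names of nodes whose STATUS column is not
# exactly "Ready", e.g. NotReady or Ready,SchedulingDisabled.
nodes_needing_attention() {
  oc get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
}
```

+ An empty result from `nodes_needing_attention` means every node is `Ready` and schedulable.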

+ 

+ 

+ === Resources

+ 

+ - [1] https://docs.openshift.com/container-platform/4.5/backup_and_restore/graceful-cluster-restart.html[Graceful Cluster Startup]

+ - [2] https://docs.openshift.com/container-platform/4.5/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html#dr-restoring-cluster-state[Cluster disaster recovery]

@@ -0,0 +1,215 @@ 

+ == SOP Installation/Configuration of OCP4 on Fedora Infra

+ 

+ === Resources

+ 

+ - [1] https://docs.openshift.com/container-platform/4.8/installing/installing_bare_metal/[Official OCP4 Installation Documentation]

+ 

+ === Install

+ To install OCP4 on Fedora Infra, one must be a member of the following groups:

+ 

+ - `sysadmin-openshift`

+ - `sysadmin-noc`

+ 

+ 

+ ==== Prerequisites

+ Visit the https://console.redhat.com/openshift/install/metal/user-provisioned[OpenShift Console] and download the following OpenShift tools:

+ 

+ * A Red Hat Access account is required

+ * OC client tools https://access.redhat.com/downloads/content/290/ver=4.8/rhel---8/4.8.10/x86_64/product-software[Here]

+ * OC installation tool https://access.redhat.com/downloads/content/290/ver=4.8/rhel---8/4.8.10/x86_64/product-software[Here]

+ * Ensure the downloaded tools are available on the `PATH`

+ * A valid OCP4 subscription is required to complete the installation configuration, by default you have a 60 day trial.

+ * Take a copy of your pull secret file. You will need to put this in the `install-config.yaml` file in the next step.

+ 

+ 

+ ==== Generate install-config.yaml file

+ We must create an `install-config.yaml` file. Use the following example for inspiration, or refer to the documentation [1] for more detailed information and explanations.

+ 

+ ----

+ apiVersion: v1

+ baseDomain: stg.fedoraproject.org

+ compute:

+ - hyperthreading: Enabled

+   name: worker

+   replicas: 0

+ controlPlane:

+   hyperthreading: Enabled

+   name: master

+   replicas: 3

+ metadata:

+   name: 'ocp'

+ networking:

+   clusterNetwork:

+   - cidr: 10.128.0.0/14

+     hostPrefix: 23

+   networkType: OpenShiftSDN

+   serviceNetwork:

+   - 172.30.0.0/16

+ platform:

+   none: {}

+ fips: false

+ pullSecret: 'PUT PULL SECRET HERE'

+ sshKey: 'PUT SSH PUBLIC KEY HERE kubeadmin@core'

+ ----

+ 

+ * Login to the `os-control01` corresponding with the environment

+ * Make a directory to hold the installation files: `mkdir ocp4-<ENV>`

+ * Enter this newly created directory: `cd ocp4-<ENV>`

+ * Generate a fresh SSH keypair: `ssh-keygen -f ./ocp4-<ENV>-ssh`

+ * Create a `ssh` directory and place this keypair into it.

+ * Put the contents of the public key in the `sshKey` value in the `install-config.yaml` file

+ * Put the contents of your Pull Secret in the `pullSecret` value in the `install-config.yaml`

+ * Take a backup of `install-config.yaml` as `install-config.yaml.bak`. Running the next steps consumes this file, so having a backup allows you to recover from mistakes quickly.

+ 

+ 

+ ==== Create the Installation Files

+ Using the `openshift-install` tool we can generate the installation files. Make sure that the `install-config.yaml` file is in the `/path/to/ocp4-<ENV>` location before attempting the next steps.

+ 

+ ===== Create the Manifest Files

+ The manifest files are human readable. At this stage you can add any customisations required before the installation begins.

+ 

+ * Create the manifests: `openshift-install create manifests --dir=/path/to/ocp4-<ENV>`

+ * All configuration for RHCOS must be done via MachineConfigs configuration. If there is known configuration which must be performed, such as NTP, you can copy the MachineConfigs into the `/path/to/ocp4-<ENV>/openshift` directory now.

+ * Edit `/path/to/ocp4-<ENV>/manifests/cluster-scheduler-02-config.yml` at this point and change the `mastersSchedulable` value to `false`.
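+ The `mastersSchedulable` edit can also be scripted. This is a sketch only, with a hypothetical function name; it assumes the generated manifest contains the literal text `mastersSchedulable: true`, which should be confirmed by inspecting the file.

```shell
# Hypothetical helper: flip mastersSchedulable from true to false.
# $1 = installation directory, e.g. /path/to/ocp4-<ENV>
disable_masters_schedulable() {
  manifest="$1/manifests/cluster-scheduler-02-config.yml"
  # Write to a temporary file first so a failed sed leaves the manifest intact.
  sed 's/mastersSchedulable: true/mastersSchedulable: false/' "$manifest" \
    > "$manifest.tmp" && mv "$manifest.tmp" "$manifest"
}
```

+ Check the result afterwards with `grep mastersSchedulable` on the manifest.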

+ 

+ 

+ ===== Create the Ignition Files

+ The ignition files are generated from the manifests and MachineConfig files, producing the final installation files for the three roles: `bootstrap`, `master` and `worker`. In Fedora we prefer not to use the term `master` here, so we rename this role to `controlplane`.

+ 

+ * Create the ignition files: `openshift-install create ignition-configs --dir=/path/to/ocp4-<ENV>`

+ * At this point you should have the following three files: `bootstrap.ign`, `master.ign` and `worker.ign`.

+ * Rename the `master.ign` to `controlplane.ign`.

+ * A directory has been created, `auth`. This contains two files: `kubeadmin-password` and `kubeconfig`. These allow `cluster-admin` access to the cluster.
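A quick sanity check after generating the ignition files might look like the following. It is a sketch under assumptions: `finalize_ignition_files` is a hypothetical helper name, and it only checks for the files this section says should exist.

```shell
#!/usr/bin/env bash
set -eu

# finalize_ignition_files DIR
# Renames master.ign to controlplane.ign, then checks that every file
# this SOP expects after "create ignition-configs" is present in DIR.
finalize_ignition_files() {
  local dir="$1" f
  if [ -f "${dir}/master.ign" ]; then
    mv "${dir}/master.ign" "${dir}/controlplane.ign"
  fi
  for f in bootstrap.ign controlplane.ign worker.ign \
           auth/kubeadmin-password auth/kubeconfig; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "missing: ${dir}/${f}" >&2
      return 1
    fi
  done
}

# Example, using the paths from this SOP:
#   openshift-install create ignition-configs --dir=/path/to/ocp4-<ENV>
#   finalize_ignition_files /path/to/ocp4-<ENV>
```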

+ 

+ 

+ ==== Copy the Ignition files to the `batcave01` server

+ On the `batcave01` at the following location: `/srv/web/infra/bigfiles/openshiftboot/`:

+ 

+ * Create a directory to match the environment: `mkdir /srv/web/infra/bigfiles/openshiftboot/ocp4-<ENV>`

+ * Copy the ignition files, the ssh files and the auth files generated in previous steps, to this newly created directory. Users with `sysadmin-openshift` should have the necessary permissions to write to this location.

+ * When this is complete, it should look like the following:

+ ----

+     ├── <ENV>

+     │   ├── auth

+     │   │   ├── kubeadmin-password

+     │   │   └── kubeconfig

+     │   ├── bootstrap.ign

+     │   ├── controlplane.ign

+     │   ├── ssh

+     │   │   ├── id_rsa

+     │   │   └── id_rsa.pub

+     │   └── worker.ign

+ ----

+ 

+ 

+ ==== Update the ansible inventory

+ The ansible inventory/hostvars/group vars should be updated with the new hosts information.

+ 

+ For inspiration see the following https://pagure.io/fedora-infra/ansible/pull-request/765[PR] where we added the ocp4 production changes.

+ 

+ 

+ ==== Update the DNS/DHCP configuration

+ The DNS and DHCP configuration must also be updated. This https://pagure.io/fedora-infra/ansible/pull-request/765[PR] contains the necessary DHCP changes for prod, which can be done in ansible.

+ 

+ However, the DNS changes may only be performed by `sysadmin-main`. For this reason any DNS changes must go via a patch snippet which is emailed to the `infrastructure@lists.fedoraproject.org` mailing list for review and approval. This process may take several days.

+ 

+ 

+ ==== Generate the TLS Certs for the new environment

+ This is beyond the scope of this SOP. The best option is to create a ticket for Fedora Infra to request that these certs are created and made available for use. The following certs should be available:

+ 

+ - `*.apps.<ENV>.fedoraproject.org`

+ - `api.<ENV>.fedoraproject.org`

+ - `api-int.<ENV>.fedoraproject.org`

+ 

+ 

+ ==== Run the Playbooks

+ A number of playbooks must be run. Once all the previous steps have been completed, we can run these playbooks from the `batcave01` instance.

+ 

+ - `sudo rbac-playbook groups/noc.yml -t 'tftp_server,dhcp_server'`

+ - `sudo rbac-playbook groups/proxies.yml -t 'haproxy,httpd,iptables'`

+ 

+ 

+ ===== Baremetal / VMs

+ Depending on whether some of the nodes are VMs or baremetal, different tags should be supplied to the following playbook. If the entire cluster is baremetal you can skip the `kvm_deploy` tag entirely.

+ 

+ If there are VMs used for some of the roles, make sure to leave it in.

+ 

+ - `sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "ignition,repo,kvm_deploy"`
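The tag choice described above can be expressed as a small helper that prints the command to run. This is only an illustration: `ignition_playbook_cmd` and its `yes`/`no` argument are hypothetical, and the helper prints the command rather than executing it.

```shell
#!/usr/bin/env bash
set -eu

# ignition_playbook_cmd HAS_VMS
# Prints the playbook invocation from this SOP, adding the kvm_deploy tag
# only when some of the nodes are VMs (HAS_VMS=yes).
ignition_playbook_cmd() {
  local has_vms="$1" tags="ignition,repo"
  if [ "$has_vms" = "yes" ]; then
    tags="${tags},kvm_deploy"
  fi
  printf 'sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "%s"\n' "$tags"
}

# Example:
#   ignition_playbook_cmd no    # all-baremetal cluster
#   ignition_playbook_cmd yes   # at least one VM node
```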

+ 

+ 

+ ===== Baremetal

+ At this point we can switch on the baremetal nodes and begin the PXE/UEFI boot process. Via DHCP/DNS, the baremetal nodes should have the configuration necessary to reach the `noc01.iad2.fedoraproject.org` server and retrieve the UEFI boot configuration via PXE.

+ 

+ Once booted up, you should visit the management console for this node, and manually choose the UEFI configuration appropriate for its role.

+ 

+ The node will begin booting, and during the boot process it will reach out to the `os-control01` instance specific to the `<ENV>` to retrieve the ignition file appropriate to its role.

+ 

+ The system will then become autonomous: it will install and potentially reboot multiple times as updates are retrieved and applied.

+ 

+ Eventually you will be presented with an SSH login prompt, which should show the correct hostname, e.g. `ocp01`, matching what is in the DNS configuration.

+ 

+ 

+ ==== Bootstrapping completed

+ When the control plane is up, we should see all controlplane instances available in the appropriate haproxy dashboard, e.g. https://admin.fedoraproject.org/haproxy/proxy01=ocp-masters-backend-kapi[haproxy].

+ 

+ At this time we should take the `bootstrap` instance out of the haproxy load balancer.

+ 

+ - Make the necessary changes to ansible at: `ansible/roles/haproxy/templates/haproxy.cfg`

+ - Once merged, run the following playbook once more: `sudo rbac-playbook groups/proxies.yml -t 'haproxy'`

+ 

+ 

+ ==== Begin installation of the worker nodes

+ Follow the same processes listed in the Baremetal section above to switch on the worker nodes and begin installation.

+ 

+ 

+ ==== Configure the `os-control01` to authenticate with the new OCP4 cluster

+ Copy the `kubeconfig` to `~root/.kube/config` on the `os-control01` instance.

+ This will allow the `root` user to automatically be authenticated to the new OCP4 cluster with `cluster-admin` privileges.
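Copying the `kubeconfig` into place could be done as sketched below. `install_kubeconfig` is a hypothetical helper name, and restricting the file to mode `600` is an assumption about good practice rather than something this SOP requires.

```shell
#!/usr/bin/env bash
set -eu

# install_kubeconfig SRC [DEST]
# Copies the generated kubeconfig to DEST (default: /root/.kube/config),
# creating the parent directory if needed.
install_kubeconfig() {
  local src="$1" dest="${2:-/root/.kube/config}"
  mkdir -p "$(dirname "$dest")"
  cp "$src" "$dest"
  # Assumption: the file grants cluster-admin, so keep it private to the owner.
  chmod 600 "$dest"
}

# Example, run as root on os-control01:
#   install_kubeconfig /path/to/ocp4-<ENV>/auth/kubeconfig
#   oc whoami
```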

+ 

+ 

+ ==== Accept Node CSR Certs

+ To accept the worker/compute nodes into the cluster we need to accept their CSR certs.

+ 

+ List the CSR certs. The ones we're interested in will show as pending:

+ 

+ ----

+ oc get csr

+ ----

+ 

+ To accept all the OCP4 node CSRs in a one liner do the following:

+ 

+ ----

+ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve

+ ----
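With GNU `xargs`, the one-liner above still invokes `oc adm certificate approve` once when nothing is pending, which produces an error. A loop-based variant avoids that. This is a sketch, with `approve_pending_csrs` as a hypothetical helper name; it reuses the same go-template filter and assumes `oc` is already authenticated as `cluster-admin`.

```shell
#!/usr/bin/env bash
set -eu

# approve_pending_csrs
# Approves every CSR that has no status yet (i.e. is still pending).
approve_pending_csrs() {
  local names name
  names="$(oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}')"
  if [ -z "$names" ]; then
    echo "no pending CSRs"
    return 0
  fi
  # CSR names contain no whitespace, so word splitting is safe here.
  for name in $names; do
    oc adm certificate approve "$name"
  done
}

# Example:
#   approve_pending_csrs
#   oc get csr
```

New CSRs may appear after the first batch is approved, so it can be worth re-running the check until nothing is left pending.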

+ 

+ This should look something like this once completed:

+ 

+ ----

+ [root@os-control01 ocp4][STG]= oc get nodes

+ NAME                                      STATUS   ROLES    AGE   VERSION

+ ocp01.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387

+ ocp02.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387

+ ocp03.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387

+ worker01.ocp.stg.iad2.fedoraproject.org   Ready    worker   21d   v1.21.1+9807387

+ worker02.ocp.stg.iad2.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387

+ worker03.ocp.stg.iad2.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387

+ worker04.ocp.stg.iad2.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387

+ worker05.ocp.stg.iad2.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387

+ ----

+ 

+ At this point the cluster is basically up and running.

+ 

+ 

+ === Follow on SOPs

+ Several other SOPs should be followed to perform the post installation configuration on the cluster.

+ 

+ - xref:sop_configure_baremetal_pxe_uefi_boot.adoc[SOP Configure Baremetal PXE-UEFI Boot]

+ - xref:sop_create_machineconfigs.adoc[SOP Create MachineConfigs to Configure RHCOS]

+ - xref:sop_retrieve_ocp4_cacert.adoc[SOP Retrieve OCP4 CACERT]

+ - xref:sop_configure_image_registry_operator.adoc[SOP Configure the Image Registry Operator]

+ - xref:sop_disable_provisioners_role.adoc[SOP Disable the Provisioners Role]

+ - xref:sop_configure_oauth_ipa.adoc[SOP Configure oauth Authentication via IPA/Noggin]

+ - xref:sop_configure_local_storage_operator.adoc[SOP Configure the Local Storage Operator]

+ - xref:sop_configure_openshift_container_storage.adoc[SOP Configure the Openshift Container Storage Operator]

+ - xref:sop_configure_userworkload_monitoring_stack.adoc[SOP Configure the Userworkload Monitoring Stack]

+ 

@@ -0,0 +1,22 @@ 

+ == SOP Retrieve OCP4 Cluster CACERT

+ 

+ === Resources

+ 

+ - [1] https://pagure.io/fedora-infra/ansible/blob/main/f/roles/dhcp_server[Ansible Role DHCP Server]

+ 

+ === Retrieve CACERT

+ In Fedora Infra, we have Apache terminating TLS for the cluster. Connections to the api and the machineconfig server are handled by haproxy. To prevent TLS errors we must configure haproxy with the OCP4 Cluster CA Cert.

+ 

+ This can be retrieved once the cluster control plane has been installed, from the `os-control01` node like so:

+ 

+ ----

+ oc get configmap kube-root-ca.crt -o yaml -n openshift-ingress

+ ----
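To write only the certificate data to a file, rather than the whole ConfigMap YAML, a `jsonpath` query can be used. This is a sketch: `extract_cacert` is a hypothetical helper name, and it assumes the certificate is stored under the `ca.crt` key of the ConfigMap.

```shell
#!/usr/bin/env bash
set -eu

# extract_cacert OUTFILE
# Writes the ca.crt entry of the kube-root-ca.crt ConfigMap to OUTFILE
# and checks that the result looks like a PEM certificate.
extract_cacert() {
  local out="$1"
  oc get configmap kube-root-ca.crt -n openshift-ingress \
    -o jsonpath='{.data.ca\.crt}' > "$out"
  grep -q 'BEGIN CERTIFICATE' "$out"
}

# Example, using the file name from this SOP:
#   extract_cacert "ocp.<ENV>-iad2.pem"
```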

+ 

+ Extract this CACERT in full, and commit it to ansible at: `https://pagure.io/fedora-infra/ansible/blob/main/f/roles/haproxy/files/ocp.<ENV>-iad2.pem`

+ 

+ To deploy this cert, one must be a member of the `sysadmin-noc` group. Run the following playbook:

+ 

+ ----

+ sudo rbac-playbook groups/proxies.yml -t 'haproxy'

+ ----

@@ -0,0 +1,37 @@ 

+ == Upgrade OCP4 Cluster

+ Please see the official documentation for more information [1][3]. This SOP can be used as a rough guide.

+ 

+ === Resources

+ 

+ - [1] https://docs.openshift.com/container-platform/4.8/updating/updating-cluster-between-minor.html[Upgrading OCP4 Cluster Between Minor Versions]

+ - [2] xref:sop_etcd_backup.adoc[SOP Create etcd backup]

+ - [3] https://docs.openshift.com/container-platform/4.8/operators/admin/olm-upgrading-operators.html

+ - [4] https://docs.openshift.com/container-platform/4.8/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html#dr-restoring-cluster-state[Restore etcd backup]

+ - [5] https://docs.openshift.com/container-platform/4.8/operators/admin/olm-upgrading-operators.html#olm-upgrading-operators[Upgrading Operators Prior to Cluster Update]

+ 

+ === Prerequisites

+ 

+ - In case an upgrade fails, it is wise to first take an `etcd` backup. To do so follow the SOP [2].

+ - Ensure that all installed Operators are at the latest versions for their channel [5].

+ 

+ === Upgrade OCP

+ At the time of writing the version installed on the cluster is `4.8.11` and the `upgrade channel` is set to `stable-4.8`. It is easiest to update the cluster via the web console. Go to:

+ 

+ - Administration

+ - Cluster Settings

+ - In order to upgrade between `z` or `patch` version (x.y.z), when one is available, click the update button.

+ - When moving between `y` or `minor` versions, you must first switch the `upgrade channel` to `fast-4.9` as an example. You should also be on the very latest `z`/`patch` version before upgrading.

+ - When the upgrade has finished, switch back to the `upgrade channel` for stable.
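The channel switch can also be made from the command line instead of the web console. This is a minimal sketch, assuming `oc` is logged in with `cluster-admin` rights; `set_upgrade_channel` is a hypothetical helper name, and patching the `ClusterVersion` resource is one common approach rather than the procedure this SOP prescribes.

```shell
#!/usr/bin/env bash
set -eu

# set_upgrade_channel CHANNEL
# Sets spec.channel on the cluster-wide ClusterVersion object named "version".
set_upgrade_channel() {
  local channel="$1"
  oc patch clusterversion version --type merge \
    -p "{\"spec\":{\"channel\":\"${channel}\"}}"
}

# Example:
#   oc get clusterversion            # current version and update status
#   set_upgrade_channel fast-4.9     # before a minor version upgrade
#   oc adm upgrade                   # list the updates now available
#   set_upgrade_channel stable-4.9   # switch back once the upgrade is done
```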

+ 

+ 

+ === Upgrade failures

+ In the worst case scenario we may have to restore etcd from the backups taken at the start [4], or reinstall a node entirely.

+ 

+ ==== Troubleshooting

+ There are many possible ways an upgrade can fail midway through.

+ 

+ - Check the monitoring alerts currently firing; these can often hint at the problem.

+ - Often individual nodes are failing to take the new MachineConfig changes and will show up when examining the `MachineConfigPool` status.

+ - Might require a manual reboot of that particular node

+ - Might require killing pods on that particular node
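A first pass at finding the affected nodes from the command line might look like this. It is a sketch only: `nodes_needing_attention` is a hypothetical helper name, and it simply lists nodes whose status is anything other than plain `Ready`, which includes cordoned nodes.

```shell
#!/usr/bin/env bash
set -eu

# nodes_needing_attention
# Prints "<node> <status>" for every node whose STATUS column is not
# exactly "Ready" (e.g. NotReady or Ready,SchedulingDisabled).
nodes_needing_attention() {
  oc get nodes --no-headers | awk '$2 != "Ready" { print $1, $2 }'
}

# Example:
#   oc get machineconfigpool    # pool status, as mentioned above
#   nodes_needing_attention
```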

+ 

@@ -0,0 +1,18 @@ 

+ == SOPs

+ 

+ - xref:sop_configure_baremetal_pxe_uefi_boot.adoc[SOP Configure Baremetal PXE-UEFI Boot]

+ - xref:sop_configure_image_registry_operator.adoc[SOP Configure the Image Registry Operator]

+ - xref:sop_configure_local_storage_operator.adoc[SOP Configure the Local Storage Operator]

+ - xref:sop_configure_oauth_ipa.adoc[SOP Configure oauth Authentication via IPA/Noggin]

+ - xref:sop_configure_openshift_container_storage.adoc[SOP Configure the Openshift Container Storage Operator]

+ - xref:sop_configure_userworkload_monitoring_stack.adoc[SOP Configure the Userworkload Monitoring Stack]

+ - xref:sop_cordoning_nodes_and_draining_pods.adoc[SOP Cordoning and Draining Nodes]

+ - xref:sop_create_machineconfigs.adoc[SOP Create MachineConfigs to Configure RHCOS]

+ - xref:sop_disable_provisioners_role.adoc[SOP Disable the Provisioners Role]

+ - xref:sop_graceful_shutdown_ocp_cluster.adoc[SOP Graceful Cluster Shutdown]

+ - xref:sop_graceful_startup_ocp_cluster.adoc[SOP Graceful Cluster Startup]

+ - xref:sop_installation.adoc[SOP Openshift 4 Installation on Fedora Infra]

+ - xref:sop_retrieve_ocp4_cacert.adoc[SOP Retrieve OCP4 CACERT]

+ - xref:sop_upgrade.adoc[SOP Upgrade OCP4 Cluster]

+ - xref:sop_etcd_backup.adoc[SOP Create etcd backup]

+ - xref:sop_configure_openshift_virtualization_operator.adoc[SOP Configure the Openshift Virtualization Operator]

@@ -41,8 +41,8 @@ 

  ** xref:gdpr_delete.adoc[GDPR Delete - SOP]

  ** xref:gdpr_sar.adoc[GDPR SAR - SOP]

  ** xref:geoip-city-wsgi.adoc[geoip-city-wsgi - SOP]

- ** xref:github2fedmsg.adoc[github2fedmsg - SOP]

  ** xref:github.adoc[Using github for Infra Projects - SOP]

+ ** xref:github2fedmsg.adoc[github2fedmsg - SOP]

  ** xref:greenwave.adoc[Greenwave - SOP]

  ** xref:guestdisk.adoc[Guest Disk Resize - SOP]

  ** xref:guestedit.adoc[Guest Editing - SOP]
@@ -59,9 +59,9 @@ 

  ** xref:jenkins-fedmsg.adoc[Jenkins Fedmsg - SOP]

  ** xref:kerneltest-harness.adoc[Kerneltest-harness - SOP]

  ** xref:kickstarts.adoc[Kickstart Infrastructure - SOP]

- ** xref:koji.adoc[Koji Infrastructure - SOP]

  ** xref:koji-archive.adoc[Koji Archive - SOP]

  ** xref:koji-builder-setup.adoc[Setup Koji Builder - SOP]

+ ** xref:koji.adoc[Koji Infrastructure - SOP]

  ** xref:koschei.adoc[Koschei - SOP]

  ** xref:layered-image-buildsys.adoc[Layered Image Build System - SOP]

  ** xref:mailman.adoc[Mailman Infrastructure - SOP]
@@ -72,14 +72,15 @@ 

  ** xref:memcached.adoc[Memcached Infrastructure - SOP]

  ** xref:message-tagging-service.adoc[Message Tagging Service - SOP]

  ** xref:mirrorhiding.adoc[Mirror Hiding Infrastructure - SOP]

- ** xref:mirrormanager.adoc[MirrorManager Infrastructure - SOP]

  ** xref:mirrormanager-S3-EC2-netblocks.adoc[AWS Mirrors - SOP]

+ ** xref:mirrormanager.adoc[MirrorManager Infrastructure - SOP]

  ** xref:mote.adoc[mote - SOP]

  ** xref:nagios.adoc[Fedora Infrastructure Nagios - SOP]

  ** xref:netapp.adoc[Netapp Infrastructure - SOP]

  ** xref:new-hosts.adoc[DNS Host Addition - SOP]

  ** xref:nonhumanaccounts.adoc[Non-human Accounts Infrastructure - SOP]

  ** xref:nuancier.adoc[Nuancier - SOP]

+ ** xref:ocp4:sops.adoc[Openshift 4 SOPs]

  ** xref:odcs.adoc[On Demand Compose Service - SOP]

  ** xref:openqa.adoc[OpenQA Infrastructure - SOP]

  ** xref:openshift.adoc[OpenShift - SOP]
@@ -110,8 +111,8 @@ 

  ** xref:torrentrelease.adoc[Torrent Releases Infrastructure - SOP]

  ** xref:unbound.adoc[Fedora Infra Unbound Notes - SOP]

  ** xref:virt-image.adoc[Fedora Infrastructure Kpartx Notes - SOP]

- ** xref:virtio.adoc[Virtio Notes - SOP]

  ** xref:virt-notes.adoc[Fedora Infrastructure Libvirt Notes - SOP]

+ ** xref:virtio.adoc[Virtio Notes - SOP]

  ** xref:voting.adoc[Voting Infrastructure - SOP]

  ** xref:waiverdb.adoc[WaiverDB - SOP]

  ** xref:wcidff.adoc[What Can I Do For Fedora - SOP]

Signed-off-by: David Kirwan dkirwan@redhat.com
Signed-off-by: Akashdeep Dhar akashdeep.dhar@gmail.com

1 new commit added

  • SOP OCP4 configure UEFI boot
8 months ago

2 new commits added

  • SOP Retrieve ocp4 cacert
  • SOP Create RHCOS ignition files
8 months ago

4 new commits added

  • SOP Retrieve ocp4 cacert
  • SOP Create RHCOS ignition files
  • SOP OCP4 configure UEFI boot
  • SOP for installation of OCP4
8 months ago

1 new commit added

  • SOP Configure Image Registry Operator
8 months ago

4 new commits added

  • SOP Enable User Workload Monitoring Stack
  • SOP Configure local storage
  • SOP disable self-provisioners role
  • SOP configure oauth
8 months ago

1 new commit added

  • SOP Updated configure monitoring stack
8 months ago

@dkirwan Is this finished or are you planning to add more commits?

Metadata Update from @pbokoc:
- Request assigned

8 months ago

Hi @pbokoc close to finishing, we've a few more SOPs that need to be written, and some updates need to be pushed for one. They can be pushed in a future PR, what ever you think is best

@dkirwan That's fine, let's keep adding to this PR, it's late on Friday anyway and I don't feel like resolving the merge conflict right now :) Just ping me when you're finished.

rebased onto dbd34d7901d620fa4be820194c74124a8883853d

8 months ago

1 new commit added

  • SOP Configuration/Installation of the OCS Operator
8 months ago

@pbokoc this should be ready for review now. I'll send a link to this PR to the Fedora Infra list for feedback also.

11 new commits added

  • SOP Configuration/Installation of the OCS Operator
  • SOP Updated configure monitoring stack
  • SOP Enable User Workload Monitoring Stack
  • SOP Configure local storage
  • SOP disable self-provisioners role
  • SOP configure oauth
  • SOP Configure Image Registry Operator
  • SOP Retrieve ocp4 cacert
  • SOP Create RHCOS ignition files
  • SOP OCP4 configure UEFI boot
  • SOP for installation of OCP4
8 months ago

2 new commits added

  • Metrics-for-apps: Added SOPs
  • Metrics-for-apps: sorted sysadmin SOP list
8 months ago

rebased onto ab243e19fd736f516ded634f5609abe31d5313bb

8 months ago

rebased onto 0d1e87a50e69e0b8e531d0debac7e85aef17d2ec

8 months ago

1 new commit added

  • metrics-for-apps: Added new sops
8 months ago

1 new commit added

  • metrics-for-apps: SOP Configure Openshift Virtualization Operator
8 months ago

rebased onto a77d6bb

8 months ago

Pull-Request has been merged by dkirwan

8 months ago