From dc54bec3d5b3f007c1b2d9704ae49ced52f79250 Mon Sep 17 00:00:00 2001
From: David Kirwan
Date: Mar 03 2022 11:44:45 +0000
Subject: ocp4: reordering header levels

---

diff --git a/modules/ocp4/pages/sop_add_node.adoc b/modules/ocp4/pages/sop_add_node.adoc
index 99bc9f6..fd5794a 100644
--- a/modules/ocp4/pages/sop_add_node.adoc
+++ b/modules/ocp4/pages/sop_add_node.adoc
@@ -1,16 +1,16 @@
-== SOP Add an OCP4 Node to an Existing Cluster
+= SOP Add an OCP4 Node to an Existing Cluster
This SOP should be used in the following scenario:
- Red Hat OpenShift Container Platform 4.x cluster has been installed some time ago (1+ days ago) and additional worker nodes are required to increase the capacity for the cluster.
-=== Resources
+== Resources
- [1] https://access.redhat.com/solutions/4246261[How to add OpenShift 4 RHCOS worker nodes in UPI within the first 24 hours]
- [2] https://access.redhat.com/solutions/4799921[How to add OpenShift 4 RHCOS worker nodes to UPI after the first 24 hours]
- [3] https://docs.openshift.com/container-platform/4.8/post_installation_configuration/node-tasks.html[Adding RHCOS worker nodes]
-=== Steps
+== Steps
1. Add the new nodes to the Ansible inventory file in the appropriate group. eg:
diff --git a/modules/ocp4/pages/sop_add_odf_storage.adoc b/modules/ocp4/pages/sop_add_odf_storage.adoc
index 02d41bd..aef0374 100644
--- a/modules/ocp4/pages/sop_add_odf_storage.adoc
+++ b/modules/ocp4/pages/sop_add_odf_storage.adoc
@@ -1,4 +1,4 @@
-== SOP Add new capacity to the OCP4 ODF Storage Cluster
+= SOP Add new capacity to the OCP4 ODF Storage Cluster
This SOP should be used in the following scenario:
- Red Hat OpenShift Container Platform 4.x cluster has been installed
@@ -6,13 +6,13 @@ This SOP should be used in the following scenario:
- These additional worker nodes have storage resources which we wish to add to the Openshift Data Foundation Storage Cluster
- We are adding enough storage to meet the minimum of 3 replicas, eg: 3 nodes, or enough storage devices that the number is divisible by 3.
-=== Resources
+== Resources
- [1] https://access.redhat.com/solutions/4246261[How to add OpenShift 4 RHCOS worker nodes in UPI within the first 24 hours]
- [2] https://access.redhat.com/solutions/4799921[How to add OpenShift 4 RHCOS worker nodes to UPI after the first 24 hours]
- [3] https://docs.openshift.com/container-platform/4.8/post_installation_configuration/node-tasks.html[Adding RHCOS worker nodes]
- [4] https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.9[Openshift Data Foundation Product Notes]
-=== Steps
+== Steps
1. Once a new node has been added to the Openshift cluster, we can manage the extra local storage devices on this node from within Openshift itself, provided that they do not contain file partitions/filesystems. In the case of a node being repurposed, please first ensure that all storage devices except `/dev/sda` are partition and filesystem free before starting.
2. From within the Openshift web console, or via the CLI, search for all "LocalVolumeDiscovery" objects.
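That search can also be done from the CLI. A minimal sketch, assuming the Local Storage Operator's usual `openshift-local-storage` namespace:
----
oc get localvolumediscoveries -n openshift-local-storage
oc get localvolumediscoveryresults -n openshift-local-storage
----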
diff --git a/modules/ocp4/pages/sop_configure_baremetal_pxe_uefi_boot.adoc b/modules/ocp4/pages/sop_configure_baremetal_pxe_uefi_boot.adoc
index 048aaad..f1fe224 100644
--- a/modules/ocp4/pages/sop_configure_baremetal_pxe_uefi_boot.adoc
+++ b/modules/ocp4/pages/sop_configure_baremetal_pxe_uefi_boot.adoc
@@ -1,4 +1,4 @@
-== Configure Baremetal PXE-UEFI Boot
+= Configure Baremetal PXE-UEFI Boot
A high level overview of how a baremetal node in the Fedora Infra gets booted via UEFI is as follows.
- Server powered on
@@ -9,12 +9,12 @@ A high level overview of how a baremetal node in the Fedora Infra gets booted vi
- tftpboot serves kernel and initramfs to the server
- Server boots with kernel and initramfs, and retrieves the ignition file from `os-control01`
-=== Resources
+== Resources
- [1] https://pagure.io/fedora-infra/ansible/blob/main/f/roles/dhcp_server[Ansible Role DHCP Server]
- [2] https://pagure.io/fedora-infra/ansible/blob/main/f/roles/tftp_server[Ansible Role tftpboot server]
-=== UEFI Configuration
+== UEFI Configuration
The configuration for UEFI booting is contained in the `grub.cfg` config, which is not currently under source control. It is located on the `batcave01` at: `/srv/web/infra/bigfiles/tftpboot2/uefi/grub.cfg`. The following is a sample configuration to install a baremetal OCP4 worker in the Staging cluster.
@@ -28,7 +28,7 @@ menuentry 'RHCOS 4.8 worker staging' {
Any new changes must be made here. Writing to this file requires one to be a member of the `sysadmin-main` group, so it is best to instead create a ticket in the Fedora Infra issue tracker with a patch request. See the following https://pagure.io/fedora-infrastructure/issue/10213[PR] for inspiration.
-=== Pushing new changes out to the tftpboot server
+== Pushing new changes out to the tftpboot server
To push out changes made to the `grub.cfg`, the following playbook should be run, which requires `sysadmin-noc` group permissions:
----
diff --git a/modules/ocp4/pages/sop_configure_image_registry_operator.adoc b/modules/ocp4/pages/sop_configure_image_registry_operator.adoc
index 417a715..0dfa925 100644
--- a/modules/ocp4/pages/sop_configure_image_registry_operator.adoc
+++ b/modules/ocp4/pages/sop_configure_image_registry_operator.adoc
@@ -1,10 +1,10 @@
-== SOP Configure the Image Registry Operator
+= SOP Configure the Image Registry Operator
-=== Resources
+== Resources
- [1] https://docs.openshift.com/container-platform/4.8/registry/configuring_registry_storage/configuring-registry-storage-baremetal.html#configuring-registry-storage-baremetal[Configuring Registry Storage Baremetal]
-=== Enable the image registry operator
+== Enable the image registry operator
For detailed instructions please refer to the official documentation for the particular version of Openshift [1].
From the `os-control01` node we can enable the Image Registry Operator and set it to a `Managed` state like so via the CLI:
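A minimal sketch of that patch, assuming the operator's default configuration object named `cluster` (registry storage still needs to be configured as described in [1]):
----
oc patch configs.imageregistry.operator.openshift.io cluster \
  --type merge --patch '{"spec":{"managementState":"Managed"}}'
----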
diff --git a/modules/ocp4/pages/sop_configure_local_storage_operator.adoc b/modules/ocp4/pages/sop_configure_local_storage_operator.adoc
index f467742..6985037 100644
--- a/modules/ocp4/pages/sop_configure_local_storage_operator.adoc
+++ b/modules/ocp4/pages/sop_configure_local_storage_operator.adoc
@@ -1,11 +1,11 @@
-== Configure the Local Storage Operator
+= Configure the Local Storage Operator
-=== Resources
+== Resources
- [1] https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.7/html/deploying_openshift_container_storage_using_bare_metal_infrastructure/deploy-using-local-storage-devices-bm
- [2] https://github.com/centosci/ocp4-docs/blob/master/sops/localstorage/installation.md
-=== Installation
+== Installation
For installation instructions visit the official docs at: [1]. The CentOS CI SOP at [2] also has more context, but it is now slightly dated.
- From the web console, click on the `Operators` option, then `OperatorHub`
@@ -17,7 +17,7 @@ For installation instructions visit the official docs at: [1]. The CentOS CI SOP
- Update approval set to automatic
- Click install
-=== Configuration
+== Configuration
A prerequisite to this step is to have all volumes on the nodes already formatted and available. This can be done via a machineconfig/ignition file during installation time, or alternatively by SSHing onto the boxes and manually creating/formatting the volumes.
- Create a `LocalVolumeDiscovery` and configure it to target the disks on all nodes
diff --git a/modules/ocp4/pages/sop_configure_oauth_ipa.adoc b/modules/ocp4/pages/sop_configure_oauth_ipa.adoc
index 12a989e..ef5c3a0 100644
--- a/modules/ocp4/pages/sop_configure_oauth_ipa.adoc
+++ b/modules/ocp4/pages/sop_configure_oauth_ipa.adoc
@@ -1,12 +1,12 @@
-== SOP Configure oauth Authentication via IPA/Noggin
+= SOP Configure oauth Authentication via IPA/Noggin
-=== Resources
+== Resources
- [1] https://pagure.io/fedora-infra/ansible/blob/main/f/files/communishift/objects[Example Config from Communishift]
-=== OIDC Setup
+== OIDC Setup
The first step is to request that a secret be created for this environment; please open a ticket with Fedora Infra. Once the secret has been made available, we can add it to an Openshift Secret in the cluster like so:
----
diff --git a/modules/ocp4/pages/sop_configure_openshift_container_storage.adoc b/modules/ocp4/pages/sop_configure_openshift_container_storage.adoc
index 2ad137b..02b2007 100644
--- a/modules/ocp4/pages/sop_configure_openshift_container_storage.adoc
+++ b/modules/ocp4/pages/sop_configure_openshift_container_storage.adoc
@@ -1,12 +1,12 @@
-== Configure the Openshift Container Storage Operator
+= Configure the Openshift Container Storage Operator
-=== Resources
+== Resources
- [1] https://docs.openshift.com/container-platform/4.8/storage/persistent_storage/persistent-storage-ocs.html[Official Docs]
- [2] https://github.com/red-hat-storage/ocs-operator[Github]
-=== Installation
+== Installation
Important: before following this SOP, please ensure that you have already followed the SOP to install the Local Storage Operator first, as this is a requirement for the OCS operator.
For full detailed instructions please refer to the official docs at: [1]. For general instructions see below:
@@ -22,7 +22,7 @@ For full detailed instructions please refer to the official docs at: [1]. For ge
- Click install
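Before moving on to the configuration, it is worth confirming that the operator reports a successful install. A sketch, assuming the usual `openshift-storage` namespace:
----
oc get csv -n openshift-storage
----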
-=== Configuration
+== Configuration
When the operator is finished installing, we can continue; please ensure that a minimum of 3 nodes are available.
- A `StorageCluster` is required to complete this installation; click Create StorageCluster.
diff --git a/modules/ocp4/pages/sop_configure_openshift_virtualization_operator.adoc b/modules/ocp4/pages/sop_configure_openshift_virtualization_operator.adoc
index 97d058f..0936da5 100644
--- a/modules/ocp4/pages/sop_configure_openshift_virtualization_operator.adoc
+++ b/modules/ocp4/pages/sop_configure_openshift_virtualization_operator.adoc
@@ -1,11 +1,11 @@
-== Installation of the Openshift Virtualisation Operator
+= Installation of the Openshift Virtualisation Operator
-=== Resources
+== Resources
- [1] https://alt.fedoraproject.org/cloud/[Fedora Images]
- [2] https://github.com/kubevirt/kubevirt/blob/main/docs/container-register-disks.md[Kubevirt Importing Containers of VMI Images]
-=== Installation
+== Installation
From the web console, choose the `Operators` menu, and choose `OperatorHub`. Search for `Openshift Virtualization`
@@ -17,7 +17,7 @@ When the installation of the Operator is completed, create a `HyperConverged` ob
Next create a `HostPathProvisioner` object; the default options should be fine, click next through the menus.
-=== Verification
+== Verification
To verify that the installation of the Operator is successful, we can attempt to create a VM. From the location at [1], download the Fedora 34 `Cloud Base image for Openstack` image in `qcow2` format locally.
diff --git a/modules/ocp4/pages/sop_configure_userworkload_monitoring_stack.adoc b/modules/ocp4/pages/sop_configure_userworkload_monitoring_stack.adoc
index ac4a1cd..02657f5 100644
--- a/modules/ocp4/pages/sop_configure_userworkload_monitoring_stack.adoc
+++ b/modules/ocp4/pages/sop_configure_userworkload_monitoring_stack.adoc
@@ -1,12 +1,12 @@
-== Enable User Workload Monitoring Stack
+= Enable User Workload Monitoring Stack
-=== Resources
+== Resources
- [1] https://docs.openshift.com/container-platform/4.8/monitoring/enabling-monitoring-for-user-defined-projects.html[Official Docs]
- [2] https://docs.openshift.com/container-platform/4.8/monitoring/enabling-monitoring-for-user-defined-projects.html#granting-users-permission-to-monitor-user-defined-projects_enabling-monitoring-for-user-defined-projects[Providing Access to the UWMS features]
- [3] https://docs.openshift.com/container-platform/4.8/monitoring/enabling-monitoring-for-user-defined-projects.html#granting-user-permissions-using-the-web-console_enabling-monitoring-for-user-defined-projects[Providing Access to the UWMS dashboard]
- [4] https://docs.openshift.com/container-platform/4.8/monitoring/configuring-the-monitoring-stack.html#configuring-persistent-storage[Configure Monitoring Stack]
-=== Configuration
+== Configuration
To enable the stack, edit the `cluster-monitoring` ConfigMap like so:
----
diff --git a/modules/ocp4/pages/sop_cordoning_nodes_and_draining_pods.adoc b/modules/ocp4/pages/sop_cordoning_nodes_and_draining_pods.adoc
index 004657a..c178c92 100644
--- a/modules/ocp4/pages/sop_cordoning_nodes_and_draining_pods.adoc
+++ b/modules/ocp4/pages/sop_cordoning_nodes_and_draining_pods.adoc
@@ -1,10 +1,10 @@
-== Cordoning Nodes and Draining Pods
+= Cordoning Nodes and Draining Pods
This SOP should be followed in the following scenarios:
- If maintenance is scheduled to be carried out on an Openshift node.
-=== Steps
+== Steps
1. Connect to the `os-control01` host associated with this ENV. Become root: `sudo su -`.
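Cordoning and draining the node under maintenance then looks roughly like the following (a sketch; `<node>` is a placeholder, and the drain flags may need adjusting for the workloads involved):
----
oc adm cordon <node>
oc adm drain <node> --ignore-daemonsets --delete-emptydir-data
----
The uncordon loop further down reverses this once maintenance is complete.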
@@ -51,6 +51,6 @@ for node in ${nodes[@]}; do oc adm uncordon $node; done
----
-=== Resources
+== Resources
- [1] https://docs.openshift.com/container-platform/4.8/nodes/nodes/nodes-nodes-working.html[Nodes - working with nodes]
diff --git a/modules/ocp4/pages/sop_create_machineconfigs.adoc b/modules/ocp4/pages/sop_create_machineconfigs.adoc
index 016b68e..bc2b666 100644
--- a/modules/ocp4/pages/sop_create_machineconfigs.adoc
+++ b/modules/ocp4/pages/sop_create_machineconfigs.adoc
@@ -1,11 +1,11 @@
-== Create MachineConfigs to Configure RHCOS
+= Create MachineConfigs to Configure RHCOS
-=== Resources
+== Resources
- [1] https://coreos.github.io/butane/getting-started/[Butane Getting Started]
- [2] https://docs.openshift.com/container-platform/4.8/post_installation_configuration/machine-configuration-tasks.html#installation-special-config-chrony_post-install-machine-configuration-tasks[OCP4 Post Installation Configuration]
-=== Butane
+== Butane
"Butane (formerly the Fedora CoreOS Config Transpiler) is a tool that consumes a Butane Config and produces an Ignition Config, which is a JSON document that can be given to a Fedora CoreOS machine when it first boots." [1]
Butane is available as a container image; we can pull the latest version locally like so:
diff --git a/modules/ocp4/pages/sop_disable_provisioners_role.adoc b/modules/ocp4/pages/sop_disable_provisioners_role.adoc
index 0808696..362b074 100644
--- a/modules/ocp4/pages/sop_disable_provisioners_role.adoc
+++ b/modules/ocp4/pages/sop_disable_provisioners_role.adoc
@@ -1,11 +1,11 @@
-== SOP Disable `self-provisioners` Role
+= SOP Disable `self-provisioners` Role
-=== Resources
+== Resources
- [1] https://docs.openshift.com/container-platform/4.4/applications/projects/configuring-project-creation.html#disabling-project-self-provisioning_configuring-project-creation
-=== Disabling self-provisioners role
+== Disabling self-provisioners role
By default, when a user authenticates with Openshift via Oauth, they are part of the `self-provisioners` group. This group provides the ability to create new projects. On CentOS CI we do not want users to be able to create their own projects, as we have a system in place where we create a project and control the administrators of that project.
To disable the self-provisioner role, do the following, as outlined in the documentation [1].
diff --git a/modules/ocp4/pages/sop_etcd_backup.adoc b/modules/ocp4/pages/sop_etcd_backup.adoc
index fbfc07c..5fccf52 100644
--- a/modules/ocp4/pages/sop_etcd_backup.adoc
+++ b/modules/ocp4/pages/sop_etcd_backup.adoc
@@ -1,14 +1,14 @@
-== Create etcd backup
+= Create etcd backup
This SOP should be followed in the following scenarios:
- When the need exists to create an etcd backup.
- When shutting a cluster down gracefully.
-=== Resources
+== Resources
- [1] https://docs.openshift.com/container-platform/4.8/backup_and_restore/backing-up-etcd.html[Creating an etcd backup]
-=== Take etcd backup
+== Take etcd backup
1. Connect to the `os-control01` node associated with the ENV.
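The backup itself is then taken on one of the control plane nodes; the upstream procedure [1] boils down to roughly the following (the node name is illustrative):
----
oc debug node/<controlplane-node>
chroot /host
/usr/local/bin/cluster-backup.sh /home/core/assets/backup
----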
diff --git a/modules/ocp4/pages/sop_graceful_shutdown_ocp_cluster.adoc b/modules/ocp4/pages/sop_graceful_shutdown_ocp_cluster.adoc
index 7de41f5..614838a 100644
--- a/modules/ocp4/pages/sop_graceful_shutdown_ocp_cluster.adoc
+++ b/modules/ocp4/pages/sop_graceful_shutdown_ocp_cluster.adoc
@@ -1,9 +1,9 @@
-== Graceful Shutdown of an Openshift 4 Cluster
+= Graceful Shutdown of an Openshift 4 Cluster
This SOP should be followed in the following scenarios:
- Graceful full shut down of the Openshift 4 cluster is required.
-=== Steps
+== Steps
Prerequisite steps:
- Follow the SOP for cordoning and draining the nodes.
@@ -25,6 +25,6 @@ for node in ${nodes[@]}; do ssh -i /root/ocp4/ocp-/ssh/id_rsa core@$node su
----
-==== Resources
+=== Resources
- [1] https://docs.openshift.com/container-platform/4.5/backup_and_restore/graceful-cluster-shutdown.html[Graceful Cluster Shutdown]
diff --git a/modules/ocp4/pages/sop_graceful_startup_ocp_cluster.adoc b/modules/ocp4/pages/sop_graceful_startup_ocp_cluster.adoc
index 4fe76ae..a66bcc5 100644
--- a/modules/ocp4/pages/sop_graceful_startup_ocp_cluster.adoc
+++ b/modules/ocp4/pages/sop_graceful_startup_ocp_cluster.adoc
@@ -1,13 +1,13 @@
-== Graceful Startup of an Openshift 4 Cluster
+= Graceful Startup of an Openshift 4 Cluster
This SOP should be followed in the following scenarios:
- Graceful start up of an Openshift 4 cluster.
-=== Steps
+== Steps
Prerequisite steps:
-==== Start the VM Control Plane instances
+=== Start the VM Control Plane instances
Ensure that the control plane instances start first.
----
@@ -15,7 +15,7 @@ Ensure that the control plane instances start first.
----
-==== Start the physical nodes
+=== Start the physical nodes
To connect to `idrac`, you must be connected to the Red Hat VPN. Next, find the management IP associated with each node.
On the `batcave01` instance, in the DNS configuration, the following bare metal machines make up the production/staging OCP4 worker nodes.
@@ -31,7 +31,7 @@ oshift-dell06 IN A 10.3.160.185 # worker03 staging
Log in to the `idrac` interface that corresponds with each worker, one at a time. Ensure the node is booting via hard drive, then power it on.
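Once the workers are powered back on, it can take several minutes for them to rejoin the cluster and for operators to settle; a quick health-check sketch before uncordoning anything:
----
oc get nodes
oc get clusteroperators
----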
-==== Once the nodes have been started they must be uncordoned if appropriate
+=== Once the nodes have been started they must be uncordoned if appropriate
----
oc get nodes
@@ -82,7 +82,7 @@ kempty-n9.ci.centos.org Ready worker 106d v1.18.3+6c42de8
----
-=== Resources
+== Resources
- [1] https://docs.openshift.com/container-platform/4.5/backup_and_restore/graceful-cluster-restart.html[Graceful Cluster Startup]
- [2] https://docs.openshift.com/container-platform/4.5/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html#dr-restoring-cluster-state[Cluster disaster recovery]
diff --git a/modules/ocp4/pages/sop_installation.adoc b/modules/ocp4/pages/sop_installation.adoc
index 6e74301..9af9c2e 100644
--- a/modules/ocp4/pages/sop_installation.adoc
+++ b/modules/ocp4/pages/sop_installation.adoc
@@ -1,17 +1,17 @@
-== SOP Installation/Configuration of OCP4 on Fedora Infra
+= SOP Installation/Configuration of OCP4 on Fedora Infra
-=== Resources
+== Resources
- [1] https://docs.openshift.com/container-platform/4.8/installing/installing_bare_metal/[Official OCP4 Installation Documentation]
-=== Install
+== Install
To install OCP4 on Fedora Infra, one must be a part of the following groups:
- `sysadmin-openshift`
- `sysadmin-noc`
-==== Prerequisites
+=== Prerequisites
Visit the https://console.redhat.com/openshift/install/metal/user-provisioned[OpenShift Console] and download the following OpenShift tools:
* A Red Hat Access account is required
@@ -22,7 +22,7 @@ Visit the https://console.redhat.com/openshift/install/metal/user-provisioned[Op
* Take a copy of your pull secret file; you will need to put this in the `install-config.yaml` file in the next step.
-==== Generate install-config.yaml file
+=== Generate install-config.yaml file
We must create an `install-config.yaml` file; use the following example for inspiration, or alternatively refer to the documentation [1] for more detailed information/explanations.
----
@@ -62,10 +62,10 @@ sshKey: 'PUT SSH PUBLIC KEY HERE kubeadmin@core'
----
* Take a backup of the `install-config.yaml` to `install-config.yaml.bak`, as running the next steps consumes this file; having a backup allows you to recover from mistakes quickly.
-==== Create the Installation Files
+=== Create the Installation Files
Using the `openshift-install` tool we can generate the installation files. Make sure that the `install-config.yaml` file is in the `/path/to/ocp4-` location before attempting the next steps.
-===== Create the Manifest Files
+==== Create the Manifest Files
The manifest files are human readable; at this stage you can add any customisations required before the installation begins.
* Create the manifests: `openshift-install create manifests --dir=/path/to/ocp4-`
@@ -73,7 +73,7 @@ The manifest files are human readable, at this stage you can put any customisati
* The following step should be performed at this point: edit the `/path/to/ocp4-/manifests/cluster-scheduler-02-config.yml` and change the `mastersSchedulable` value to `false`.
-===== Create the Ignition Files
+==== Create the Ignition Files
The ignition files are generated from the manifests and MachineConfig files to produce the final installation files for the three roles: `bootstrap`, `master`, `worker`. In Fedora we prefer not to use the term `master` here, we have renamed this role to `controlplane`.
* Create the ignition files: `openshift-install create ignition-configs --dir=/path/to/ocp4-`
@@ -82,7 +82,7 @@ The ignition files have been generated from the manifests and MachineConfig file
* A directory has been created, `auth`. This contains two files: `kubeadmin-password` and `kubeconfig`. These allow `cluster-admin` access to the cluster.
-==== Copy the Ignition files to the `batcave01` server
+=== Copy the Ignition files to the `batcave01` server
On the `batcave01` at the following location: `/srv/web/infra/bigfiles/openshiftboot/`:
* Create a directory to match the environment: `mkdir /srv/web/infra/bigfiles/openshiftboot/ocp4-`
@@ -102,19 +102,19 @@ On the `batcave01` at the following location: `/srv/web/infra/bigfiles/openshift
----
-==== Update the ansible inventory
+=== Update the ansible inventory
The ansible inventory/hostvars/group vars should be updated with the new hosts' information.
For inspiration see the following https://pagure.io/fedora-infra/ansible/pull-request/765[PR] where we added the ocp4 production changes.
-==== Update the DNS/DHCP configuration
+=== Update the DNS/DHCP configuration
The DNS and DHCP configuration must also be updated. This https://pagure.io/fedora-infra/ansible/pull-request/765[PR] contains the necessary DHCP changes for prod and can be done in ansible.
However, the DNS changes may only be performed by `sysadmin-main`. For this reason any DNS changes must go via a patch snippet which is emailed to the `infrastructure@lists.fedoraproject.org` mailing list for review and approval. This process may take several days.
-==== Generate the TLS Certs for the new environment
+=== Generate the TLS Certs for the new environment
This is beyond the scope of this SOP; the best option is to create a ticket for Fedora Infra to request that these certs are created and available for use. The following certs should be available:
- `*.apps..fedoraproject.org`
@@ -122,14 +122,14 @@ This is beyond the scope of this SOP, the best option is to create a ticket for
- `api-int..fedoraproject.org`
-==== Run the Playbooks
+=== Run the Playbooks
There are a number of playbooks required to be run. Once all the previous steps have been completed, we can run these playbooks from the `batcave01` instance.
- `sudo rbac-playbook groups/noc.yml -t 'tftp_server,dhcp_server'`
- `sudo rbac-playbook groups/proxies.yml -t 'haproxy,httpd,iptables'`
-===== Baremetal / VMs
+==== Baremetal / VMs
Depending on whether some of the nodes are VMs or baremetal, different tags should be supplied to the following playbook.
If the entire cluster is baremetal you can skip the `kvm_deploy` tag entirely. If there are VMs used for some of the roles, make sure to leave it in.
@@ -137,7 +137,7 @@ If there are VMs used for some of the roles, make sure to leave it in.
- `sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "ignition,repo,kvm_deploy"`
-===== Baremetal
+==== Baremetal
At this point we can switch on the baremetal nodes and begin the PXE/UEFI boot process. The baremetal nodes should, via DHCP/DNS, have the configuration necessary to reach out to the `noc01.iad2.fedoraproject.org` server and retrieve the UEFI boot configuration via PXE. Once booted up, you should visit the management console for this node, and manually choose the UEFI configuration appropriate for its role.
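While the nodes install, bootstrap progress can be followed from wherever `openshift-install` was run, using the same working directory as above (a sketch):
----
openshift-install wait-for bootstrap-complete --dir=/path/to/ocp4- --log-level=info
----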
@@ -149,7 +149,7 @@ The system will then become autonomous, it will install and potentially reboot m
Eventually you will be presented with an SSH login prompt, where it should have the correct hostname, eg: `ocp01`, to match what is in the DNS configuration.
-==== Bootstrapping completed
+=== Bootstrapping completed
When the control plane is up, we should see all controlplane instances available in the appropriate haproxy dashboard, eg: https://admin.fedoraproject.org/haproxy/proxy01=ocp-masters-backend-kapi[haproxy].
At this time we should take the `bootstrap` instance out of the haproxy load balancer.
@@ -158,16 +158,16 @@ At this time we should take the `bootstrap` instance out of the haproxy load bal
- Once merged, run the following playbook once more: `sudo rbac-playbook groups/proxies.yml -t 'haproxy'`
-==== Begin installation of the worker nodes
+=== Begin installation of the worker nodes
Follow the same processes listed in the Baremetal section above to switch on the worker nodes and begin installation.
-==== Configure the `os-control01` to authenticate with the new OCP4 cluster
+=== Configure the `os-control01` to authenticate with the new OCP4 cluster
Copy the `kubeconfig` to `~root/.kube/config` on the `os-control01` instance. This will allow the `root` user to automatically be authenticated to the new OCP4 cluster with `cluster-admin` privileges.
-==== Accept Node CSR Certs
+=== Accept Node CSR Certs
To accept the worker/compute nodes into the cluster, we need to accept their CSR certs.
List the CSR certs. The ones we're interested in will show as pending:
@@ -200,7 +200,7 @@ worker05.ocp.stg.iad2.fedoraproject.org Ready worker 34d v1.21.1+980738
At this point the cluster is basically up and running.
-=== Follow on SOPs
+== Follow on SOPs
Several other SOPs should be followed to perform the post-installation configuration on the cluster.
- xref:sop_configure_baremetal_pxe_uefi_boot.adoc[SOP Configure Baremetal PXE-UEFI Boot]
diff --git a/modules/ocp4/pages/sop_retrieve_ocp4_cacert.adoc b/modules/ocp4/pages/sop_retrieve_ocp4_cacert.adoc
index 01b9136..e4c8a36 100644
--- a/modules/ocp4/pages/sop_retrieve_ocp4_cacert.adoc
+++ b/modules/ocp4/pages/sop_retrieve_ocp4_cacert.adoc
@@ -1,10 +1,10 @@
-== SOP Retrieve OCP4 Cluster CACERT
+= SOP Retrieve OCP4 Cluster CACERT
-=== Resources
+== Resources
- [1] https://pagure.io/fedora-infra/ansible/blob/main/f/roles/dhcp_server[Ansible Role DHCP Server]
-=== Retrieve CACERT
+== Retrieve CACERT
In Fedora Infra, we have Apache terminating TLS for the cluster. Connections to the API and the machineconfig server are handled by haproxy. To prevent TLS errors, we must configure haproxy with the OCP4 Cluster CA Cert. This can be retrieved once the cluster control plane has been installed, from the `os-control01` node like so:
diff --git a/modules/ocp4/pages/sop_upgrade.adoc b/modules/ocp4/pages/sop_upgrade.adoc
index 58b3d18..42556c9 100644
--- a/modules/ocp4/pages/sop_upgrade.adoc
+++ b/modules/ocp4/pages/sop_upgrade.adoc
@@ -1,7 +1,7 @@
-== Upgrade OCP4 Cluster
+= Upgrade OCP4 Cluster
Please see the official documentation for more information [1][3]; this SOP can be used as a rough guide.
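The current version, configured channel, and any updates available can also be checked from the CLI before starting; a quick sketch:
----
oc get clusterversion
oc adm upgrade
----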
-=== Resources
+== Resources
- [1] https://docs.openshift.com/container-platform/4.8/updating/updating-cluster-between-minor.html[Upgrading OCP4 Cluster Between Minor Versions]
- [2] xref:sop_etcd_backup.adoc[SOP Create etcd backup]
@@ -10,14 +10,14 @@ Please see the official documentation for more information [1][3], this SOP can
- [5] https://docs.openshift.com/container-platform/4.8/operators/admin/olm-upgrading-operators.html#olm-upgrading-operators[Upgrading Operators Prior to Cluster Update]
- [6] https://access.redhat.com/downloads/content/290/ver=4.8/rhel---8/4.8.18/x86_64/packages[Openshift Clients RPM Download]
-=== Prerequisites
+== Prerequisites
- In case an upgrade fails, it is wise to first take an `etcd` backup. To do so, follow the SOP [2].
- Ensure that all installed Operators are at the latest versions for their channel [5].
- Ensure that the latest `oc` client rpm is available at `/srv/web/infra/bigfiles/openshiftboot/oc-client/` on the `batcave01` server. Retrieve the RPM from [6], choosing the `Openshift Clients Binary` rpm, and rename it to `oc-client.rpm`.
- Ensure that the `sudo rbac-playbook manual/ocp4-sysadmin-openshift.yml -t "upgrade-rpm"` playbook is run to install this updated oc client rpm.
-=== Upgrade OCP
+== Upgrade OCP
At the time of writing the version installed on the cluster is `4.8.11` and the `upgrade channel` is set to `stable-4.8`. It is easiest to update the cluster via the web console. Go to:
- Administration
@@ -27,10 +27,10 @@ At the time of writing the version installed on the cluster is `4.8.11` and the
- When the upgrade has finished, switch back to the `upgrade channel` for stable.
-=== Upgrade failures
+== Upgrade failures
In the worst case scenario, we may have to restore etcd from the backups taken at the start [4], or reinstall a node entirely.
-==== Troubleshooting
+=== Troubleshooting
There are many possible ways an upgrade can fail midway through.
- Check the monitoring alerts currently firing; these can often hint at the problem
diff --git a/modules/ocp4/pages/sops.adoc b/modules/ocp4/pages/sops.adoc
index 852d42d..fcaf445 100644
--- a/modules/ocp4/pages/sops.adoc
+++ b/modules/ocp4/pages/sops.adoc
@@ -1,4 +1,4 @@
-== SOPs
+= SOPs
- xref:sop_configure_baremetal_pxe_uefi_boot.adoc[SOP Configure Baremetal PXE-UEFI Boot]
- xref:sop_configure_image_registry_operator.adoc[SOP Configure the Image Registry Operator]