== SOP Installation/Configuration of OCP4 on Fedora Infra

=== Resources

- [1]: https://docs.openshift.com/container-platform/4.8/installing/installing_bare_metal/[Official OCP4 Installation Documentation]

=== Install
To install OCP4 on Fedora Infra, one must be a member of the following groups:

- `sysadmin-openshift`
- `sysadmin-noc`

==== Prerequisites
Visit the https://console.redhat.com/openshift/install/metal/user-provisioned[OpenShift Console] and download the following OpenShift tools:

* A Red Hat Access account is required
* OC client tools https://access.redhat.com/downloads/content/290/ver=4.8/rhel---8/4.8.10/x86_64/product-software[Here]
* OC installation tool https://access.redhat.com/downloads/content/290/ver=4.8/rhel---8/4.8.10/x86_64/product-software[Here]
* Ensure the downloaded tools are available on the `PATH`
* A valid OCP4 subscription is required to complete the installation configuration; by default you have a 60 day trial.
* Take a copy of your pull secret file; you will need to put it in the `install-config.yaml` file in the next step.

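As a quick sanity check (optional), the following confirms both tools are found on the `PATH`; the exact version output will depend on which release was downloaded:

----
oc version --client
openshift-install version
----
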
==== Generate install-config.yaml file
We must create an `install-config.yaml` file. Use the following example for inspiration, or alternatively refer to the documentation[1] for more detailed information/explanations.

----
apiVersion: v1
baseDomain: stg.fedoraproject.org
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: 'ocp'
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: 'PUT PULL SECRET HERE'
sshKey: 'PUT SSH PUBLIC KEY HERE kubeadmin@core'
----

* Log in to the `os-control01` instance corresponding with the environment
* Make a directory to hold the installation files: `mkdir ocp4-<ENV>`
* Enter this newly created directory: `cd ocp4-<ENV>`
* Generate a fresh SSH keypair: `ssh-keygen -f ./ocp4-<ENV>-ssh`
* Create an `ssh` directory and place this keypair into it.
* Put the contents of the public key in the `sshKey` value in the `install-config.yaml` file
* Put the contents of your Pull Secret in the `pullSecret` value in the `install-config.yaml`
* Take a backup of the `install-config.yaml` to `install-config.yaml.bak`, as running the next steps consumes this file; having a backup allows you to recover from mistakes quickly (a condensed sketch of these steps follows below).

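A condensed, minimal sketch of the above, assuming the staging environment (`stg` is an assumption; substitute your own `<ENV>`):

----
# condensed sketch for the staging environment
mkdir ocp4-stg && cd ocp4-stg
mkdir ssh
ssh-keygen -f ./ssh/ocp4-stg-ssh -N ''
# create install-config.yaml from the example above, then paste the contents of
# ./ssh/ocp4-stg-ssh.pub into sshKey and your pull secret into pullSecret
cp install-config.yaml install-config.yaml.bak
----
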
==== Create the Installation Files
Using the `openshift-install` tool we can generate the installation files. Make sure that the `install-config.yaml` file is in the `/path/to/ocp4-<ENV>` location before attempting the next steps.

===== Create the Manifest Files
The manifest files are human readable; at this stage you can add any customisations required before the installation begins.

* Create the manifests: `openshift-install create manifests --dir=/path/to/ocp4-<ENV>`
* All configuration for RHCOS must be done via MachineConfig configuration. If there is known configuration which must be performed, such as NTP, you can copy the MachineConfigs into the `/path/to/ocp4-<ENV>/openshift` directory now.
* The following step should be performed at this point: edit `/path/to/ocp4-<ENV>/manifests/cluster-scheduler-02-config.yml` and change the `mastersSchedulable` value to `false` (see the example after this list).

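For reference, after that edit the relevant portion of `cluster-scheduler-02-config.yml` should look roughly like this (other fields omitted for brevity):

----
# /path/to/ocp4-<ENV>/manifests/cluster-scheduler-02-config.yml (excerpt)
spec:
  mastersSchedulable: false
----
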
===== Create the Ignition Files
The ignition files are generated from the manifests and MachineConfig files, and are the final installation files for the three roles: `bootstrap`, `master`, `worker`. In Fedora we prefer not to use the term `master` here, so we have renamed this role to `controlplane`. A combined sketch of these steps follows the list below.

* Create the ignition files: `openshift-install create ignition-configs --dir=/path/to/ocp4-<ENV>`
* At this point you should have the following three files: `bootstrap.ign`, `master.ign` and `worker.ign`.
* Rename the `master.ign` to `controlplane.ign`.
* A directory has been created, `auth`. This contains two files: `kubeadmin-password` and `kubeconfig`. These allow `cluster-admin` access to the cluster.

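Taken together, a minimal sketch of these steps (the directory path is a placeholder; substitute your own):

----
openshift-install create ignition-configs --dir=/path/to/ocp4-<ENV>
mv /path/to/ocp4-<ENV>/master.ign /path/to/ocp4-<ENV>/controlplane.ign
ls /path/to/ocp4-<ENV>   # expect bootstrap.ign, controlplane.ign, worker.ign and the auth/ directory
----
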
==== Copy the Ignition files to the `batcave01` server
On the `batcave01` at the following location: `/srv/web/infra/bigfiles/openshiftboot/`:

* Create a directory to match the environment: `mkdir /srv/web/infra/bigfiles/openshiftboot/ocp4-<ENV>`
* Copy the ignition files, the ssh files and the auth files generated in previous steps to this newly created directory. Users in the `sysadmin-openshift` group should have the necessary permissions to write to this location.
* When this is complete it should look like the following:

----
├── <ENV>
│   ├── auth
│   │   ├── kubeadmin-password
│   │   └── kubeconfig
│   ├── bootstrap.ign
│   ├── controlplane.ign
│   ├── ssh
│   │   ├── id_rsa
│   │   └── id_rsa.pub
│   └── worker.ign
----

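A minimal sketch of the copy, assuming the files were generated on `os-control01` (the source host and path here are assumptions):

----
# on batcave01, as a user in sysadmin-openshift
mkdir /srv/web/infra/bigfiles/openshiftboot/ocp4-<ENV>
scp -r os-control01:/path/to/ocp4-<ENV>/auth \
       os-control01:/path/to/ocp4-<ENV>/ssh \
       os-control01:/path/to/ocp4-<ENV>/*.ign \
       /srv/web/infra/bigfiles/openshiftboot/ocp4-<ENV>/
----
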
==== Update the ansible inventory
The ansible inventory/hostvars/group vars should be updated with the new hosts' information.

For inspiration see the following https://pagure.io/fedora-infra/ansible/pull-request/765[PR] where we added the ocp4 production changes.

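Purely as an illustration (the group names below are hypothetical, not the actual fedora-infra layout; follow the linked PR for the real structure), an inventory update typically adds the new nodes to environment-specific groups:

----
# inventory (illustrative only)
[ocp_controlplane_stg]
ocp01.ocp.stg.iad2.fedoraproject.org
ocp02.ocp.stg.iad2.fedoraproject.org
ocp03.ocp.stg.iad2.fedoraproject.org

[ocp_workers_stg]
worker01.ocp.stg.iad2.fedoraproject.org
worker02.ocp.stg.iad2.fedoraproject.org
----
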
==== Update the DNS/DHCP configuration
The DNS and DHCP configuration must also be updated. This https://pagure.io/fedora-infra/ansible/pull-request/765[PR] contains the necessary DHCP changes for prod, and these can be done in ansible.

However, the DNS changes may only be performed by `sysadmin-main`. For this reason any DNS changes must go via a patch snippet which is emailed to the `infrastructure@lists.fedoraproject.org` mailing list for review and approval. This process may take several days.

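One way to produce such a patch snippet, sketched here under the assumption that the DNS configuration lives in a local git checkout (the commit message is illustrative):

----
# in the checkout containing the DNS configuration
git add -p
git commit -m "Add DNS records for the ocp4 <ENV> cluster"
git format-patch -1 -o /tmp/
# email the generated patch to infrastructure@lists.fedoraproject.org for review
----
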
==== Generate the TLS Certs for the new environment
This is beyond the scope of this SOP; the best option is to create a ticket for Fedora Infra to request that these certs are created and made available for use. The following certs should be available:

- `*.apps.<ENV>.fedoraproject.org`
- `api.<ENV>.fedoraproject.org`
- `api-int.<ENV>.fedoraproject.org`

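Once the certs are delivered, a quick way to confirm each one covers the expected names (a sketch; the file path is an assumption):

----
openssl x509 -in /path/to/api.<ENV>.fedoraproject.org.crt -noout -text | grep -E "Subject:|DNS:"
----
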
==== Run the Playbooks
There are a number of playbooks required to be run. Once all the previous steps have been completed, we can run these playbooks from the `batcave01` instance.

- `sudo rbac-playbook groups/noc.yml -t 'tftp_server,dhcp_server'`
- `sudo rbac-playbook groups/proxies.yml -t 'haproxy,httpd,iptables'`

===== Baremetal / VMs
Depending on whether the nodes are VMs or baremetal, different tags should be supplied to the following playbook. If the entire cluster is baremetal you can skip the `kvm_deploy` tag entirely.

If VMs are used for some of the roles, make sure to leave the `kvm_deploy` tag in.

- `sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "ignition,repo,kvm_deploy"`

===== Baremetal
At this point we can switch on the baremetal nodes and begin the PXE/UEFI boot process. Via DHCP/DNS, the baremetal nodes should have the configuration necessary to reach out to the `noc01.iad2.fedoraproject.org` server and retrieve the UEFI boot configuration via PXE.

Once booted up, you should visit the management console for each node and manually choose the UEFI configuration appropriate for its role.

The node will begin booting, and during the boot process it will reach out to the `os-control01` instance specific to the `<ENV>` to retrieve the ignition file appropriate to its role.

The system will then become autonomous; it will install and potentially reboot multiple times as updates are retrieved/applied etc.

Eventually you will be presented with an SSH login prompt, which should show the correct hostname, eg `ocp01`, to match what is in the DNS configuration.

==== Bootstrapping completed
When the control plane is up, we should see all controlplane instances available in the appropriate haproxy dashboard, eg: https://admin.fedoraproject.org/haproxy/proxy01=ocp-masters-backend-kapi[haproxy].

At this time we should take the `bootstrap` instance out of the haproxy load balancer.

- Make the necessary changes to ansible at: `ansible/roles/haproxy/templates/haproxy.cfg` (see the sketch after this list)
- Once merged, run the following playbook once more: `sudo rbac-playbook groups/proxies.yml -t 'haproxy'`

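The haproxy change is typically just removing (or commenting out) the bootstrap server from the relevant backends. A rough sketch, with server names and addresses assumed rather than taken from the real template:

----
# ansible/roles/haproxy/templates/haproxy.cfg (illustrative excerpt)
backend ocp-masters-backend-kapi
    balance roundrobin
    # server bootstrap bootstrap.ocp.<ENV>.iad2.fedoraproject.org:6443 check   (removed after bootstrap completes)
    server ocp01 ocp01.ocp.<ENV>.iad2.fedoraproject.org:6443 check
    server ocp02 ocp02.ocp.<ENV>.iad2.fedoraproject.org:6443 check
    server ocp03 ocp03.ocp.<ENV>.iad2.fedoraproject.org:6443 check
----
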
==== Begin installation of the worker nodes
Follow the same process listed in the Baremetal section above to switch on the worker nodes and begin installation.

==== Configure the `os-control01` to authenticate with the new OCP4 cluster
Copy the `kubeconfig` to `~root/.kube/config` on the `os-control01` instance.
This will allow the `root` user to automatically be authenticated to the new OCP4 cluster with `cluster-admin` privileges.

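A minimal sketch of that copy (the source path assumes the installation directory used earlier):

----
# on os-control01
mkdir -p /root/.kube
cp /path/to/ocp4-<ENV>/auth/kubeconfig /root/.kube/config
chmod 600 /root/.kube/config
oc whoami   # should report a cluster-admin identity such as system:admin
----
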
==== Accept Node CSR Certs
To accept the worker/compute nodes into the cluster we need to accept their CSR certs.

List the CSR certs. The ones we're interested in will show as pending:

----
oc get csr
----

To accept all the OCP4 node CSRs with a one-liner, do the following:

----
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
----

Once the nodes have joined, the output of `oc get nodes` should look something like this:

----
[root@os-control01 ocp4][STG]= oc get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
ocp01.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
ocp02.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
ocp03.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
worker01.ocp.stg.iad2.fedoraproject.org   Ready    worker   21d   v1.21.1+9807387
worker02.ocp.stg.iad2.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387
worker03.ocp.stg.iad2.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387
worker04.ocp.stg.iad2.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387
worker05.ocp.stg.iad2.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387
----

At this point the cluster is basically up and running.

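As a final sanity check (optional), verifying that the cluster operators have all rolled out confirms the installation has converged:

----
oc get clusteroperators   # every operator should eventually report AVAILABLE=True and DEGRADED=False
----
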
=== Follow on SOPs
Several other SOPs should be followed to perform the post installation configuration on the cluster.

- xref:sop_configure_baremetal_pxe_uefi_boot.adoc[SOP Configure Baremetal PXE-UEFI Boot]
- xref:sop_create_machineconfigs.adoc[SOP Create MachineConfigs to Configure RHCOS]
- xref:sop_retrieve_ocp4_cacert.adoc[SOP Retrieve OCP4 CACERT]
- xref:sop_configure_image_registry_operator.adoc[SOP Configure the Image Registry Operator]
- xref:sop_disable_provisioners_role.adoc[SOP Disable the Provisioners Role]
- xref:sop_configure_oauth_ipa.adoc[SOP Configure oauth Authentication via IPA/Noggin]
- xref:sop_configure_local_storage_operator.adoc[SOP Configure the Local Storage Operator]
- xref:sop_configure_openshift_container_storage.adoc[SOP Configure the Openshift Container Storage Operator]
- xref:sop_configure_userworkload_monitoring_stack.adoc[SOP Configure the Userworkload Monitoring Stack]

Signed-off-by: David Kirwan dkirwan@redhat.com
Signed-off-by: Akashdeep Dhar akashdeep.dhar@gmail.com