@@ -0,0 +1,164 @@
+ = cloud-image-uploader SOP
+
+ Upload Cloud images to public clouds after they are built in Koji.
+
+ Source code: https://pagure.io/cloud-image-uploader
+
+ == Contact Information
+
+ Owner::
+ Cloud SIG, Jeremy Cline (jcline)
+ Contact::
+ #cloud:fedoraproject.org (Matrix)
+ Servers::
+ - https://console-openshift-console.apps.ocp.stg.fedoraproject.org/project-details/ns/cloud-image-uploader[Stage]
+ - https://console-openshift-console.apps.ocp.fedoraproject.org/project-details/ns/cloud-image-uploader[Production]
+
+ Purpose::
+ Upload Cloud images to public clouds.
+
+ == Description
+
+ cloud-image-uploader is an AMQP message consumer (run via
+ `fedora-messaging consume`) that processes Pungi compose messages published on
+ the `org.fedoraproject.*.pungi.compose.status.change` AMQP topic. When a compose
+ enters the `FINISHED` or `FINISHED_INCOMPLETE` state, the service downloads
+ any images in the compose and uploads them to the relevant cloud provider by
+ running an Ansible playbook. The playbooks are in the `playbooks` directory of
+ the source repository and of the Python package.
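As a rough illustration of which messages trigger an upload, the following bash function accepts a topic and a compose status and succeeds only for the cases described above. It is a hypothetical helper, not part of cloud-image-uploader, and it assumes the usual AMQP topic-binding semantics in which `*` matches exactly one dot-separated word (such as `prod` or `stg`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cloud-image-uploader) that mirrors the
# filtering described above: accept only messages on the
# org.fedoraproject.*.pungi.compose.status.change topic whose compose
# status is FINISHED or FINISHED_INCOMPLETE.
should_upload() {
  local topic="$1" status="$2"
  # In an AMQP topic binding "*" matches exactly one word, so the
  # environment segment must not itself contain a dot.
  if [[ ! "$topic" =~ ^org\.fedoraproject\.[^.]+\.pungi\.compose\.status\.change$ ]]; then
    return 1
  fi
  case "$status" in
    FINISHED | FINISHED_INCOMPLETE) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `should_upload org.fedoraproject.prod.pungi.compose.status.change FINISHED` succeeds, while the same topic with a `STARTED` status fails.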
+
+ The service does not accept any incoming connections and depends only on the
+ RabbitMQ message broker and the relevant cloud provider's APIs.
+
+ It requires a few gigabytes of temporary space to download the images before
+ uploading them to the cloud provider. It is heavily I/O-bound; the most
+ computationally expensive thing it does is decompress the images.
+
+ == General Configuration
+
+ The Fedora Ansible repository contains the
+ https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/cloud-image-uploader[OpenShift
+ application definition]. The playbook to create the OpenShift application is
+ located at `playbooks/openshift-apps/cloud-image-uploader.yml`.
+
+ Within the container image, configuration is provided via
+ `/etc/fedora-messaging/config.toml`. Secrets may additionally be provided via
+ environment variables; these are listed in the relevant cloud sections.
+
+ == Deploying
+
+ The service consists of a single container image, and its deployment
+ configuration runs a single pod.
+
+ === Staging
+
+ The staging BuildConfig builds a container from
+ https://pagure.io/cloud-image-uploader/tree/main[the main branch]. Builds are
+ not started automatically; trigger one manually from either the web UI or the CLI.
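A minimal sketch of triggering the build from the CLI follows. It assumes you are already logged in to the cluster with `oc login` and that the BuildConfig in the `cloud-image-uploader` namespace is also named `cloud-image-uploader`; that name is an assumption, so confirm it with `oc get bc -n cloud-image-uploader` first. Because both clusters use the same namespace, the same call should work for production once you are logged in there.

```shell
#!/usr/bin/env bash
# Sketch: start a build of the uploader's BuildConfig and stream its log.
# Assumptions: `oc login` has already been run against the right cluster,
# and the BuildConfig name defaults to "cloud-image-uploader" (hypothetical;
# check with `oc get bc`).
start_uploader_build() {
  local namespace="${1:-cloud-image-uploader}"
  local buildconfig="${2:-cloud-image-uploader}"
  # --follow streams the build log until the build finishes.
  oc start-build "$buildconfig" --namespace "$namespace" --follow
}
```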
+
+ === Production
+
+ The production BuildConfig builds a container from
+ https://pagure.io/cloud-image-uploader/tree/prod[the prod branch]. Just like
+ staging, you need to trigger a build manually. After deploying a change to
+ staging, merge the main branch into the production branch to "promote" it,
+ then push the production branch:
+
+ ....
+ $ git checkout prod && git merge --ff-only main
+ $ git push origin prod
+ ....
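A slightly more defensive version of the promotion is sketched below as a bash function. It assumes the Pagure remote is named `origin` (pass another name if yours differs). It fetches first, brings the local `prod` branch up to date with the remote, and refuses anything other than a fast-forward, so `prod` never diverges from `main`. A production build still has to be triggered manually after the push.

```shell
#!/usr/bin/env bash
# Sketch: promote main to prod with fast-forward-only merges, then push.
# Assumption: the Pagure remote is called "origin" unless given as $1.
promote_to_prod() {
  local remote="${1:-origin}"
  git fetch "$remote" || return 1
  git checkout prod || return 1
  # Sync the local prod branch with the remote before promoting.
  git merge --ff-only "$remote/prod" || return 1
  # Only a fast-forward is allowed, so prod can never diverge from main.
  git merge --ff-only "$remote/main" || return 1
  # The production BuildConfig builds from the remote prod branch.
  git push "$remote" prod
}
```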
+
+ === Azure
+
+ Images are uploaded whenever a compose contains `vhd-compressed` images.
+ Images are first uploaded to a container in the storage account and then
+ imported into an Image Gallery.
+
+ Credentials for Azure are provided using environment variables. The credentials
+ are used by the
+ https://docs.ansible.com/ansible/latest/collections/azure/azcollection/index.html[Azure
+ Ansible collection].
+
+ ==== Image Cleanup
+
+ Image cleanup is automated.
+
+ The storage account is configured to delete any blob in the container older
+ than one week, so it should require no manual attention. Nothing in the
+ container is needed after the VHD has been imported into the Image Gallery.
+
+ Images in the Gallery are cleaned up by the image uploader after a new image
+ has been uploaded. For complete details on the cleanup policy, refer to the
+ consumer code; at the time of this writing, the policy is as follows:
+
+ - Any image whose end-of-life date is in the past is removed.
+
+ - Only the latest 7 images within an image definition that are marked
+ "excluded from latest = True" are retained. Images marked "excluded from
+ latest = False" are candidates for "latest": new virtual machines that don't
+ reference an explicit image version boot from the newest such image (ordered
+ by semantic version). All images are uploaded with "excluded from latest =
+ True" and are only marked "excluded from latest = False" after testing.
+
+ - Only the latest 7 images in the Rawhide image definitions are retained,
+ regardless of whether they are marked "excluded from latest = False".
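To make the retention rules above concrete, here is a small bash model of them. It is not the uploader's implementation (the consumer code remains authoritative), and the input format, one `VERSION EXCLUDED_FROM_LATEST END_OF_LIFE` line per image version, is made up for illustration.

```shell
#!/usr/bin/env bash
# Model of the Image Gallery retention rules above; NOT the uploader's code.
# Input on stdin (hypothetical format), one image version per line:
#   VERSION EXCLUDED_FROM_LATEST END_OF_LIFE
# where EXCLUDED_FROM_LATEST is true/false and END_OF_LIFE is YYYY-MM-DD or "-".
# Usage: images_to_delete KEEP IS_RAWHIDE TODAY
# Prints the versions the policy would remove.
images_to_delete() {
  local keep="$1" rawhide="$2" today="$3"
  local version excluded eol
  local -a candidates=()
  while read -r version excluded eol; do
    [ -n "$version" ] || continue
    # Rule 1: anything past its end-of-life date is removed.
    # ISO 8601 dates compare correctly as plain strings.
    if [ -n "$eol" ] && [ "$eol" != "-" ] && [[ "$eol" < "$today" ]]; then
      echo "$version"
      continue
    fi
    # Rule 3: in Rawhide definitions every image counts toward the limit.
    # Rule 2: elsewhere only images excluded from latest count toward it.
    if [ "$rawhide" = "true" ] || [ "$excluded" = "true" ]; then
      candidates+=("$version")
    fi
  done
  # Keep the newest $keep candidates by version order; print the rest.
  if [ "${#candidates[@]}" -gt 0 ]; then
    printf '%s\n' "${candidates[@]}" | sort -rV | tail -n +"$((keep + 1))"
  fi
}
```

For example, piping the lines `40.9.0 true -` and `40.10.0 true -` into `images_to_delete 1 false "$(date +%F)"` prints `40.9.0`, because version sorting places `40.10.0` after `40.9.0`.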
+
+ At the moment, testing and promotion to "excluded from latest = False" is a
+ manual process, but it will be automated in the future to happen regularly
+ (weekly, perhaps).
+
+ ==== Authentication
+
+ The following environment variables are used:
+
+ ....
+ AZURE_SUBSCRIPTION_ID - Identifies the subscription within an Azure tenant (our tenant has only one subscription).
+ AZURE_CLIENT_ID - The application ID used during authentication.
+ AZURE_SECRET - The application secret used during authentication.
+ AZURE_TENANT - Identifies the Azure tenant.
+ ....
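These names match the environment variables the Azure Ansible collection reads for service principal credentials, so a missing one usually surfaces as an authentication error deep inside a playbook run. A hypothetical bash helper that checks for them up front, for instance before running a playbook by hand, could look like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every Azure variable listed above that is
# unset or empty, and fail if any of them is missing.
check_azure_env() {
  local var missing=0
  for var in AZURE_SUBSCRIPTION_ID AZURE_CLIENT_ID AZURE_SECRET AZURE_TENANT; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```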
+
+ If you have access to the Fedora Project tenant, these values are available in
+ the https://portal.azure.com[web portal]. The client and tenant IDs are shown
+ for each registration under the Microsoft Entra ID service in the "App
+ registrations" tab, and the subscription ID under the Subscriptions service. A
+ client secret's value is only displayed once, when the secret is created. To
+ manage things via the CLI, install it with `dnf install azure-cli`. All
+ commands below assume you've logged in with `az login`.
+
+ There are two app registrations, `fedora-cloud-image-uploader` and
+ `fedora-cloud-image-uploader-staging`. These were created by running:
+
+ ....
+ $ az ad app create --display-name fedora-cloud-image-uploader
+ $ az ad app create --display-name fedora-cloud-image-uploader-staging
+ ....
+
+ ==== Authorization
+
+ Images are placed in two resource groups (Azure's containers for arbitrary
+ resources): `fedora-cloud-staging` is used for the staging deployment, and
+ `fedora-cloud` is used for the production deployment.
+
+ Each app registration is granted access to its resource group by assigning it
+ a role on that resource group. The role definition can be viewed with:
+
+ ....
+ $ az role definition list --name "Image Uploader"
+ ....
+
+ The role is then assigned to the app registration with:
+
+ ....
+ $ az role assignment create --assignee "fedora-cloud-image-uploader" \
+     --role "Image Uploader" \
+     --scope "/subscriptions/{subscription_id}/resourceGroups/fedora-cloud"
+ ....
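The command above covers production. A hypothetical bash helper that applies the same role to either deployment, using the app registration and resource group names given in this SOP, is sketched below. It assumes the subscription ID is passed in (for example from `$AZURE_SUBSCRIPTION_ID`) and that `--assignee` resolves the registration name exactly as in the production command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assign the "Image Uploader" role to the app
# registration of the given deployment on its own resource group.
# Usage: assign_image_uploader_role production|staging SUBSCRIPTION_ID
assign_image_uploader_role() {
  local env="$1" subscription_id="$2" assignee resource_group
  case "$env" in
    production)
      assignee="fedora-cloud-image-uploader"
      resource_group="fedora-cloud" ;;
    staging)
      assignee="fedora-cloud-image-uploader-staging"
      resource_group="fedora-cloud-staging" ;;
    *)
      echo "unknown environment: $env" >&2
      return 1 ;;
  esac
  az role assignment create \
    --assignee "$assignee" \
    --role "Image Uploader" \
    --scope "/subscriptions/${subscription_id}/resourceGroups/${resource_group}"
}
```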
+
+ If additional permissions are required, the role definition can be updated
+ (for example, with `az role definition update`).
+
+ ==== Credential rotation
+
+ Credentials are currently set to expire, so they must be rotated
+ periodically. To rotate them via the CLI:
+
+ ....
+ $ APP_NAME=fedora-cloud-image-uploader  # or fedora-cloud-image-uploader-staging
+ $ CLIENT_ID=$(az ad app list --display-name "$APP_NAME" --query "[?displayName=='$APP_NAME'].appId" --output tsv)
+ $ touch azure_secret
+ $ chmod 600 azure_secret
+ $ SECRET_NAME="Some useful name for the secret"
+ $ az ad app credential reset --id "$CLIENT_ID" --append --display-name "$SECRET_NAME" \
+     --years 1 --query password --output tsv > azure_secret
+ ....
+
+ The new secret is written to `azure_secret`; update the `AZURE_SECRET` value
+ provided to the deployment with it.
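Because the reset above uses `--append`, the previous secret keeps working until it expires or is deleted. A minimal sketch for reviewing a registration's secrets and removing an old one, once the deployment has switched to the new secret, is shown below; the wrapper function names are hypothetical, while the `az ad app credential list` and `az ad app credential delete` subcommands are part of the Azure CLI.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the Azure CLI for post-rotation cleanup.
# List every secret on the registration with its key ID and expiry date.
list_app_secrets() {
  local client_id="$1"
  az ad app credential list --id "$client_id" \
    --query "[].{name:displayName, keyId:keyId, expires:endDateTime}" --output table
}

# Delete one secret by key ID; only do this after the deployment uses the new secret.
delete_app_secret() {
  local client_id="$1" key_id="$2"
  az ad app credential delete --id "$client_id" --key-id "$key_id"
}
```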
This is just the basics for now. In addition to adding AWS and GCP
sections (once the image uploader supports those clouds), I plan on
adding details on common tasks (deploy a new version, deal with
failures, etc.).