ansible-openqa-cloud

Maintained by dbrouwer
Build and run a containerized deployment of openQA designed to use cloud resources.

About

These playbooks reproduce Fedora's openQA deployment with minimal dependencies so that openQA can be run easily on a local machine or with cloud resources. General info about Fedora's openQA is here: https://fedoraproject.org/wiki/OpenQA

There are two main components: a server and workers to run the tests. The playbook openqa-cloud-server.yml will start the server and its related components in containers. The playbook openqa-cloud-worker-container.yml will start workers in containers.

Nested Virtualization Prerequisite:

openQA workers need kernel-based virtualization, so check that the kvm modules are loaded and that permissions allow access to /dev/kvm. If running the openQA deployment in a virtualized environment, make sure that nested virtualization is available; for AWS we used metal instances. If you run the openQA deployment itself in QEMU, be sure to enable KVM, e.g.:

qemu-system-x86_64 -m 25G -enable-kvm  -cpu host -nographic -serial mon:stdio -drive file=Fedora-Server-KVM-40-1.14.x86_64.qcow2,format=qcow2 -netdev user,id=net0,hostfwd=tcp::8080-:80 -device e1000,netdev=net0
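Whether the prerequisite is met can be checked before starting the playbooks. This is a minimal sketch that only tests the presence and permissions of /dev/kvm; on bare metal the kvm_intel or kvm_amd module must also be loaded, and your user may need to be in the kvm group:

```shell
# Report whether /dev/kvm exists and is readable/writable by the current user.
if [ -e /dev/kvm ] && [ -r /dev/kvm ] && [ -w /dev/kvm ]; then
    echo "kvm: ready"
else
    echo "kvm: missing or not accessible (check kvm_intel/kvm_amd modules and group membership)"
fi
```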

1. Example: a Local Deployment

Start the openQA server containers using the Demo account, "Fake" authentication, and self-signed certificates:

ansible-playbook \
-e web_ip_private=$(hostname -I | awk '{print $1}') \
-e auth=Fake \
-e use_http=true \
--ask-become-pass \
playbooks/hosts/openqa-cloud-server.yml

The first time it runs, it will take a while to build the podman images. Once the ansible playbook finishes, the containers will probably still be initializing; check on them with podman ps -a and journalctl -f. Keep checking localhost in your browser, or just curl localhost, until the web UI responds. Make sure your browser will let you load a web page with self-signed certificates.
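Instead of re-running curl by hand, a small polling loop can wait for the web UI to come up. This is a sketch; wait_for_ui is a hypothetical helper, and curl's -k flag is needed because of the self-signed certificates:

```shell
# Poll a URL until it answers, or give up after a fixed number of attempts.
wait_for_ui() {
    # $1 = url, $2 = max attempts (default 60), $3 = seconds between attempts (default 5)
    url=$1
    attempts=${2:-60}
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if curl -ksf -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out waiting for $url"
    return 1
}
```

After the playbook finishes, run wait_for_ui https://localhost to block until the UI answers; pass a smaller attempt budget as the second argument if you prefer to fail fast.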

Once localhost is displaying the openQA user interface, there will be several new systemd services running on the host machine:
- openqa-dispatcher: runs fedora_openqa to listen for new Fedora images and updates to be tested;
- openqa-database: holds user information and test jobs;
- openqa-webserver: runs the web user interface, test scheduler, websockets and live handler for talking with the workers;
- openqa-reverse-proxy: handles outside requests and ssl/tls certificates;
- openqa-test-update: a small service to regularly pull test and needle updates from os-autoinst-distri-fedora.

Once the server is up, click the login button on the user interface. Even though this example uses 'Fake' authentication, the test templates will be loaded using the Demo user's API key/secret. If the Demo user isn't logged in, the test templates won't load. Once the Demo user is logged in, load the test templates with this command:

podman exec -it openqa-webserver /bin/bash -c "cd /var/lib/openqa/share/tests/fedora/;
./fifloader.py --load  templates.fif.json templates-updates.fif.json;"

The successfully loaded templates will appear in the Job Groups drop-down menu.
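The same check can be scripted: openQA's REST route job_groups returns the groups as JSON, which jq reduces to names. The inline JSON below is a stand-in for a real response; in practice, pipe the output of openqa-cli api --host http://<web_ip_private> job_groups through the same filter:

```shell
# Extract job group names from a job_groups response (sample JSON inlined for illustration).
echo '[{"id":1,"name":"fedora"},{"id":2,"name":"Fedora Updates"}]' \
  | jq -r '.[].name'
```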

After the test templates are loaded, schedule a build for testing. If you wait long enough, the openqa-dispatcher will pick up messages from Fedora's message bus with an image to test. Alternatively, to schedule jobs manually, get a BUILDURL from the "Settings" tab of any test on https://openqa.fedoraproject.org (click through the test's coloured dot). Fedora Cloud images are the simplest to test with openQA because they don't require any local images or advanced networking. For example:

podman exec -it openqa-dispatcher /bin/bash -c "source /venv/bin/activate;
fedora-openqa compose -f https://kojipkgs.fedoraproject.org/compose/cloud/Fedora-Cloud-40-20240619.0/compose"

A successfully scheduled build will appear on the localhost homepage.

Another way to see the scheduled jobs is to query the openqa database directly. For example:

podman exec -it openqa-database /bin/bash -c "psql -U postgres -d openqa"
openqa=# \dt
openqa=# \d jobs
openqa=# SELECT build FROM jobs;

Yet another way to see the scheduled jobs is to use openQA's REST API. For example:

podman exec -it openqa-webserver /bin/bash -c "openqa-cli api --host http://$(hostname -I | awk '{print $1}')  --json jobs | jq -r '.jobs[]'"
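The jq filter can be narrowed further, e.g. to show only jobs that are still waiting to run. The inline JSON below stands in for a real response; in practice, pipe the openqa-cli output above through the same filter:

```shell
# Select jobs in the "scheduled" state and print their id and test name.
echo '{"jobs":[{"id":1,"state":"scheduled","test":"install_default"},
               {"id":2,"state":"done","test":"base_selinux"}]}' \
  | jq -r '.jobs[] | select(.state == "scheduled") | "\(.id) \(.test)"'
```

For the sample input this prints 1 install_default.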

Next, start some workers locally to run the tests. This example starts 16 worker containers along with a corresponding systemd service for each one. Make sure that the Demo user is still logged in, because the workers will also use the Demo user's API key/secret. If the Demo user isn't logged in, the workers will shut themselves down.

ansible-playbook \
-e worker_ip_private=$(hostname -I | awk '{print $1}')  \
-e use_http=true \
--ask-become-pass \
playbooks/hosts/openqa-cloud-worker-container.yml

See a list of all the containerized workers:

sudo systemctl list-units --all 'openqa-worker@*.service';

In addition to the workers, there will be a new systemd service running on the host machine:
- openqa-createhdds: a service that runs createhdds to build images that can't be fetched remotely.

If you have limited disk space or can't wait several hours for the openqa-createhdds service to finish generating local images, stop and disable it: sudo systemctl stop openqa-createhdds && sudo systemctl disable openqa-createhdds

Click through the web user interface to monitor the test progress. Initially, some of the tests will fail while waiting for the openqa-createhdds service to finish generating local images; this process can take several hours. Also, tests that require tap devices won't run in containers; use the playbook openqa-cloud-worker.yml instead for tap tests.

Stop the web server and components manually. Make sure to stop the database last or it won't exit gracefully.

for service in openqa-test-update.timer openqa-test-update openqa-createhdds.timer openqa-createhdds openqa-reverse-proxy openqa-dispatcher openqa-webserver openqa-database; do
    sudo systemctl stop $service && sudo systemctl disable $service;
done

To stop containerized workers:

sudo systemctl stop openqa-worker@{0..15} && sudo systemctl disable openqa-worker@{0..15}

2. Example: a Production Deployment

Run the web server containers with a public ip using OpenID authentication and SSL/TLS certificates:

ansible-playbook \
-e web_ip_private=$(hostname -I | awk '{print $1}') \
-e web_ip_public=openqa.fedorainfracloud.org \
-e apikey=<> \
-e apisecret=<> \
--ask-become-pass \
playbooks/hosts/openqa-cloud-server.yml

Run workers in containers with a public web server:

ansible-playbook \
-e worker_ip_private=$(hostname -I | awk '{print $1}')  \
-e web_ip_public=openqa.fedorainfracloud.org \
-e worker_ip_public=$(curl ipinfo.io/ip) \
-e apikey=<> \
-e apisecret=<> \
--ask-become-pass \
playbooks/hosts/openqa-cloud-worker-container.yml

Run tap workers without containers. Workers of any class can be run without containers, but tap workers must run outside of containers so that the QEMU machines can make D-Bus calls to dynamically add and remove VLAN tags on tap devices, allowing for VLAN segregation. Run all workers belonging to the same tap class on the same machine, because the virtual LANs created with Open vSwitch will not work across different machines.

This command will start 32 workers (numbered 0-31) of class 'qemu_x86_64,tap':

ansible-playbook \
-e worker_ip_private=$(hostname -I | awk '{print $1}')  \
-e web_ip_public=openqa.fedorainfracloud.org \
-e worker_ip_public=$(curl ipinfo.io/ip) \
-e apikey=<> \
-e apisecret=<> \
-e eth=$(ls -1 /sys/class/net | grep -E '^(en|eth)[0-9]*') \
-e worker_class=qemu_x86_64,tap \
-e number_of_workers=31 \
--ask-become-pass \
playbooks/hosts/openqa-cloud-worker.yml

On another machine, start 32 workers (numbered 0-31) of class 'qemu_x86_64,tap2':

ansible-playbook \
-e worker_ip_private=$(hostname -I | awk '{print $1}')  \
-e web_ip_public=openqa.fedorainfracloud.org \
-e worker_ip_public=$(curl ipinfo.io/ip) \
-e apikey=<> \
-e apisecret=<> \
-e eth=$(ls -1 /sys/class/net | grep -E '^(en|eth)[0-9]*') \
-e worker_class=qemu_x86_64,tap2 \
-e number_of_workers=31 \
--ask-become-pass \
playbooks/hosts/openqa-cloud-worker.yml

List the workers running outside of containers:

sudo systemctl list-units --all 'openqa-worker-plain@*.service';

Stop the workers running outside of containers:

sudo systemctl stop openqa-worker-plain@{0..31}

Optional variables with their defaults:

- -e web_ip_public (default: web_ip_private): The ip address used to access the web UI. If testing locally, leave this blank and it will just use the local ip. If not testing locally, provide a public ip or domain name, e.g. openqa.fedorainfracloud.org. Warning: DO NOT use 127.0.0.1 or localhost for ANY ip address since this confuses the containers.
- -e auth (default: OpenID): Set auth=Fake to test with the Demo user.
- -e use_http (default: false): Set use_http=true if the public ip has missing or self-signed certificates, so that tests can be scheduled and workers can run.
- -e apikey (default: 1234567890ABCDEF): API key used to authenticate with the web server.
- -e apisecret (default: 1234567890ABCDEF): API secret matching the key above.
- -e dir (default: $HOME): Directory for files shared with containers.
- -e eth (default: eth0): Ethernet interface name needed for iptables routing of tap workers.
- -e worker_class (default: qemu_x86_64): Worker class, e.g. 'qemu_x86_64,tap' or 'qemu_x86_64,tap2'.