#6918 Reinstall Koschei on Fedora 28
Closed: Fixed 2 years ago Opened 2 years ago by mizdebsk.

With Koji builders on Fedora 28, Koschei (especially backend) should be reinstalled on Fedora 28 too.
Currently this is blocked on staging testing (#6805 - staging Koschei DB needs to be wiped).

Metadata Update from @mizdebsk:
- Issue priority set to: Waiting on Assignee (was: Waiting on External)

2 years ago

A proof-of-concept work log for this issue, for interested apprentices. Let me know whether this is helpful.

Before touching production, I started by testing the reinstall in the staging deployment.
First I repointed kickstarts to Fedora 28: https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=2c9a5d0
Then I compared koschei versions in f28-updates and f27-infra:
$ koji latest-pkg f28-updates koschei
$ koji latest-pkg f27-infra koschei
If the version in f28-updates had been lower than in f27-infra, I would have needed to update the tag listings. But since the versions were the same, there was nothing to tag in Koji.
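For apprentices who want to script the "is one version lower than the other" check, here is a minimal sketch. It assumes GNU `sort -V`, which approximates RPM's version ordering but is not identical to it (a tool such as `rpmdev-vercmp` would be more faithful). The `version_lt` helper and the version-release strings are hypothetical, pasted in by hand for illustration; they are not the values Koji reported.

```shell
# Returns 0 (true) if version-release string $1 sorts strictly before $2.
# Assumption: GNU sort -V is close enough to RPM ordering for a quick check.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical version-release values, with the dist tag stripped by hand.
f28_ver="2.3.1-1"
f27_ver="2.3.1-1"

if version_lt "$f28_ver" "$f27_ver"; then
    echo "f28-updates is behind f27-infra: tag listings need updating"
else
    echo "nothing to tag"
fi
```

With equal strings, as in my case, the helper returns false and the script takes the "nothing to tag" branch.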
So I proceeded with destroying staging virt instances:
[root@batcave01 ~][PROD]# ansible-playbook /srv/web/infra/ansible/playbooks/destroy_virt_inst.yml -e target=koschei-backend-stg,koschei-web-stg
And re-installed them:
[root@batcave01 ~][PROD]# ansible-playbook /srv/web/infra/ansible/playbooks/groups/koschei-backend.yml -l staging
[root@batcave01 ~][PROD]# ansible-playbook /srv/web/infra/ansible/playbooks/groups/koschei-web.yml -l staging
After reinstall was complete (surprisingly, on first attempt!), I ran staging sync:
[root@batcave01 ~][PROD]# ansible-playbook /srv/web/infra/ansible/playbooks/manual/staging-sync/koschei.yml
It failed to drop the database because of active sessions. So I forcibly disconnected them and re-ran the sync playbook.
[root@db01 ~][STG]# sudo -u postgres psql
postgres=# SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = 'koschei' AND pid <> pg_backend_pid();
Then I went to https://apps.stg.fedoraproject.org/koschei/ to check whether the frontend was working. It was not; it was showing the outage placeholder page.
I checked the haproxy status page at https://admin.stg.fedoraproject.org/haproxy/proxy01#koschei-backend and it showed "Layer7 wrong status: INTERNAL SERVER ERROR".
So I checked httpd logs:
[root@koschei-web01 ~][STG]# tail -f /var/log/httpd/*
In access_log there was a stream of 500 responses sent to proxy01.stg, but error_log was silent.
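Instead of eyeballing `tail -f`, the 500 responses can be counted. The sketch below assumes Apache's common/combined log format, where the status code is the ninth whitespace-separated field; I have not checked the actual LogFormat on koschei-web01. The `count_status` helper is hypothetical and the sample log lines are made up for the demo, not copied from the real access_log.

```shell
# Count requests with a given HTTP status in an access log.
# $1 = status code, $2 = log file. Assumes the status is field 9.
count_status() {
    awk -v code="$1" '$9 == code { n++ } END { print n + 0 }' "$2"
}

# Made-up sample lines in combined log format, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
10.0.0.1 - - [01/Jan/2018:00:00:01 +0000] "GET /koschei/ HTTP/1.1" 500 291 "-" "check"
10.0.0.1 - - [01/Jan/2018:00:00:03 +0000] "GET /koschei/ HTTP/1.1" 500 291 "-" "check"
10.0.0.2 - - [01/Jan/2018:00:00:05 +0000] "GET /koschei/static/x.css HTTP/1.1" 200 1024 "-" "check"
EOF

count_status 500 "$sample"
```

On a real host the second argument would be the access log path instead of the temporary sample file.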
Suspecting SELinux problems, I checked audit.log, but there was nothing relevant.
So I checked the HTML output of the Flask app:
[root@koschei-web01 ~][STG]# curl
and it showed <p>No collections setup</p>
I remembered that the sync playbook used to create a collection, but I couldn't find that step in the playbook. So I checked when and why it was removed:
(ansible)$ git bisect start HEAD f0681006
(ansible)$ git bisect run grep -q create-collection playbooks/manual/staging-sync/koschei.yml
I examined commit ff176d994, which the bisect run pointed to, and re-added the removed code that sets up the initial collection during sync: https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=746b086
After re-running the sync playbook, the Koschei frontend showed up in the web browser.
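A note on why the `git bisect run grep -q ...` trick works: `git bisect run` treats exit status 0 as "good" and 1 to 127 (except 125, which means "skip") as "bad", while `grep -q` exits 0 when the string is found and non-zero when it is not. So "good" means "the string is still in the file", and bisect lands on the first commit where it is gone. The sketch below reproduces this in a throwaway repository; the file name, commit messages, and identity are all made up for the demo and have nothing to do with the real ansible repository.

```shell
# Throwaway repository; every path and commit here is made up for the demo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "demo"
git config user.email "demo@example.invalid"

# Commit 1 (known good): the file contains the marker string.
echo "command: create-collection" > sync.yml
git add sync.yml
git commit -q -m "add sync playbook"
good=$(git rev-parse HEAD)

# Commit 2: unrelated change, marker still present.
echo "# comment" >> sync.yml
git commit -q -a -m "unrelated change"

# Commit 3: the marker is removed; this is what bisect should find.
echo "command: something-else" > sync.yml
git commit -q -a -m "drop collection setup"
culprit=$(git rev-parse HEAD)

# Commit 4: another unrelated change on top.
echo "# another comment" >> sync.yml
git commit -q -a -m "later change"

# HEAD is bad (marker missing), $good is good (marker present).
git bisect start HEAD "$good" >/dev/null
git bisect run grep -q create-collection sync.yml >/dev/null

# After a finished bisect, refs/bisect/bad names the first bad commit.
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first commit without the marker: $found"
```

The same pattern works for any "when did this string disappear" question, as long as the known-good commit really contains the string.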

Production will be done in a few days, after staging testing is complete and users have been notified about the upcoming outage.

Done - all Koschei machines in staging and production are running on Fedora 28.

Metadata Update from @mizdebsk:
- Issue close_status updated to: Fixed
- Issue priority set to: None (was: Waiting on Assignee)
- Issue status updated to: Closed (was: Open)

2 years ago
