With Koji builders on Fedora 28, Koschei (especially the backend) should be reinstalled on Fedora 28 too. Currently this is blocked on staging testing (#6805: the staging Koschei DB needs to be wiped).
Metadata Update from @mizdebsk: - Issue priority set to: Waiting on Assignee (was: Waiting on External)
A PoC work log on this issue, for interested apprentices. Let me know whether this is helpful or not.
Before touching production, I started by testing the reinstall in the staging deployment. First I repointed the kickstarts to Fedora 28:
https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=2c9a5d0

Then I compared the koschei versions in f28-updates and f27-infra:

    $ koji latest-pkg f28-updates koschei
    $ koji latest-pkg f27-infra koschei

If the version in f28-updates had been lower than the one in f27-infra, I would have needed to update the tag listings. But since the versions were the same, there was nothing to tag in Koji.

So I proceeded with destroying the staging virt instances:

    [root@batcave01 ~][PROD]# ansible-playbook /srv/web/infra/ansible/playbooks/destroy_virt_inst.yml -e target=koschei-backend-stg,koschei-web-stg

And re-installed them:

    [root@batcave01 ~][PROD]# ansible-playbook /srv/web/infra/ansible/playbooks/groups/koschei-backend.yml -l staging
    [root@batcave01 ~][PROD]# ansible-playbook /srv/web/infra/ansible/playbooks/groups/koschei-web.yml -l staging

After the reinstall was complete (surprisingly, on the first attempt!), I ran the staging sync:

    [root@batcave01 ~][PROD]# ansible-playbook /srv/web/infra/ansible/playbooks/manual/staging-sync/koschei.yml

It failed to drop the database due to active sessions. So I forcibly disconnected them and re-ran the sync playbook.

    [root@db01 ~][STG]# sudo -u postgres psql
    postgres=# SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = 'koschei' AND pid <> pg_backend_pid();

Then I went to https://apps.stg.fedoraproject.org/koschei/ to check whether the frontend was working. It was not: it was showing the outage placeholder page. I checked the haproxy status page at https://admin.stg.fedoraproject.org/haproxy/proxy01#koschei-backend and it showed "Layer7 wrong status: INTERNAL SERVER ERROR". So I checked the httpd logs:

    [root@koschei-web01 ~][STG]# tail -f /var/log/httpd/*

In access_log there was a stream of 500 errors sent to 10.5.128.177 (proxy01.stg), but error_log was silent.
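To see at a glance which clients are receiving the 500 responses, the access log can be summarized per client address. What follows is only a minimal sketch, assuming the log uses Apache's common/combined log format (client address in field 1, status code in field 9). The function name `count_5xx_by_client` is made up for illustration and is not part of any tool used in this ticket.

```shell
# Hypothetical helper: count 5xx responses per client address.
# Assumes Apache common/combined log format, where whitespace-separated
# field 1 is the client address and field 9 is the HTTP status code.
# Reads the files given as arguments, or standard input if there are none.
count_5xx_by_client() {
    awk '$9 ~ /^5[0-9][0-9]$/ { hits[$1]++ }
         END { for (ip in hits) print hits[ip], ip }' "$@" | sort -rn
}
```

It could be run as `count_5xx_by_client /var/log/httpd/access_log`; each output line is a count followed by the client address, with the busiest client first.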
Suspecting SELinux problems, I checked audit.log, but there was nothing relevant. So I checked the HTML output of the Flask app:

    [root@koschei-web01 ~][STG]# curl http://127.0.0.1/koschei/

and it showed:

    <p>No collections setup</p>

I remembered that the sync playbook used to create a collection, but I couldn't find that step in the playbook. So I checked when and why it was removed:

    (ansible)$ git bisect start HEAD f0681006
    (ansible)$ git bisect run grep -q create-collection playbooks/manual/staging-sync/koschei.yml

I examined commit ff176d994, which the bisect run pointed to, and re-added the removed code to set up the initial collection during sync:
https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=746b086

After re-running the sync playbook, the Koschei frontend showed up in the web browser.
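For apprentices unfamiliar with the bisect trick: `git bisect run` marks a commit as good when the command exits 0 and as bad when it exits with 1 to 127 (except 125, which means skip). `grep -q` exits 0 when the pattern is present and 1 when it is absent, so the run stops at the first commit in which the string is missing. The sketch below reproduces this in a throwaway repository; the file name `sync.yml`, the commit messages, and the function name `bisect_demo` are all made up for illustration and have nothing to do with the real ansible repository.

```shell
# Hypothetical demo: find the commit that removed a string, using
# "git bisect run grep -q". Runs in a subshell and a temporary directory.
bisect_demo() (
    set -e
    demo=$(mktemp -d)
    cd "$demo"
    git init -q .
    # Commit identity for this demo repository only.
    git config user.email demo@example.com
    git config user.name demo

    commit() { git add -A && git commit -q -m "$1"; }

    printf 'step: create-collection\n' > sync.yml
    commit "add create-collection step"     # known-good starting point
    good=$(git rev-parse HEAD)

    echo "unrelated" > notes.txt
    commit "unrelated change"

    printf 'step: other\n' > sync.yml
    commit "drop create-collection step"    # the commit we want to find

    echo "more" >> notes.txt
    commit "another unrelated change"

    # bad = HEAD (string missing), good = first commit (string present)
    git bisect start HEAD "$good" >/dev/null
    git bisect run grep -q create-collection sync.yml >/dev/null

    # Once bisect has finished, refs/bisect/bad is the first bad commit.
    git log -1 --format=%s refs/bisect/bad

    cd /
    rm -rf "$demo"
)
```

Calling `bisect_demo` prints the subject line of the commit that removed the string.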
Production will be done in a few days, after staging testing is complete and users have been notified about the upcoming outage.
Done - all Koschei machines in staging and production are running on Fedora 28.
Metadata Update from @mizdebsk: - Issue close_status updated to: Fixed - Issue priority set to: None (was: Waiting on Assignee) - Issue status updated to: Closed (was: Open)