#10675 Planned Outage - Koschei maintenance - 2022-06-07 05:00 UTC
Closed: Fixed 2 years ago by mizdebsk. Opened 2 years ago by mizdebsk.

There will be an outage starting at 2022-06-07 05:00 UTC,
which will last approximately two hours.

To convert UTC to your local time, take a look at
http://fedoraproject.org/wiki/Infrastructure/UTCHowto
or run:

date -d '2022-06-07 05:00 UTC'
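On hosts without GNU date, the same conversion can be sketched in Python. This is only an illustration: the function and constant names are made up for this example, and the two-hour duration is the approximate figure from the announcement, not a guaranteed end time.

```python
from datetime import datetime, timedelta, timezone

# Start time and approximate duration taken from the announcement above.
OUTAGE_START_UTC = datetime(2022, 6, 7, 5, 0, tzinfo=timezone.utc)
OUTAGE_DURATION = timedelta(hours=2)


def outage_window(start=OUTAGE_START_UTC, duration=OUTAGE_DURATION, tz=None):
    """Return (start, end) of the window converted to tz.

    With tz=None, astimezone() falls back to the system's local time zone.
    """
    end = start + duration
    return start.astimezone(tz), end.astimezone(tz)


if __name__ == "__main__":
    local_start, local_end = outage_window()
    print("Outage starts:", local_start.strftime("%Y-%m-%d %H:%M %Z"))
    print("Expected end: ", local_end.strftime("%Y-%m-%d %H:%M %Z"))
```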

Reason for outage:

During the outage window the following maintenance steps will be performed on Koschei:
- underlying operating system will be updated to the latest stable version (Fedora 36),
- Fedora 34 collection will be deleted due to reaching its EOL,
- unapplied dependency changes will be wiped from database,
- PostgreSQL database will be vacuumed fully to optimize performance.

Affected Services:

During the outage window Koschei services may be unavailable.

Ticket Link:

https://pagure.io/fedora-infrastructure/issue/10675

Please join #fedora-admin or #fedora-noc on irc.libera.chat
or add comments to the ticket for this outage above.


Note: The ocp3 cluster where koschei currently runs can't run Fedora images newer than f34 due to glibc/docker doom. Also, we want to retire that cluster. ;)

So, before this or perhaps as part of this, koschei should move to the ocp4 cluster (this should hopefully just consist of changing the playbook to run on os-control01{stg} instead of os-master[0]{stg}).

The outage will begin in ~30 minutes.
Koschei is running on Fedora 35 on OCP 3 just fine, I don't see any issues.
I've just come back from a long vacation and didn't have time to prepare for the migration to OCP 4, so Koschei will not be migrated at this time. However, I will plan the migration in the near future.
I want to take the opportunity to refactor deployment configs as well, to take advantage of new Kubernetes APIs.

The outage may take longer than the anticipated 2 hours.
The outage window is approaching its end, but the Fedora 34 collection is still being deleted: the PostgreSQL query is running on db01, where it is competing for resources with a tahrir query that has been running for a week.
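A minimal sketch of how such a long-running query could be spotted, assuming access to PostgreSQL's standard pg_stat_activity view. The helper name and the one-day threshold are hypothetical, and this is not necessarily how the contention on db01 was diagnosed.

```python
from datetime import datetime, timedelta, timezone

# Query against PostgreSQL's pg_stat_activity view; it can be run with any
# client (for example psql). Oldest active queries are listed first.
LONG_RUNNING_SQL = """
SELECT pid, datname, query_start, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY query_start;
"""


def long_running(rows, now, threshold=timedelta(days=1)):
    """Keep (pid, datname, query_start, query) rows older than threshold.

    The one-day default is an arbitrary value chosen for illustration.
    """
    return [
        row for row in rows
        if row[2] is not None and now - row[2] > threshold
    ]
```

Rows fetched with LONG_RUNNING_SQL can be passed to long_running() together with the current UTC time to list the queries that have been active longer than the threshold.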

The outage is over. Koschei operation has been restored.
The F34 collection was deleted; the deletion took ~2 hours due to higher than average load on the db server.
Unapplied changes were deleted. Vacuum completed.
Koschei was not yet updated to run on Fedora 36, as that will be done during the migration to the OCP 4 cluster.

Metadata Update from @mizdebsk:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)
