Various Fedora services are unavailable right now. The following either time out or report "service not available": https://src.fedoraproject.org/ https://kojipkgs.fedoraproject.org/ https://apps.fedoraproject.org/packages/
Another service: https://koji.fedoraproject.org/
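For anyone who wants to re-check the services listed so far, here is a minimal sketch that tells a timeout apart from an HTTP error response such as "service not available". The function names, the 10-second timeout, and the result strings are assumptions made for illustration; this is not an official monitoring tool.

```python
import socket
import urllib.error
import urllib.request

# URLs reported in this ticket so far.
SERVICES = [
    "https://src.fedoraproject.org/",
    "https://kojipkgs.fedoraproject.org/",
    "https://apps.fedoraproject.org/packages/",
    "https://koji.fedoraproject.org/",
]


def check(url, timeout=10, opener=urllib.request.urlopen):
    """Classify one URL as up, unavailable (HTTP error), timeout, or error.

    `opener` is injectable so the classification can be exercised
    without touching the network (an assumption of this sketch).
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return f"up (HTTP {resp.status})"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return f"unavailable (HTTP {exc.code})"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except urllib.error.URLError as exc:
        # urllib wraps connection-level failures, including timeouts.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"error ({exc.reason})"


def report(urls=SERVICES):
    """Print one status line per URL."""
    for url in urls:
        print(f"{url}: {check(url)}")
```

Calling report() prints one line per URL; extend SERVICES as more affected hosts are identified.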
It looks like some of the services are up again, but those running in OpenShift are still unavailable.
It looks like the whole OpenShift cluster was reset and is coming back up.
According to a discussion with @pingou in #fedora-admin, this looks like a networking issue in IAD2.
The issue is still ongoing. Other services identified as not working: https://release-monitoring.org/ dl.fedoraproject.org
Also, nothing running in our OpenShift is able to deploy a new pod right now. Here is the list of projects hosted in our OpenShift; some of them may still be running: asknot bodhi compose-tracker coreos-cincinnati coreos-koji-tagger coreos-ostree-importer distgit-bugzilla-sync docsbuilding elections fas fedora-ostree-pruner greenwave ipsilon koschei kube-public kube-service-catalog kube-system management-infra mdapi message-tagging-service messaging-bridges monitor-gating release-monitoring review-stats silverblue the-new-hotness transtats waiverdb websites
COPR builds do not work either.
Yes, copr issues reported here: https://lists.fedoraproject.org/archives/list/copr-devel@lists.fedorahosted.org/thread/45GSVWNLZ2P4LJ4TMCIZLERMYWGISZXK/
Metadata Update from @smooge: - Issue assigned to smooge
Metadata Update from @smooge: - Issue priority set to: None (was: Needs Review) - Issue tagged with: high-gain, high-trouble
All services are down. Routers, firewalls and switches in IAD2 are in a critical state. Work is being done on them but there is no ETA or known cause at this time.
Wow, what happened there? Looks like some deluge or earthquake.
We appear to be back. The issue seems to be around a failed switch or switching (still being investigated).
Please report anything you see still down and we will work to make sure it's back up.
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
I still see the following error when using fedpkg new-sources:
fedpkg new-sources
Could not execute new_sources: Error occurs inside the server.
@adrian should be fixed now.
Is mbs.fedoraproject.org okay? Builds are stuck in "init". Or should I just wait longer?
I still see issues in the OpenShift cluster: pod deployment fails with Error: ImagePullBackOff.
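To see which pods are affected, a rough sketch like the one below can filter a pod listing for containers waiting with that reason. It assumes the input is the parsed JSON of a standard Kubernetes pod list (for example from `oc get pods --all-namespaces -o json`); the function name is hypothetical.

```python
def pods_waiting_with(pod_list, reason="ImagePullBackOff"):
    """Return (namespace, name) for pods with a container waiting on `reason`.

    `pod_list` is a dict shaped like a Kubernetes PodList: each item carries
    status.containerStatuses[].state.waiting.reason when a container is stuck.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for container in statuses:
            waiting = container.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == reason:
                stuck.append((meta.get("namespace"), meta.get("name")))
                break  # one stuck container is enough to list the pod
    return stuck
```

Feed it the result of json.load() on the saved listing; the namespaces it returns should match the projects that fail to deploy.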
Ah, I think I just needed to wait longer, sorry for the noise
@zlopez so, turns out our storage for our openshift registry had the wrong perms and it couldn't write to it. ;( So, it came back, but there were 0 images there.
I have fixed the perms and started new builds of everything that was waiting in ImagePullBackOff.
Should be back to normal in a bit and also have the images actually stored.
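A quick way to catch this kind of permission problem early is to try writing to the storage directory as the user the registry runs as. The sketch below is only an illustration: the function name is made up, and the actual registry storage path is not given in this ticket, so it has to be supplied by whoever runs it.

```python
import tempfile


def can_write(directory):
    """Return True if a temporary file can be created in `directory`.

    Creating (and automatically removing) a real file is a more direct
    check than inspecting mode bits, since it also covers ownership and
    mount options. Run it as the same user the registry uses.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers permission errors, missing directories, read-only mounts.
        return False
```

For example, can_write("/path/to/registry/storage") with the real mount point substituted; a False result before the registry starts would have flagged the problem.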
It looks like the issue is gone now.
@kevin Do we know the root cause of this issue? Can we do anything to prepare for this in the future?
Metadata Update from @zlopez: - Issue status updated to: Open (was: Closed)
Metadata Update from @zlopez: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)
We don't have exact details... but the cause was a faulty switch. It sent corrupted information to other switches and caused a cascading failure.
The faulty switch was turned off on Friday and replaced on Saturday. Hopefully this was just a rare one-off issue...