Looking at the builds page (https://fedora.softwarefactory-project.io/zuul/builds), it seems the last builds were triggered on 2024-10-01 21:25:51 (Python 3.12), and nothing has run since. I tried to run Zuul again on https://src.fedoraproject.org/rpms/python3.13/pull-request/114 just now and nothing happens: the job is not added to the queue even after ~10 minutes.
Hi,
The issue was that fm-gateway (the component that connects to fedmsg and relays messages to Zuul) was unable to connect to the message bus, failing with the following error:
Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/lib64/python3.9/site-packages/twisted/internet/defer.py", line 1475, in gotResult
    _inlineCallbacks(r, g, status)
  File "/usr/lib64/python3.9/site-packages/twisted/internet/defer.py", line 1464, in _inlineCallbacks
    status.deferred.errback()
  File "/usr/lib64/python3.9/site-packages/twisted/internet/defer.py", line 501, in errback
    self._startRunCallbacks(fail)
  File "/usr/lib64/python3.9/site-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib64/python3.9/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3.9/site-packages/fedora_messaging/twisted/factory.py", line 329, in on_ready_connection_errback
    r = failure.trap(
  File "/usr/lib64/python3.9/site-packages/twisted/python/failure.py", line 460, in trap
    self.raiseException()
  File "/usr/lib64/python3.9/site-packages/twisted/python/failure.py", line 488, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/usr/lib64/python3.9/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib64/python3.9/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3.9/site-packages/fedora_messaging/twisted/factory.py", line 323, in on_ready
    yield client.declare_queues([queue])
  File "/usr/lib64/python3.9/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib64/python3.9/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python3.9/site-packages/fedora_messaging/twisted/protocol.py", line 553, in declare_queues
    raise BadDeclaration("queue", args, e)
fedora_messaging.exceptions.BadDeclaration: Unable to declare the queue object ({'queue': '8ccf2c4f-fc20-4785-af05-e695ad8665df', 'durable': True, 'auto_delete': False, 'exclusive': False, 'arguments': {}, 'passive': False}) because (404, "NOT_FOUND - home node 'rabbit@rabbitmq02.iad2.fedoraproject.org' of durable queue '8ccf2c4f-fc20-4785-af05-e695ad8665df' in vhost '/public_pubsub' is down or inaccessible")
I've restarted the service and now messages are forwarded as expected and jobs are triggered.
This is not the first time this issue has happened: after the service has been running for some weeks, the event loop seems to get stuck. Any ideas or improvements to avoid this issue would be welcome: https://pagure.io/fm-gateway/tree/master :)
Perhaps the first step here is to prepare a new container with up-to-date dependencies.
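Since a restart was enough to recover, one possible mitigation for the stuck event loop is a small watchdog that exits the process when no message has been relayed for too long, so that the container runtime or systemd restarts it automatically. The following is only a minimal sketch of that idea, not fm-gateway's actual code: the names `StallWatchdog` and `check_or_exit` are hypothetical, the silence threshold is an assumption that would need tuning against real bus traffic, and wiring it into the fedora_messaging callback and a periodic timer is left out.

```python
import time


class StallWatchdog:
    """Hypothetical helper: remembers when the last bus message arrived."""

    def __init__(self, max_silence_seconds, clock=time.monotonic):
        # max_silence_seconds is an assumed threshold, to be tuned.
        self.max_silence_seconds = max_silence_seconds
        self._clock = clock
        self._last_seen = clock()

    def mark_message(self):
        # Call this from the message callback each time a message is relayed.
        self._last_seen = self._clock()

    def is_stalled(self):
        # True when no message has been seen for longer than the threshold.
        return self._clock() - self._last_seen > self.max_silence_seconds


def check_or_exit(watchdog, exit_func):
    # Meant to be run periodically. Exiting with a non-zero status lets the
    # supervisor restart the service, mirroring the manual restart.
    if watchdog.is_stalled():
        exit_func(1)
        return True
    return False
```

In the service itself, `exit_func` would typically be `sys.exit` or `os._exit`, and `check_or_exit` would be called from a timer that runs independently of the message consumer. Passing the clock and the exit function as parameters keeps the logic easy to test without waiting or killing the process.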
Thanks for the report.
Metadata Update from @fbo: - Issue status updated to: Closed (was: Open)
Thank you!