Our team has recently been receiving a lot of PodCrashLoop alerts about resultsdb-ci-listener, although we have not maintained this service before (we maintain resultsdb).
Logs:
[fedora_messaging.cli INFO] Starting consumer with resultsdb_listener.consumer:Consumer callback
[fedora_messaging.twisted.service INFO] Authenticating with server using x509 (certfile: /etc/pki/rabbitmq/crt/resultsdb-ci-listener.crt, keyfile: /etc/pki/rabbitmq/key/resultsdb-ci-listener.key)
[fedora_messaging.cli ERROR] Unable to declare the binding object on the AMQP broker. The broker responded with (403, "ACCESS_REFUSED - access to queue 'resultsdb_ci_listener' in vhost '/pubsub' refused for user 'resultsdb'"). Check permissions for your user.
[fedora_messaging.twisted.protocol INFO] Waiting for 0 consumer(s) to finish processing before halting
[fedora_messaging.twisted.protocol INFO] Finished canceling 0 consumers
[fedora_messaging.twisted.protocol INFO] Disconnect requested, but AMQP connection already gone
Is this service still used/needed? AFAIK it submits test result messages to resultsdb: https://pagure.io/ci-resultsdb-listener
My first thought was to check whether the certificate is valid, but I found that there is no resultsdb-ci-listener.crt in ansible-private.
@kevin Do you know where the certificate is?
Metadata Update from @zlopez: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: Needs investigation, rabbitmq
It looks like it is using the same messaging crt/key as resultsdb: "rabbitmq/{{env}}/pki/issued/resultsdb{{env_suffix}}.crt"
Yes, it should be the same cert.
The username/rabbitmq perms might be... not right.
Yeah, so I fixed the permissions manually, but we need to see why it's not right in ansible. ;(
I ran:
rabbitmqctl set_permissions resultsdb --vhost /pubsub "^$" "^(amq\.topic)|(resultsdb_ci_listener.*)$" " ^(zmq\.topic)|^(amq\.topic)|(resultsdb_ci_listener.*)$"
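For readers unfamiliar with the command: rabbitmqctl set_permissions takes three patterns after the user, in the order configure, write, read, and each pattern is matched against exchange and queue names. The sketch below only illustrates how the first two patterns behave on a few names. It assumes grep -E matches these particular patterns the same way the broker does; the check helper and the sample names other than amq.topic and resultsdb_ci_listener are hypothetical.

```shell
#!/bin/sh
# Patterns copied from the set_permissions command above.
configure_pattern='^$'
write_pattern='^(amq\.topic)|(resultsdb_ci_listener.*)$'

# check PATTERN NAME: prints "allow" if NAME matches PATTERN, else "deny".
# Hypothetical helper; grep -E stands in for the broker's regex engine.
check() {
    if printf '%s\n' "$2" | grep -Eq -e "$1"; then
        echo allow
    else
        echo deny
    fi
}

# "^$" matches only the empty string, so no named resource can be configured.
check "$configure_pattern" 'resultsdb_ci_listener'   # deny

# The write pattern covers the exchange and the listener queue.
check "$write_pattern" 'amq.topic'                   # allow
check "$write_pattern" 'resultsdb_ci_listener'       # allow
check "$write_pattern" 'zmq.topic'                   # deny

# "|" binds more loosely than the anchors, so the first alternative is
# anchored only at the start and also matches longer names.
check "$write_pattern" 'amq.topic.other'             # allow
```

If a tighter match is wanted, wrapping the alternatives in one group, for example "^(amq\.topic|resultsdb_ci_listener.*)$", would anchor both ends; that is a suggestion, not what was run here.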
Thanks, the resultsdb-ci-listener service runs fine now.
Isn't there a conflict between how the permissions are set for resultsdb and for resultsdb-ci-listener?
I'm not sure whether the previously set permissions would be revoked by a second command: rabbitmqctl set_permissions resultsdb ....
Resultsdb mainly needs to publish to the org.fedoraproject.*.resultsdb.result.new topic (and consumes from none), whereas resultsdb-ci-listener does not send to any topic (I guess that is the "^$" in the rabbitmqctl set_permissions command) but needs to consume from org.centos.prod.ci.koji-build.test.complete (among others).
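On the conflict question: RabbitMQ keeps one configure/write/read triple per user per vhost, so a later set_permissions call for the same user and vhost replaces the earlier triple rather than adding to it. Both services therefore have to be covered by a single set of patterns. The sketch below checks the write and read patterns from the command against what each service appears to need. It assumes resultsdb publishes to the amq.topic exchange and that the listener binds the resultsdb_ci_listener queue to that exchange; neither is stated in this ticket. The allowed and report helpers are hypothetical, and grep -E stands in for the broker's regex matching.

```shell
#!/bin/sh
# Patterns copied verbatim from the set_permissions command,
# including the leading space in the read pattern.
write_pattern='^(amq\.topic)|(resultsdb_ci_listener.*)$'
read_pattern=' ^(zmq\.topic)|^(amq\.topic)|(resultsdb_ci_listener.*)$'

# allowed PATTERN NAME: succeeds if NAME matches PATTERN (hypothetical helper).
allowed() {
    printf '%s\n' "$2" | grep -Eq -e "$1"
}

# report LABEL PATTERN NAME: prints the label with "allow" or "deny".
report() {
    if allowed "$2" "$3"; then
        echo "$1: allow"
    else
        echo "$1: deny"
    fi
}

# resultsdb: publishing needs write on the exchange (assumed to be amq.topic).
report 'resultsdb publish, write on exchange' "$write_pattern" 'amq.topic'

# resultsdb-ci-listener: binding needs write on the queue and read on the
# exchange; consuming needs read on the queue.
report 'listener bind, write on queue' "$write_pattern" 'resultsdb_ci_listener'
report 'listener bind, read on exchange' "$read_pattern" 'amq.topic'
report 'listener consume, read on queue' "$read_pattern" 'resultsdb_ci_listener'
```

Under those assumptions all four checks come out as allow, so the one triple would serve both services. Note that these patterns match exchange and queue names, not routing keys such as org.fedoraproject.*.resultsdb.result.new; restricting routing keys is a separate mechanism (rabbitmqctl set_topic_permissions).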
Actually, I still see the problem in the staging environment (we still receive alerts).
I also fixed staging this weekend.
We need to see why these perms aren't getting set right by ansible.
This is still a problem in staging. The pod is currently in CrashLoopBackOff state.
I'm not sure what the magic behind generating the key/crt for messaging is, but maybe this helps: https://pagure.io/fedora-infra/ansible/pull-request/1493
Metadata Update from @kevin: - Issue assigned to kevin
I think this is solved now.
Please re-open if there's anything further to do.
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)