#7817 RabbitMQ is down in staging
Closed: Fixed 9 months ago by kevin. Opened 9 months ago by bowlofeggs.

I'm trying to run Bodhi's openshift playbook, and it fails with this error:

TASK [rabbit/user : Create the user in RabbitMQ] ********************************************************
Tuesday 21 May 2019  14:39:37 +0000 (0:00:00.064)       0:00:00.164 *********** 
fatal: [os-master01.stg.phx2.fedoraproject.org]: FAILED! => {"changed": false, "cmd": "/usr/sbin/rabbitmqctl -q -n rabbit list_users", "msg": "Error:********@rabbitmq01.stg.phx2.fedoraproject.org'\n- home dir: /var/lib/rabbitmq\n- cookie hash: FxutaR0KvbmZYeAyJEBcKA==", "rc": 69, "stderr": "Error: unable to connect to node 'rabbit@rabbitmq01.stg.phx2.fedoraproject.org': nodedown\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit@rabbitmq01.stg.phx2.fedoraproject.org']\n\nrabbit@rabbitmq01.stg.phx2.fedoraproject.org:\n  * connected to epmd (port 4369) on rabbitmq01.stg.phx2.fedoraproject.org\n  * epmd reports: node 'rabbit' not running at all\n                  no other nodes on rabbitmq01.stg.phx2.fedoraproject.org\n  * suggestion: start the node\n\ncurrent node details:\n- node name: 'rabbitmq-cli-36@rabbitmq01.stg.phx2.fedoraproject.org'\n- home dir: /var/lib/rabbitmq\n- cookie hash: FxutaR0KvbmZYeAyJEBcKA==\n\n", "stderr_lines": ["Error: unable to connect to node 'rabbit@rabbitmq01.stg.phx2.fedoraproject.org': nodedown", "", "DIAGNOSTICS", "===========", "", "attempted to contact: ['rabbit@rabbitmq01.stg.phx2.fedoraproject.org']", "", "rabbit@rabbitmq01.stg.phx2.fedoraproject.org:", "  * connected to epmd (port 4369) on rabbitmq01.stg.phx2.fedoraproject.org", "  * epmd reports: node 'rabbit' not running at all", "                  no other nodes on rabbitmq01.stg.phx2.fedoraproject.org", "  * suggestion: start the node", "", "current node details:", "- node name: 'rabbitmq-cli-36@rabbitmq01.stg.phx2.fedoraproject.org'", "- home dir: /var/lib/rabbitmq", "- cookie hash: FxutaR0KvbmZYeAyJEBcKA==", ""], "stdout": "", "stdout_lines": []}

It is using the wrong host, rabbit@rabbitmq01.stg.phx2.fedoraproject.org; it should be using rabbitmq.stg.fedoraproject.org.

Other apps have been adjusted: https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=1a09cff25c3afbabf9b4d2b7856827b5f334c1b1

This is the rabbitmq role that's failing; Bodhi's config file is not involved.

Should be fixed. It looks like it might have been an ordering issue in our reboots: 01 came up and didn't find the others? Will have to investigate.

[root@rabbitmq01 log][STG]# rabbitmqctl list_queues --online
Listing queues
test 0

Should there be more queues?

You probably want:

rabbitmqctl list_queues -p /pubsub --online

Listing queues
amqp_to_zmq 0
faf 0
the-new-hotness.stg 0
bodhi.stg_composer 0
bodhi.stg 2
federation: zmq.topic -> rabbit@rabbitmq01.stg.phx2.fedoraproject.org:/public_pubsub:zmq.topic 0
greenwave.stg 155175
amqp_bridge_verify_missing 0
federation: amq.topic -> rabbit@rabbitmq01.stg.phx2.fedoraproject.org:/public_pubsub:amq.topic 0
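A listing like the one above can be scanned for queues with a large backlog (greenwave.stg stands out here). The following is only a rough sketch: `busy_queues` is a hypothetical helper name, not part of rabbitmqctl, and it assumes the message count is the last whitespace-separated field on each line, which also copes with the federation queue names that contain spaces.

```shell
# Hypothetical helper: print queues whose message count exceeds a threshold.
# Reads `rabbitmqctl list_queues` style output on stdin.
# Assumption: the count is the last field; lines whose last field is not a
# plain number (such as the "Listing queues" banner) are skipped.
busy_queues() {
  threshold=$1
  awk -v t="$threshold" '
    NF >= 2 && $NF ~ /^[0-9]+$/ && $NF + 0 > t + 0 { print }
  '
}
```

It could be used as `rabbitmqctl list_queues -p /pubsub --online | busy_queues 1000`; the threshold of 1000 is an arbitrary example value.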

Yeah seems to be back now - thanks!

I landed a fix in our reboot playbook to restart these after reboots are done and the hosts are back.
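The playbook change itself is not shown in this ticket. As a rough illustration of checking that the broker answers again once the hosts are back, one could retry the same command that failed in the Ansible task until it succeeds. `retry` is a hypothetical helper, and the attempt count and delay in the usage line are made-up values.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    # Stop as soon as the command exits with status 0.
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}
```

For example, `retry 30 10 /usr/sbin/rabbitmqctl -q -n rabbit list_users` would poll the node for up to about five minutes before giving up.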

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

